XCP-ng

    nVidia Tesla P4 for vgpu and Plex encoding

    Compute
    vgpu
    63 Posts 14 Posters 18.3k Views 16 Watching
    • okynnorO Offline
      okynnor
      last edited by okynnor

      Does anyone have experience running an NVIDIA Tesla P4 on XCP-ng with XO, with Plex Media Server in a VM and transcoding enabled?

      • M Offline
        mgformula @high-voltages
        last edited by mgformula

        @high-voltages thanks so much! I'm also testing with M10 cards. We are trying to find VMware alternatives and are a large vGPU shop 🙂

        Here is what I'm currently running:
        xe host-param-get param-name=software-version uuid=$(xe host-list --minimal)
        product_version: 8.2.1; product_version_text: 8.2; product_version_text_short: 8.2; platform_name: XCP; platform_version: 3.2.1; product_brand: XCP-ng; build_number: release/yangtze/master/58; hostname: localhost; date: 2024-07-17; dbv: 0.0.1; xapi: 1.20; xen: 4.13.5-9.44; linux: 4.19.0+1; xencenter_min: 2.16; xencenter_max: 2.16; network_backend: openvswitch; db_schema: 5.603
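
The software-version output above is a single line of semicolon-separated "key: value" pairs. A minimal sketch for pulling one field out of it follows. `get_field` is a hypothetical helper name (not part of xe), and the sample string is trimmed from the output quoted above:

```shell
# get_field NAME: read "key: value; key: value; ..." on stdin
# and print the value of NAME. Hypothetical helper, not part of xe.
get_field() {
    tr ';' '\n' | sed -n "s/^ *$1: *//p"
}

# Sample trimmed from the output quoted in this post.
sample='product_version: 8.2.1; product_version_text: 8.2; platform_version: 3.2.1; xen: 4.13.5-9.44'

version=$(printf '%s\n' "$sample" | get_field product_version)
xen=$(printf '%s\n' "$sample" | get_field xen)
echo "XCP-ng $version, Xen $xen"
```

On a host, the same helper could presumably be fed directly from the command shown above, for example by piping the xe host-param-get output into `get_field product_version`.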

        I installed the NVIDIA host driver using rpm -iv "nvidia.rpm". I've also tried other methods: copying the .iso and installing the supplemental pack. One thing to mention is that I grabbed the binaries from XenServer8_2024-10-03.
        14488ebb-1fe5-4553-a3e2-7c6a0decd8a6-image.png

        d65c947e-addb-4a07-bbae-548f236012de-image.png

        • splastunovS Offline
          splastunov
          last edited by

          Hello!

          I have found information that it is possible to run NVIDIA vGPU under Proxmox.
          Here are some links.
          Maybe it is possible to adapt this to XCP-ng?

          https://pve.proxmox.com/wiki/NVIDIA_vGPU_on_Proxmox_VE_7.x
          https://github.com/DualCoder/vgpu_unlock
          https://gitlab.com/polloloco/vgpu-proxmox

          I have a server with an NVIDIA Tesla T4 for testing.
          I will try to make it work and report back later.

          • olivierlambertO Offline
            olivierlambert Vates 🪐 Co-Founder CEO
            last edited by

            Proxmox and XCP-ng are 2 very different platforms.

            But yes, in XCP-ng, you can PCI passthrough the card to the VM.

            • splastunovS Offline
              splastunov
              last edited by

              PCI passthrough is not an interesting option for the GPU.

              XCP-ng currently supports only one GPU for this, the AMD FirePro S7150 x2.
              That is not enough for modern workloads.

              • splastunovS Offline
                splastunov
                last edited by

                I installed the official NVIDIA vGPU drivers for XenServer 8.2, and now I can see the list of vGPU types in XenCenter:
                8aa8d620-2584-40e5-bb92-63b533072a31-image.png

                • olivierlambertO Offline
                  olivierlambert Vates 🪐 Co-Founder CEO
                  last edited by

                  So are you talking about vGPU? If yes, it wasn't clear.

                  • splastunovS Offline
                    splastunov
                    last edited by

                    Yes, I'm trying to build a server with XCP-ng and a "modern" GPU. I need VMs with GPU access (vGPU or MxGPU).
                    It is impossible to fit 20-30 "hardware" GPUs in one server and pass each through to a VM.

                    So, I have installed the vGPU driver from NVIDIA, but VMs fail to start with the error "An emulator required to run this VM failed to start".
                    Is it because there is some proprietary piece of code in XenServer?

                    • olivierlambertO Offline
                      olivierlambert Vates 🪐 Co-Founder CEO
                      last edited by

                      Yes, there are some parts we can't redistribute publicly, but they are present in the Citrix Hypervisor (CH) ISO.

                      • splastunovS Offline
                        splastunov
                        last edited by

                        I've been digging around vGPU for days.
                        I found information that you don't need any additional binaries to make it work. All you need is Nvidia vGPU drivers.
                        So I tried installing different versions of Nvidia vGPU drivers for XenServer on xcp-ng.
                        There were no errors during installation.
                        After installing the drivers I was able to see all vGPU types in XenCenter, and nvidia-smi gives the correct output.
                        I also checked xensource.log, and this is what I found:

                        Dec  1 00:53:25 XEN60 xapi: [debug||952 HTTPS 192.168.8.103->|Async.VM.start R:021195d15a31|xapi_gpumon] assert_vgpu_pgpu_are_compatible: vGPU/pGPU are compatible by default OpaqueRef:f8b54a1f-8f3c-4ba5-a475-980b4e2af511/OpaqueRef:7e239728-ac74-4323-b8d0-7c40fa318411
                        Dec  1 00:53:25 XEN60 xapi: [debug||952 HTTPS 192.168.8.103->|Async.VM.start R:021195d15a31|xapi_gpumon] assert_vgpu_pgpu_are_compatible: vGPU/pGPU are compatible by default OpaqueRef:f8b54a1f-8f3c-4ba5-a475-980b4e2af511/OpaqueRef:6448a86a-ee72-492a-b700-83d1645f0c60
                        Dec  1 00:53:25 XEN60 xapi: [debug||952 HTTPS 192.168.8.103->|Async.VM.start R:021195d15a31|xapi_gpumon] assert_vgpu_pgpu_are_compatible: vGPU/pGPU are compatible by default OpaqueRef:f8b54a1f-8f3c-4ba5-a475-980b4e2af511/OpaqueRef:7e239728-ac74-4323-b8d0-7c40fa318411
                        Dec  1 00:53:25 XEN60 xapi: [debug||952 HTTPS 192.168.8.103->|Async.VM.start R:021195d15a31|xapi_gpumon] assert_vgpu_pgpu_are_compatible: vGPU/pGPU are compatible by default OpaqueRef:f8b54a1f-8f3c-4ba5-a475-980b4e2af511/OpaqueRef:6448a86a-ee72-492a-b700-83d1645f0c60
                        Dec  1 00:53:25 XEN60 xapi: [debug||952 HTTPS 192.168.8.103->|Async.VM.start R:021195d15a31|vgpuops] vGPUs allocated to VM (OpaqueRef:7051bb05-712e-4dc3-bf6b-fef76c0980ee) are: OpaqueRef:f8b54a1f-8f3c-4ba5-a475-980b4e2af511
                        Dec  1 00:53:25 XEN60 xapi: [debug||952 HTTPS 192.168.8.103->|Async.VM.start R:021195d15a31|vgpuops] Creating virtual VGPUs
                        Dec  1 00:53:25 XEN60 xapi: [debug||952 HTTPS 192.168.8.103->|Async.VM.start R:021195d15a31|xapi_gpumon] assert_vgpu_pgpu_are_compatible: vGPU/pGPU are compatible by default OpaqueRef:f8b54a1f-8f3c-4ba5-a475-980b4e2af511/OpaqueRef:7e239728-ac74-4323-b8d0-7c40fa318411
                        Dec  1 00:53:25 XEN60 xapi: [debug||952 HTTPS 192.168.8.103->|Async.VM.start R:021195d15a31|xapi_gpumon] assert_vgpu_pgpu_are_compatible: vGPU/pGPU are compatible by default OpaqueRef:f8b54a1f-8f3c-4ba5-a475-980b4e2af511/OpaqueRef:6448a86a-ee72-492a-b700-83d1645f0c60
                        Dec  1 00:53:25 XEN60 xapi: [ info||952 HTTPS 192.168.8.103->|Async.VM.start R:021195d15a31|xenops] xenops: VM.import_metadata {"vusbs":[],"vgpus":[{"implementation":["Nvidia",{"extra_args":"","uuid":"a50159e3-a755-0b90-19ea-d5697b007834","type_id":"224","virtual_pci_address":{"fn":0,"dev":11,"bus":0,"domain":0}}],"physical_pci_address":{"fn":0,"dev":0,"bus":193,"domain":0},"position":0,"id":["f565e3ab-2fc9-2d00-a184-e7f28ee91915","0"]}],"pcis":[],"vifs":[],"vbds":[{"persistent":true,"extra_private_keys":{},"extra_backend_keys":{"polling-duration":"1000","polling-idle-threshold":"50"},"unpluggable":true,"ty":"CDROM","mode":"ReadOnly","position":["Ide",3,0],"id":["f565e3ab-2fc9-2d00-a184-e7f28ee91915","xvdd"]},{"persistent":true,"extra_private_keys":{},"extra_backend_keys":{"polling-duration":"1000","polling-idle-threshold":"50"},"unpluggable":true,"ty":"Disk","backend":["VDI","b2d22c67-abe8-d411-ba19-b5aa046407e9/b3fc6e5b-7d44-4117-b614-a9c086be0cf7"],"mode":"ReadWrite","position":["Ide",0,0],"id":["f565e3ab-2fc9-2d00-a184-e7f28ee91915","xvda"]}],"vm":{"generation_id":"4573398145280631014:2680438277400141975","has_vendor_device":true,"pci_power_mgmt":false,"pci_msitranslate":false,"on_reboot":["Start"],"on_shutdown":["Shutdown"],"on_crash":["Start"],"scheduler_params":{"affinity":[],"priority":[256,0]},"vcpus":2,"vcpu_max":2,"memory_dynamic_min":2147483648,"memory_dynamic_max":2147483648,"memory_static_max":2147483648,"suppress_spurious_page_faults":false,"ty":["HVM",{"firmware":["Uefi",{"backend":"xapidb","on_boot":"Persist"}],"qemu_stubdom":false,"qemu_disk_cmdline":false,"boot_order":"cd","pci_passthrough":false,"pci_emulations":[],"serial":"pty","acpi":true,"video":"Vgpu","video_mib":16,"timeoffset":"0","shadow_multiplier":1.0,"hap":true}],"bios_strings":{"bios-vendor":"Xen","bios-version":"","system-manufacturer":"Xen","system-product-name":"HVM 
domU","system-version":"","system-serial-number":"","baseboard-manufacturer":"","baseboard-product-name":"","baseboard-version":"","baseboard-serial-number":"","baseboard-asset-tag":"","baseboard-location-in-chassis":"","enclosure-asset-tag":"","hp-rombios":"","oem-1":"Xen","oem-2":"MS_VM_CERT/SHA1/bdbeb6e0a816d43fa6d3fe8aaef04c2bad9d3e3d"},"platformdata":{"featureset":"178bfbff-f6d83203-2e500800-040001f7-0000000f-219c01a9-00400004-00000000-010cd005-00000000-00000000-00000000-00000000-00000000-00000000-00000000-00000000-00000000","timeoffset":"0","usb":"true","usb_tablet":"true","device-model":"qemu-upstream-uefi","videoram":"8","hpet":"true","secureboot":"false","viridian_apic_assist":"true","apic":"true","device_id":"0002","cores-per-socket":"2","viridian_crash_ctl":"true","pae":"true","vga":"std","nx":"true","viridian_time_ref_count":"true","viridian_stimer":"true","viridian":"true","acpi":"1","viridian_reference_tsc":"true"},"xsdata":{"vm-data/mmio-hole-size":"268435456","vm-data":""},"ssidref":0,"name":"Windows Server 2022 (64-bit) (1)","id":"f565e3ab-2fc9-2d00-a184-e7f28ee91915"}}
                        Dec  1 00:53:25 XEN60 xenopsd-xc: [debug||140 |Async.VM.start R:021195d15a31|xenops_utils] TypedTable: Writing VM/f565e3ab-2fc9-2d00-a184-e7f28ee91915/vgpu.0
                        Dec  1 00:53:25 XEN60 xenopsd-xc: [debug||36 |Async.VM.start R:021195d15a31|xenguesthelper] connect: args = [ -mode hvm_build -image /usr/libexec/xen/boot/hvmloader -vgpu -domid 2 -store_port 3 -store_domid 0 -console_port 4 -console_domid 0 -mem_max_mib 2032 -mem_start_mib 2032 ]
                        Dec  1 00:53:26 XEN60 xenopsd-xc: [debug||36 ||xenops] Device.Dm.start domid=2 args: [-vgpu -videoram 16 -vnc unix:/var/run/xen/vnc-2,lock-key-sync=off -acpi -monitor null -pidfile /var/run/xen/qemu-dm-2.pid -xen-domid 2 -m size=2032 -boot order=cd -usb -device usb-tablet,port=2 -smp 2,maxcpus=2 -serial pty -display none -nodefaults -trace enable=xen_platform_log -sandbox on,obsolete=deny,elevateprivileges=allow,spawn=deny,resourcecontrol=deny -S -parallel null -qmp unix:/var/run/xen/qmp-libxl-2,server,nowait -qmp unix:/var/run/xen/qmp-event-2,server,nowait -device xen-platform,addr=3,device-id=0x0002 -drive file=,if=none,id=ide1-cd1,read-only=on -device ide-cd,drive=ide1-cd1,bus=ide.1,unit=1 -device nvme,serial=nvme0,id=nvme0,addr=7 -drive id=disk0,if=none,file=/dev/sm/backend/b2d22c67-abe8-d411-ba19-b5aa046407e9/b3fc6e5b-7d44-4117-b614-a9c086be0cf7,media=disk,auto-read-only=off,format=raw -device nvme-ns,drive=disk0,bus=nvme0,nsid=1 -device xen-pvdevice,device-id=0xc000,addr=6 -net none]
                        Dec  1 00:53:26 XEN60 xenopsd-xc: [debug||36 ||xenops] Starting daemon: /usr/bin/vgpu with args [--domain=2; --vcpus=2; --suspend=/var/lib/xen/demu-save.2; --device=0000:c1:00.0,224,0000:00:0b.0,a50159e3-a755-0b90-19ea-d5697b007834]
                        Dec  1 00:53:26 XEN60 xenopsd-xc: [debug||36 ||xenops] vgpu: should be running in the background (stdout -> syslog); (fd,pid) = (FEFork (35,8823))
                        Dec  1 00:53:26 XEN60 xenopsd-xc: [debug||36 ||xenops] Daemon started: vgpu-2
                        Dec  1 00:53:56 XEN60 xenopsd-xc: [error||36 ||xenops] vgpu: unexpected exit with code: 127
                        Dec  1 00:53:56 XEN60 xenopsd-xc: [ info||36 ||xenops_server] Caught Xenops_interface.Xenopsd_error([S(Failed_to_start_emulator);[S(f565e3ab-2fc9-2d00-a184-e7f28ee91915);S(vgpu);S(Daemon exited unexpectedly)]]) executing ["VM_start",["f565e3ab-2fc9-2d00-a184-e7f28ee91915",false]]: triggering cleanup actions
                        Dec  1 00:53:58 XEN60 xenopsd-xc: [error||36 ||task_server] Task 42 failed; Xenops_interface.Xenopsd_error([S(Failed_to_start_emulator);[S(f565e3ab-2fc9-2d00-a184-e7f28ee91915);S(vgpu);S(Daemon exited unexpectedly)]])
                        Dec  1 00:53:58 XEN60 xenopsd-xc: [debug||36 ||xenops_server] TASK.signal 42 = ["Failed",["Failed_to_start_emulator",["f565e3ab-2fc9-2d00-a184-e7f28ee91915","vgpu","Daemon exited unexpectedly"]]]
                        Dec  1 00:53:58 XEN60 xapi: [ info||952 HTTPS 192.168.8.103->|Async.VM.start R:021195d15a31|xapi_network] Caught Xenops_interface.Xenopsd_error([S(Failed_to_start_emulator);[S(f565e3ab-2fc9-2d00-a184-e7f28ee91915);S(vgpu);S(Daemon exited unexpectedly)]]): detaching networks
                        Dec  1 00:53:58 XEN60 xapi: [error||952 HTTPS 192.168.8.103->|Async.VM.start R:021195d15a31|xenops] Caught exception starting VM: Xenops_interface.Xenopsd_error([S(Failed_to_start_emulator);[S(f565e3ab-2fc9-2d00-a184-e7f28ee91915);S(vgpu);S(Daemon exited unexpectedly)]])
                        Dec  1 00:53:58 XEN60 xenopsd-xc: [debug||25 |org.xen.xapi.xenops.classic events D:22e2807de46a|xenops_utils] TypedTable: Removing VM/f565e3ab-2fc9-2d00-a184-e7f28ee91915/vgpu.0
                        Dec  1 00:53:58 XEN60 xenopsd-xc: [debug||25 |org.xen.xapi.xenops.classic events D:22e2807de46a|xenops_utils] TypedTable: Deleting VM/f565e3ab-2fc9-2d00-a184-e7f28ee91915/vgpu.0
                        Dec  1 00:53:58 XEN60 xenopsd-xc: [debug||25 |org.xen.xapi.xenops.classic events D:22e2807de46a|xenops_utils] DB.delete /var/run/nonpersistent/xenopsd/classic/VM/f565e3ab-2fc9-2d00-a184-e7f28ee91915/vgpu.0
                        Dec  1 00:53:58 XEN60 xapi: [error||952 HTTPS 192.168.8.103->|Async.VM.start R:021195d15a31|xenops] Re-raising as FAILED_TO_START_EMULATOR [ OpaqueRef:7051bb05-712e-4dc3-bf6b-fef76c0980ee; vgpu; Daemon exited unexpectedly ]
                        Dec  1 00:53:58 XEN60 xapi: [error||952 ||backtrace] Async.VM.start R:021195d15a31 failed with exception Server_error(FAILED_TO_START_EMULATOR, [ OpaqueRef:7051bb05-712e-4dc3-bf6b-fef76c0980ee; vgpu; Daemon exited unexpectedly ])
                        Dec  1 00:53:58 XEN60 xapi: [error||952 ||backtrace] Raised Server_error(FAILED_TO_START_EMULATOR, [ OpaqueRef:7051bb05-712e-4dc3-bf6b-fef76c0980ee; vgpu; Daemon exited unexpectedly ])
                        

                        I think the key part is:

                        Dec  1 00:53:26 XEN60 xenopsd-xc: [debug||36 ||xenops] vgpu: should be running in the background (stdout -> syslog); (fd,pid) = (FEFork (35,8823))
                        Dec  1 00:53:26 XEN60 xenopsd-xc: [debug||36 ||xenops] Daemon started: vgpu-2
                        Dec  1 00:53:56 XEN60 xenopsd-xc: [error||36 ||xenops] vgpu: unexpected exit with code: 127
                        

                        So the log says "Daemon started: vgpu-2", but 30 seconds later the daemon exits with code 127.

                        Are there any options to debug it? Any ideas why it failed to start?
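
One hedged way to start debugging: by POSIX shell convention, exit status 127 means the command could not be found. Whether xenopsd's fork helper follows the same convention is an assumption, but it makes checking for the binary a cheap first step. The sketch below demonstrates the convention, checks a path, and filters a log down to the vgpu daemon lines. `check_binary` and `vgpu_log_lines` are hypothetical helper names; the path is the one printed in the "Starting daemon" log line above.

```shell
# 1) POSIX shells report status 127 when a command cannot be found.
sh -c 'no-such-binary-for-demo' 2>/dev/null
status=$?
echo "status for a missing command: $status"

# 2) Hypothetical helper: classify a path as missing / not-executable / ok.
check_binary() {
    if [ ! -e "$1" ]; then
        echo "missing"
    elif [ ! -x "$1" ]; then
        echo "not-executable"
    else
        echo "ok"
    fi
}

# Path taken from the "Starting daemon" log line quoted above.
echo "/usr/bin/vgpu: $(check_binary /usr/bin/vgpu)"

# 3) Hypothetical helper: print only the vgpu daemon lifecycle lines
#    from the log file given as $1.
vgpu_log_lines() {
    grep -E 'Starting daemon: .*vgpu|Daemon started: vgpu|vgpu: unexpected exit' "$1"
}
```

On the host, `vgpu_log_lines /var/log/xensource.log` would reduce the long trace to the few lines that matter.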

                        • olivierlambertO Offline
                          olivierlambert Vates 🪐 Co-Founder CEO
                          last edited by olivierlambert

                          That's because our version of emu-manager doesn't support vGPU. In short, Citrix decided in 2018 to introduce emu-manager and to make it closed source. Without it, you can't even boot a VM (read the story here, it's pretty "funny": https://bugs.xenserver.org/browse/XSO-878)

                          So we had to come up with our own version without any clue about how the original works. We managed to get something working, but obviously it took a lot of effort and time, even without vGPU management.

                          In theory, you can replace our emu-manager with the one from the Citrix ISO, and that should do the trick.

                          • splastunovS Offline
                            splastunov
                            last edited by

                            I have tried copying emu-manager from different versions of the Citrix ISO, but nothing changed.
                            Maybe I have to enable a trial license, but that would not be a production option.

                            So I will switch to the AMD FirePro S7150 x2 GPU.

                            It would be great to expand the list of supported GPUs in the future.
                            NVIDIA drivers are available after registration, and there are some tricks to overcome the NVIDIA vGPU license on open source platforms. So we "just" need to make some changes in emu-manager... Do you have the source code of the emu-manager that XCP-ng is using?

                            • olivierlambertO Offline
                              olivierlambert Vates 🪐 Co-Founder CEO
                              last edited by

                              Someone managed to get NVIDIA vGPU working recently, so it should work, but I'm not comfortable giving all the details publicly since it's not legal to redistribute or use proprietary packages 😕

                              In my opinion, the future will be mediated devices, using VFIO or something similar. And good news: for our DPU work, we are working on an equivalent of VFIO for Xen. So the solution might come from there 🙂

                              • splastunovS Offline
                                splastunov @olivierlambert
                                last edited by

                                @olivierlambert could you give me their contact details, please? I will reach out to them privately.

                                • splastunovS Offline
                                  splastunov
                                  last edited by splastunov

                                  Finally, after a week, I found the solution!
                                  There is no problem with emu-manager.
                                  XCP-ng does not include the required vgpu package.
                                  I copied vgpu from the Citrix ISO and now it is alive! : )

                                  • olivierlambertO Offline
                                    olivierlambert Vates 🪐 Co-Founder CEO
                                    last edited by

                                    Ah great 🙂 But I think our emu-manager won't work. Can you confirm you are still using the Citrix one?

                                    • splastunovS Offline
                                      splastunov
                                      last edited by splastunov

                                      Steps I took to make NVIDIA vGPU work:

                                      1. Install XCP-ng 8.2.1.
                                      2. Install all updates: yum update
                                      3. Reboot.
                                      4. Download the NVIDIA vGPU drivers for XenServer 8.2 from the NVIDIA site. I used version NVIDIA-GRID-CitrixHypervisor-8.2-510.108.03-513.91.
                                      5. Unzip the archive and install the rpm from Host-Drivers.
                                      6. Reboot again.
                                      7. Download the free CitrixHypervisor-8.2.0-install-cd.iso from the Citrix site.
                                      8. Open CitrixHypervisor-8.2.0-install-cd.iso with 7-zip, then extract the vgpu binary from Packages->vgpu....rpm->vgpu....cpio->.->usr->lib64->xen->bin
                                      9. Upload vgpu to /usr/lib64/xen/bin on the XCP-ng host and make it executable: chmod +x /usr/lib64/xen/bin/vgpu
                                      10. Deploy a VM with a vGPU. It started without any problems.

                                      So I did not make any modifications to emu-manager.

                                      My test server is far away from me and it will take some time to download the windows ISO to this test location. Then I will check how it works in the guest OS and report back here.
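
For anyone who prefers to do steps 8 and 9 from a Linux shell instead of 7-zip, here is a minimal sketch. It assumes rpm2cpio and cpio are available and that you have already copied the vgpu RPM out of the ISO's Packages directory. `extract_vgpu` and `install_vgpu` are hypothetical helper names; the RPM path is whatever your file is actually called, and the usual licensing caveats for the proprietary binary still apply.

```shell
# Extract ./usr/lib64/xen/bin/vgpu from the vgpu RPM into the current directory.
# $1 = path to the vgpu RPM taken from the ISO's Packages directory.
extract_vgpu() {
    rpm2cpio "$1" | cpio -idm './usr/lib64/xen/bin/vgpu'
}

# Copy an extracted vgpu binary into a target directory and make it executable.
# $1 = extracted binary, $2 = target directory (defaults to the path from step 9).
install_vgpu() {
    src="$1"
    dest_dir="${2:-/usr/lib64/xen/bin}"
    [ -f "$src" ] || { echo "source not found: $src" >&2; return 1; }
    [ -d "$dest_dir" ] || { echo "no such directory: $dest_dir" >&2; return 1; }
    cp "$src" "$dest_dir/vgpu" && chmod +x "$dest_dir/vgpu"
}
```

On the XCP-ng host this would be used roughly as `extract_vgpu "$VGPU_RPM" && install_vgpu ./usr/lib64/xen/bin/vgpu`, where VGPU_RPM is a variable you set to the RPM's location.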

                                      • wyatt-madeW Offline
                                        wyatt-made @splastunov
                                        last edited by

                                        @splastunov Will this be hampered by any licensing issues? To my understanding, NVIDIA vGPU requires a license per user per GPU to work properly. Unless this isn't the case on Xen?

                                        • splastunovS Offline
                                          splastunov @wyatt-made
                                          last edited by

                                          @wyatt-made I need a few days to test it.
                                          I will report back here later.

                                          • tjkreidlT Offline
                                            tjkreidl Ambassador @wyatt-made
                                            last edited by

                                            @wyatt-made Yeah, you not only need licenses for the hosts and any VMs running on them, but also have to run a custom NVIDIA license manager.

                                            • olivierlambertO Offline
                                              olivierlambert Vates 🪐 Co-Founder CEO
                                              last edited by

                                              Mediated devices will be a game changer… Eager to show our results with DPU, that will be the start of it. Some reading on the potential: https://arccompute.com/blog/libvfio-commodity-gpu-multiplexing/
