nVidia Tesla P4 for vgpu and Plex encoding
-
Does anyone have experience with an nVidia Tesla P4 on XCP-ng with XO, and Plex Media Server running in a VM with transcoding enabled?
-
@high-voltages thanks so much!! I'm also testing with M10 cards, we are trying to find VMware alternatives and are a large vGPU shop
here is what i'm currently running
xe host-param-get param-name=software-version uuid=$(xe host-list --minimal)
product_version: 8.2.1; product_version_text: 8.2; product_version_text_short: 8.2; platform_name: XCP; platform_version: 3.2.1; product_brand: XCP-ng; build_number: release/yangtze/master/58; hostname: localhost; date: 2024-07-17; dbv: 0.0.1; xapi: 1.20; xen: 4.13.5-9.44; linux: 4.19.0+1; xencenter_min: 2.16; xencenter_max: 2.16; network_backend: openvswitch; db_schema: 5.603
I installed the NVIDIA host driver using rpm -iv "nvidia.rpm". I've also tried other methods: copying the .iso and installing the supplemental pack. One thing to mention is that I grabbed the binaries from XenServer8_2024-10-03.
-
Hello!
I have found information that under Proxmox it is possible to run NVIDIA vGPU.
Here are some links.
Maybe it is possible to adapt it to xcp-ng?
https://pve.proxmox.com/wiki/NVIDIA_vGPU_on_Proxmox_VE_7.x
https://github.com/DualCoder/vgpu_unlock
https://gitlab.com/polloloco/vgpu-proxmox
I have a server with an NVIDIA Tesla T4 for testing.
I will try to make it work and report later.
-
Proxmox and XCP-ng are 2 very different platforms.
But yes, in XCP-ng, you can PCI passthrough the card to the VM.
-
PCI passthrough is not interesting for the GPU.
xcp-ng currently has only one compatible GPU, the AMD Firepro s7150X2.
This is not enough for modern tasks.
-
I have installed the official vGPU drivers from NVIDIA for XenServer 8.2 and now I can see the list of vGPUs in XenCenter.
-
So are you talking about vGPU? If yes, it wasn't clear.
-
Yes, I'm trying to build a server with xcp-ng and some "modern" GPU. I need VMs with GPU (vGPU or MxGPU).
It is impossible to have 20-30 "hardware" GPUs in one server to pass them through to VMs.
So, I have installed the vGPU driver from NVIDIA, but VMs are not starting, with the error "An emulator required to run this VM failed to start".
Is it because there is some proprietary piece of code in XenServer?
-
Yes, there are some parts we can't redistribute publicly, but they are present in the CH ISO.
-
I've been digging around vGPU for days.
I found information that you don't need any additional binaries to make it work. All you need is Nvidia vGPU drivers.
So I tried installing different versions of Nvidia vGPU drivers for XenServer on xcp-ng.
There were no errors during installation.
After installing the drivers I was able to see all vGPU types in XenCenter, and nvidia-smi gives me the correct output.
I also checked xensource.log and this is what I found.
Dec 1 00:53:25 XEN60 xapi: [debug||952 HTTPS 192.168.8.103->|Async.VM.start R:021195d15a31|xapi_gpumon] assert_vgpu_pgpu_are_compatible: vGPU/pGPU are compatible by default OpaqueRef:f8b54a1f-8f3c-4ba5-a475-980b4e2af511/OpaqueRef:7e239728-ac74-4323-b8d0-7c40fa318411
Dec 1 00:53:25 XEN60 xapi: [debug||952 HTTPS 192.168.8.103->|Async.VM.start R:021195d15a31|xapi_gpumon] assert_vgpu_pgpu_are_compatible: vGPU/pGPU are compatible by default OpaqueRef:f8b54a1f-8f3c-4ba5-a475-980b4e2af511/OpaqueRef:6448a86a-ee72-492a-b700-83d1645f0c60
Dec 1 00:53:25 XEN60 xapi: [debug||952 HTTPS 192.168.8.103->|Async.VM.start R:021195d15a31|xapi_gpumon] assert_vgpu_pgpu_are_compatible: vGPU/pGPU are compatible by default OpaqueRef:f8b54a1f-8f3c-4ba5-a475-980b4e2af511/OpaqueRef:7e239728-ac74-4323-b8d0-7c40fa318411
Dec 1 00:53:25 XEN60 xapi: [debug||952 HTTPS 192.168.8.103->|Async.VM.start R:021195d15a31|xapi_gpumon] assert_vgpu_pgpu_are_compatible: vGPU/pGPU are compatible by default OpaqueRef:f8b54a1f-8f3c-4ba5-a475-980b4e2af511/OpaqueRef:6448a86a-ee72-492a-b700-83d1645f0c60
Dec 1 00:53:25 XEN60 xapi: [debug||952 HTTPS 192.168.8.103->|Async.VM.start R:021195d15a31|vgpuops] vGPUs allocated to VM (OpaqueRef:7051bb05-712e-4dc3-bf6b-fef76c0980ee) are: OpaqueRef:f8b54a1f-8f3c-4ba5-a475-980b4e2af511
Dec 1 00:53:25 XEN60 xapi: [debug||952 HTTPS 192.168.8.103->|Async.VM.start R:021195d15a31|vgpuops] Creating virtual VGPUs
Dec 1 00:53:25 XEN60 xapi: [debug||952 HTTPS 192.168.8.103->|Async.VM.start R:021195d15a31|xapi_gpumon] assert_vgpu_pgpu_are_compatible: vGPU/pGPU are compatible by default OpaqueRef:f8b54a1f-8f3c-4ba5-a475-980b4e2af511/OpaqueRef:7e239728-ac74-4323-b8d0-7c40fa318411
Dec 1 00:53:25 XEN60 xapi: [debug||952 HTTPS 192.168.8.103->|Async.VM.start R:021195d15a31|xapi_gpumon] assert_vgpu_pgpu_are_compatible: vGPU/pGPU are compatible by default OpaqueRef:f8b54a1f-8f3c-4ba5-a475-980b4e2af511/OpaqueRef:6448a86a-ee72-492a-b700-83d1645f0c60
Dec 1 00:53:25 XEN60 xapi: [ info||952 HTTPS 192.168.8.103->|Async.VM.start R:021195d15a31|xenops] xenops: VM.import_metadata {"vusbs":[],"vgpus":[{"implementation":["Nvidia",{"extra_args":"","uuid":"a50159e3-a755-0b90-19ea-d5697b007834","type_id":"224","virtual_pci_address":{"fn":0,"dev":11,"bus":0,"domain":0}}],"physical_pci_address":{"fn":0,"dev":0,"bus":193,"domain":0},"position":0,"id":["f565e3ab-2fc9-2d00-a184-e7f28ee91915","0"]}],"pcis":[],"vifs":[],"vbds":[{"persistent":true,"extra_private_keys":{},"extra_backend_keys":{"polling-duration":"1000","polling-idle-threshold":"50"},"unpluggable":true,"ty":"CDROM","mode":"ReadOnly","position":["Ide",3,0],"id":["f565e3ab-2fc9-2d00-a184-e7f28ee91915","xvdd"]},{"persistent":true,"extra_private_keys":{},"extra_backend_keys":{"polling-duration":"1000","polling-idle-threshold":"50"},"unpluggable":true,"ty":"Disk","backend":["VDI","b2d22c67-abe8-d411-ba19-b5aa046407e9/b3fc6e5b-7d44-4117-b614-a9c086be0cf7"],"mode":"ReadWrite","position":["Ide",0,0],"id":["f565e3ab-2fc9-2d00-a184-e7f28ee91915","xvda"]}],"vm":{"generation_id":"4573398145280631014:2680438277400141975","has_vendor_device":true,"pci_power_mgmt":false,"pci_msitranslate":false,"on_reboot":["Start"],"on_shutdown":["Shutdown"],"on_crash":["Start"],"scheduler_params":{"affinity":[],"priority":[256,0]},"vcpus":2,"vcpu_max":2,"memory_dynamic_min":2147483648,"memory_dynamic_max":2147483648,"memory_static_max":2147483648,"suppress_spurious_page_faults":false,"ty":["HVM",{"firmware":["Uefi",{"backend":"xapidb","on_boot":"Persist"}],"qemu_stubdom":false,"qemu_disk_cmdline":false,"boot_order":"cd","pci_passthrough":false,"pci_emulations":[],"serial":"pty","acpi":true,"video":"Vgpu","video_mib":16,"timeoffset":"0","shadow_multiplier":1.0,"hap":true}],"bios_strings":{"bios-vendor":"Xen","bios-version":"","system-manufacturer":"Xen","system-product-name":"HVM domU","system-version":"","system-serial-number":"","baseboard-manufacturer":"","baseboard-product-name":"","baseboard-version":"","baseboard-serial-number":"","baseboard-asset-tag":"","baseboard-location-in-chassis":"","enclosure-asset-tag":"","hp-rombios":"","oem-1":"Xen","oem-2":"MS_VM_CERT/SHA1/bdbeb6e0a816d43fa6d3fe8aaef04c2bad9d3e3d"},"platformdata":{"featureset":"178bfbff-f6d83203-2e500800-040001f7-0000000f-219c01a9-00400004-00000000-010cd005-00000000-00000000-00000000-00000000-00000000-00000000-00000000-00000000-00000000","timeoffset":"0","usb":"true","usb_tablet":"true","device-model":"qemu-upstream-uefi","videoram":"8","hpet":"true","secureboot":"false","viridian_apic_assist":"true","apic":"true","device_id":"0002","cores-per-socket":"2","viridian_crash_ctl":"true","pae":"true","vga":"std","nx":"true","viridian_time_ref_count":"true","viridian_stimer":"true","viridian":"true","acpi":"1","viridian_reference_tsc":"true"},"xsdata":{"vm-data/mmio-hole-size":"268435456","vm-data":""},"ssidref":0,"name":"Windows Server 2022 (64-bit) (1)","id":"f565e3ab-2fc9-2d00-a184-e7f28ee91915"}}
Dec 1 00:53:25 XEN60 xenopsd-xc: [debug||140 |Async.VM.start R:021195d15a31|xenops_utils] TypedTable: Writing VM/f565e3ab-2fc9-2d00-a184-e7f28ee91915/vgpu.0
Dec 1 00:53:25 XEN60 xenopsd-xc: [debug||36 |Async.VM.start R:021195d15a31|xenguesthelper] connect: args = [ -mode hvm_build -image /usr/libexec/xen/boot/hvmloader -vgpu -domid 2 -store_port 3 -store_domid 0 -console_port 4 -console_domid 0 -mem_max_mib 2032 -mem_start_mib 2032 ]
Dec 1 00:53:26 XEN60 xenopsd-xc: [debug||36 ||xenops] Device.Dm.start domid=2 args: [-vgpu -videoram 16 -vnc unix:/var/run/xen/vnc-2,lock-key-sync=off -acpi -monitor null -pidfile /var/run/xen/qemu-dm-2.pid -xen-domid 2 -m size=2032 -boot order=cd -usb -device usb-tablet,port=2 -smp 2,maxcpus=2 -serial pty -display none -nodefaults -trace enable=xen_platform_log -sandbox on,obsolete=deny,elevateprivileges=allow,spawn=deny,resourcecontrol=deny -S -parallel null -qmp unix:/var/run/xen/qmp-libxl-2,server,nowait -qmp unix:/var/run/xen/qmp-event-2,server,nowait -device xen-platform,addr=3,device-id=0x0002 -drive file=,if=none,id=ide1-cd1,read-only=on -device ide-cd,drive=ide1-cd1,bus=ide.1,unit=1 -device nvme,serial=nvme0,id=nvme0,addr=7 -drive id=disk0,if=none,file=/dev/sm/backend/b2d22c67-abe8-d411-ba19-b5aa046407e9/b3fc6e5b-7d44-4117-b614-a9c086be0cf7,media=disk,auto-read-only=off,format=raw -device nvme-ns,drive=disk0,bus=nvme0,nsid=1 -device xen-pvdevice,device-id=0xc000,addr=6 -net none]
Dec 1 00:53:26 XEN60 xenopsd-xc: [debug||36 ||xenops] Starting daemon: /usr/bin/vgpu with args [--domain=2; --vcpus=2; --suspend=/var/lib/xen/demu-save.2; --device=0000:c1:00.0,224,0000:00:0b.0,a50159e3-a755-0b90-19ea-d5697b007834]
Dec 1 00:53:26 XEN60 xenopsd-xc: [debug||36 ||xenops] vgpu: should be running in the background (stdout -> syslog); (fd,pid) = (FEFork (35,8823))
Dec 1 00:53:26 XEN60 xenopsd-xc: [debug||36 ||xenops] Daemon started: vgpu-2
Dec 1 00:53:56 XEN60 xenopsd-xc: [error||36 ||xenops] vgpu: unexpected exit with code: 127
Dec 1 00:53:56 XEN60 xenopsd-xc: [ info||36 ||xenops_server] Caught Xenops_interface.Xenopsd_error([S(Failed_to_start_emulator);[S(f565e3ab-2fc9-2d00-a184-e7f28ee91915);S(vgpu);S(Daemon exited unexpectedly)]]) executing ["VM_start",["f565e3ab-2fc9-2d00-a184-e7f28ee91915",false]]: triggering cleanup actions
Dec 1 00:53:58 XEN60 xenopsd-xc: [error||36 ||task_server] Task 42 failed; Xenops_interface.Xenopsd_error([S(Failed_to_start_emulator);[S(f565e3ab-2fc9-2d00-a184-e7f28ee91915);S(vgpu);S(Daemon exited unexpectedly)]])
Dec 1 00:53:58 XEN60 xenopsd-xc: [debug||36 ||xenops_server] TASK.signal 42 = ["Failed",["Failed_to_start_emulator",["f565e3ab-2fc9-2d00-a184-e7f28ee91915","vgpu","Daemon exited unexpectedly"]]]
Dec 1 00:53:58 XEN60 xapi: [ info||952 HTTPS 192.168.8.103->|Async.VM.start R:021195d15a31|xapi_network] Caught Xenops_interface.Xenopsd_error([S(Failed_to_start_emulator);[S(f565e3ab-2fc9-2d00-a184-e7f28ee91915);S(vgpu);S(Daemon exited unexpectedly)]]): detaching networks
Dec 1 00:53:58 XEN60 xapi: [error||952 HTTPS 192.168.8.103->|Async.VM.start R:021195d15a31|xenops] Caught exception starting VM: Xenops_interface.Xenopsd_error([S(Failed_to_start_emulator);[S(f565e3ab-2fc9-2d00-a184-e7f28ee91915);S(vgpu);S(Daemon exited unexpectedly)]])
Dec 1 00:53:58 XEN60 xenopsd-xc: [debug||25 |org.xen.xapi.xenops.classic events D:22e2807de46a|xenops_utils] TypedTable: Removing VM/f565e3ab-2fc9-2d00-a184-e7f28ee91915/vgpu.0
Dec 1 00:53:58 XEN60 xenopsd-xc: [debug||25 |org.xen.xapi.xenops.classic events D:22e2807de46a|xenops_utils] TypedTable: Deleting VM/f565e3ab-2fc9-2d00-a184-e7f28ee91915/vgpu.0
Dec 1 00:53:58 XEN60 xenopsd-xc: [debug||25 |org.xen.xapi.xenops.classic events D:22e2807de46a|xenops_utils] DB.delete /var/run/nonpersistent/xenopsd/classic/VM/f565e3ab-2fc9-2d00-a184-e7f28ee91915/vgpu.0
Dec 1 00:53:58 XEN60 xapi: [error||952 HTTPS 192.168.8.103->|Async.VM.start R:021195d15a31|xenops] Re-raising as FAILED_TO_START_EMULATOR [ OpaqueRef:7051bb05-712e-4dc3-bf6b-fef76c0980ee; vgpu; Daemon exited unexpectedly ]
Dec 1 00:53:58 XEN60 xapi: [error||952 ||backtrace] Async.VM.start R:021195d15a31 failed with exception Server_error(FAILED_TO_START_EMULATOR, [ OpaqueRef:7051bb05-712e-4dc3-bf6b-fef76c0980ee; vgpu; Daemon exited unexpectedly ])
Dec 1 00:53:58 XEN60 xapi: [error||952 ||backtrace] Raised Server_error(FAILED_TO_START_EMULATOR, [ OpaqueRef:7051bb05-712e-4dc3-bf6b-fef76c0980ee; vgpu; Daemon exited unexpectedly ])
I think the key part is:
Dec 1 00:53:26 XEN60 xenopsd-xc: [debug||36 ||xenops] vgpu: should be running in the background (stdout -> syslog); (fd,pid) = (FEFork (35,8823))
Dec 1 00:53:26 XEN60 xenopsd-xc: [debug||36 ||xenops] Daemon started: vgpu-2
Dec 1 00:53:56 XEN60 xenopsd-xc: [error||36 ||xenops] vgpu: unexpected exit with code: 127
So it reports "Daemon started: vgpu-2", but then fails with "code: 127".
Are there any options to debug it? Any ideas why it failed to start?
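One thing worth checking: in POSIX shells, exit status 127 conventionally means the command could not be found, and 126 means it was found but could not be executed. Whether xenopsd's launcher follows the same convention is an assumption, but it suggests verifying that the binary named in the "Starting daemon" line exists and is executable. A minimal sketch; check_daemon_binary is a made-up helper name, and the default path is simply the one from the log above:

```shell
#!/bin/sh
# Sketch: report whether a daemon binary is present and executable.
# check_daemon_binary is a hypothetical helper, not part of XCP-ng.
check_daemon_binary() {
    path="$1"
    if [ ! -e "$path" ]; then
        echo "missing: $path"
        return 127   # shell convention for "command not found"
    elif [ ! -x "$path" ]; then
        echo "not executable: $path"
        return 126   # shell convention for "found but cannot execute"
    fi
    echo "ok: $path"
    return 0
}

# Default to the path from the "Starting daemon" log line; pass another
# path as the first argument to check a different binary.
check_daemon_binary "${1:-/usr/bin/vgpu}" || true
```

If this prints "missing", the exit code in the log is more likely a missing file than a driver or licensing problem.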
-
That's because our version of emu-manager doesn't support vGPU. In short, Citrix decided in 2018 to introduce emu-manager and to make it closed source. Without it, you can't even boot a VM (read the story here, it's pretty "funny": https://bugs.xenserver.org/browse/XSO-878)
So we had to come up with our own version without any clue about how it works. We managed to get something working, but obviously it took a lot of effort and time, even without vGPU management.
In theory, you can use the emu-manager from Citrix ISO to replace our own, and that should do the trick.
-
I have tried to copy emu-manager from different versions of the Citrix ISO, but nothing changed.
Maybe I have to enable a trial license, but that would not be a production setup.
So I will switch to the AMD Firepro s7150X2 GPU.
It would be great to expand the list of supported GPUs in the future.
NVIDIA drivers are available after registration, and there are some tricks to get around the NVIDIA vGPU license on open-source platforms. So we "just" need to make some changes in emu-manager... Do you have the source code of the emu-manager that XCP-ng is using?
-
We had someone who managed to get Nvidia vGPU working recently, so it should work, but I'm not comfortable giving all the details publicly since it's not legal to redistribute or use proprietary packages.
In my opinion, the future will be mediated devices, using VFIO or something. And good news: for our DPU work, we are working on an equivalent of VFIO for Xen. So the solution might come from there
-
@olivierlambert could you give me a contact, please? I will contact him in private conversation.
-
Finally, after a week, I found the solution!
There is no problem with emu-manager.
XCP-ng does not contain the necessary package, vgpu.
I copied vgpu from the Citrix ISO and now it is alive! : )
-
Ah, great! But I think our emu-manager won't work. Can you confirm you are still using the Citrix one?
-
Steps I have done to make NVIDIA vGPU work:
- Install XCP-ng 8.2.1
- Install all updates with yum update, then reboot
- Download NVIDIA vGPU drivers for XenServer 8.2 from the NVIDIA site. Version NVIDIA-GRID-CitrixHypervisor-8.2-510.108.03-513.91
- Unzip and install the rpm from Host-Drivers, then reboot again
- Download the free CitrixHypervisor-8.2.0-install-cd.iso from the Citrix site
- Open CitrixHypervisor-8.2.0-install-cd.iso with 7-zip, then extract the vgpu binary file from Packages->vgpu....rpm->vgpu....cpio->.->usr->lib64->xen->bin
- Upload vgpu to the XCP-ng host, into /usr/lib64/xen/bin, and make it executable: chmod +x /usr/lib64/xen/bin/vgpu
- Deploy a VM with vGPU; it started without any problems
So I did not make any modifications to emu-manager.
My test server is far away from me and it will take some time to download the windows ISO to this test location. Then I will check how it works in the guest OS and report back here.
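For anyone doing the extraction on a Linux box instead of with 7-zip, the unpack and install steps can be sketched roughly as below. This assumes rpm2cpio and cpio are available; the function names extract_rpm and install_vgpu are made up for illustration, and the RPM path is left as a placeholder because the exact file name depends on what is in the ISO's Packages directory.

```shell
#!/bin/sh
# Sketch of the extract-and-install steps using rpm2cpio and cpio.
# extract_rpm and install_vgpu are hypothetical helper names.

# Unpack an RPM into a scratch directory.
# $1 must be an absolute path, because the function changes directory.
extract_rpm() {
    rpm_path="$1"
    out_dir="$2"
    mkdir -p "$out_dir" &&
        ( cd "$out_dir" && rpm2cpio "$rpm_path" | cpio -idm --quiet )
}

# Copy the extracted binary into place and mark it executable.
# The default destination is the directory named in the steps above.
install_vgpu() {
    src="$1"
    dest_dir="${2:-/usr/lib64/xen/bin}"
    if [ ! -f "$src" ]; then
        echo "no such file: $src" >&2
        return 1
    fi
    cp "$src" "$dest_dir/vgpu" && chmod +x "$dest_dir/vgpu"
}

# Example, as root (placeholder RPM path, not a real file name):
#   extract_rpm /absolute/path/to/the-vgpu-package.rpm /tmp/vgpu-extract
#   install_vgpu /tmp/vgpu-extract/usr/lib64/xen/bin/vgpu
```

The destination directory is a parameter so the copy can be tried against a scratch directory before touching the host.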
-
@splastunov Will this be hampered by any licensing issues? To my understanding, NVIDIA vGPU requires a license per user per GPU to work properly. Unless this isn't the case on Xen?
-
@wyatt-made I need a few days to test it.
Will report back here later.
-
@wyatt-made Yeah, you need not only licenses for the hosts and any VMs running on them, but also have to run a custom NVIDIA license manager.
-
Mediated devices will be a game changer... Eager to show our results with DPU, that will be the start of it. Some reading on the potential: https://arccompute.com/blog/libvfio-commodity-gpu-multiplexing/