GPU support and Nvidia Grid vGPU
-
@olivierlambert Do you think it's possible for that support to be added, or are there proprietary code limitations that prevent it?
What would it take to solve that limitation? We have currently delivered a project which uses VMware, and the current temporary licenses end in September (could probably be extended a little).
In our cluster we have a total of 10x Nvidia A2, and 3x Nvidia L40 cards.
I would love to move to xcp-ng, as it really seems perfect for us, but the lack of Nvidia vgpu support stops us. -
@EspenU
I have completed the migration step from Vmware to XCP-NG. I am using Nvidia M10 and Nvidia A16. I had to invest some time to get the Nvidia cards up and running. What I can reveal is that so far they do not work with xcp-ng release candidate 1 -
@msupport said in GPU support and Nvidia Grid vGPU:
@EspenU
I have completed the migration step from Vmware to XCP-NG. I am using Nvidia M10 and Nvidia A16. I had to invest some time to get the Nvidia cards up and running. What I can reveal is that so far they do not work with xcp-ng release candidate 1
I've seen your post from March 14, where you got it working on 8.3.
So you're saying that something has broken between then and RC1?
That's not good.
I have it running in a test environment on 8.2, and since it was working fine there I was hoping it would still work in 8.3. sigh... -
8.3 isn't even officially released. I don't see any reason why it would break between beta 2 and RC1, but if you can confirm it, we could investigate. There's no rush to get out of 8.2; it's still an LTS.
-
@EspenU
#*** Here are my insights for the Nvidia drivers and XCP-ng 8.3
#*** Install XCP-ng beta 2
#*** Then start the updates:
yum update kernel device-mapper guest-templates-json guest-templates-json-data-linux intel-microcode openssh python2-scapy amd-microcode cisco* libblkid libcgroup libcgroup-tools libcurl libmount libuuid curl util-linux edk2 forkexecd fuse-libs gdisk guest-templates-json-data-other guest-templates-json-data-windows intel-ice python-fasteners python2-defusedxml python2-xapi-storage nss-sysinit nss-tools nspr nss nss-softokn nss-softokn-freebl nss-tools nss-util kernel-livepatch logrotate mellanox-mlnxen message-switch openssl* openvswitch swtpm* tzdata microsemi-smartpqi vendor-drivers sudo newt qlogic-qla2xxx qlogic-fastlinq gpumon sm vhd-tool vcputune
#*** The Nvidia driver must be installed before the operations below, because the driver installation is not compatible with python3
yum remove xcp-python-libs-2.3.5-1.1.xcpng8.3.noarch
yum update ncurses-compat-libs python3-fasteners python3-pyudev python3-scapy python3-xcp-libs python36-future
yum update net-snmp
yum update net-snmp-agent-libs net-snmp-libs
yum update xapi-storage-script
yum update xs-openssl-libs
yum update xapi-nbd
yum remove net-snmp
yum net-snmp-libs net-snmp-agent-libs net-snmp
yum update xcp-ng-plymouth-theme
yum update xen-crashdump-analyser
#*** The driver no longer works after one of these 40 updates (the updates depend on each other):
| Name | Description | Version | Release | Size |
|---|---|---|---|---|
| blktap | blktap user space utilities | 3.54.9 | 1.1.xcpng8.3 | 305.47 KiB |
| kexec-tools | kexec/kdump userspace tools | 2.0.15 | 20.xcpng8.3 | 67.8 KiB |
| ncurses | Ncurses support utilities | 6.4 | 3.xcpng8.3 | 394.83 KiB |
| ncurses-base | Descriptions of common terminals | 6.4 | 3.xcpng8.3 | 57.81 KiB |
| ncurses-libs | Ncurses libraries | 6.4 | 3.xcpng8.3 | 312.17 KiB |
| qemu | qemu-dm device model | 4.2.1 | 5.2.9.xcpng8.3 | 15.57 MiB |
| rrdd-plugins | RRDD metrics plugin | 24.16.0 | 1.2.xcpng8.3 | 4.29 MiB |
| setup | A set of system configuration and setup files | 2.8.71 | 9.1.xcpng8.3 | 169.24 KiB |
| sm-cli | CLI for xapi toolstack storage managers | 24.16.0 | 1.2.xcpng8.3 | 1.53 MiB |
| squeezed | Memory ballooning daemon for the xapi toolstack | 24.16.0 | 1.2.xcpng8.3 | 1.54 MiB |
| varstored | EFI Variable Storage Daemon | 1.2.0 | 2.3.xcpng8.3 | 46.55 KiB |
| varstored-guard | Deprivileged XAPI socket Daemon for EFI variable storage | 24.16.0 | 1.2.xcpng8.3 | 4.3 MiB |
| varstored-tools | Tools for manipulating a guest's EFI variables offline | 1.2.0 | 2.3.xcpng8.3 | 58.66 KiB |
| vncterm | vncterm tty to vnc utility | 10.2.1 | 2.xcpng8.3 | 43.94 KiB |
| wsproxy | Websockets proxy for VNC traffic | 24.16.0 | 1.2.xcpng8.3 | 932.78 KiB |
| xapi-core | The xapi toolstack | 24.16.0 | 1.2.xcpng8.3 | 24.55 MiB |
| xapi-rrd2csv | A tool to output RRD values in CSV format | 24.16.0 | 1.2.xcpng8.3 | 2.61 MiB |
| xapi-tests | Toolstack test programs | 24.16.0 | 1.2.xcpng8.3 | 6.25 MiB |
| xapi-xe | The xapi toolstack CLI | 24.16.0 | 1.2.xcpng8.3 | 1.13 MiB |
| xcp-clipboardd | Daemon to share a virtualized Windows clipboard | 1.0.3 | 8.xcpng8.3 | 22.53 KiB |
| xcp-featured | XCP-ng feature daemon | 1.1.7 | 2.xcpng8.3 | 1.25 MiB |
| xcp-networkd | Simple host network management service for the xapi toolstack | 24.16.0 | 1.2.xcpng8.3 | 4.15 MiB |
| xcp-ng-release | XCP-ng release file | 8.3.0 | 24 | 112.56 KiB |
| xcp-ng-release-config | XCP-ng configuration | 8.3.0 | 24 | 49.93 KiB |
| xcp-ng-release-presets | XCP-ng presets file | 8.3.0 | 24 | 18.44 KiB |
| xcp-ng-xapi-plugins | XAPI additional plugins for XCP-ng | 1.10.0 | 1.xcpng8.3 | 46.17 KiB |
| xcp-rrdd | Statistics gathering daemon for the xapi toolstack | 24.16.0 | 1.2.xcpng8.3 | 3.14 MiB |
| xen-dom0-libs | Xen Hypervisor Domain 0 libraries | 4.17.4 | 3.xcpng8.3 | 691.85 KiB |
| xen-dom0-tools | Xen Hypervisor Domain 0 tools | 4.17.4 | 3.xcpng8.3 | 1.9 MiB |
| xen-hypervisor | The Xen Hypervisor | 4.17.4 | 3.xcpng8.3 | 2.34 MiB |
| xen-libs | Xen Hypervisor general libraries | 4.17.4 | 3.xcpng8.3 | 54.05 KiB |
| xen-livepatch | Live patches for Xen | 2.0 | 1.xcpng8.3 | 2.91 KiB |
| xen-tools | Xen Hypervisor general tools | 4.17.4 | 3.xcpng8.3 | 35.66 KiB |
| xenopsd | Simple VM manager | 24.16.0 | 1.2.xcpng8.3 | 1.17 MiB |
| xenopsd-cli | CLI for xenopsd, the xapi toolstack domain manager | 24.16.0 | 1.2.xcpng8.3 | 1.61 MiB |
| xenopsd-xc | Xenopsd using xc | 24.16.0 | 1.2.xcpng8.3 | 4.61 MiB |
| xenserver-hwdata | Additional hardware identification and configuration data | 20240411 | 1.xcpng8.3 | 284.41 KiB |
| xenserver-status-report | A program that generates status reports for a XenServer host | 2.0.3 | 1.xcpng8.3 | 33.24 KiB |
| xo-lite | Xen Orchestra Lite | 0.2.3 | 1.xcpng8.3 | 816.19 KiB |
| xsconsole | XCP-ng Host Configuration Console | 11.0.2 | 1.1.xcpng8.3 | 304.44 KiB |
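
For anyone trying to narrow down which update breaks the driver, one possible approach is to snapshot the installed package list before and after a `yum update` and compare the two. The sketch below is only an illustration: the function name `changed_packages` and the snapshot file names are made up here, not something from the post.

```shell
#!/bin/sh
# Sketch: list packages whose name-version-release changed between two snapshots.
# The snapshot file names are hypothetical; create them on the host with:
#   rpm -qa > /root/pkgs-before.txt     (before running yum update)
#   rpm -qa > /root/pkgs-after.txt      (after running yum update)

changed_packages() {
    # $1 = snapshot taken before the update, $2 = snapshot taken after it
    tmp_before=$(mktemp) || return 1
    tmp_after=$(mktemp) || { rm -f "$tmp_before"; return 1; }
    # comm requires sorted input
    sort "$1" > "$tmp_before"
    sort "$2" > "$tmp_after"
    # -13 suppresses lines unique to the first file and lines common to both,
    # leaving only the entries that are new in the "after" snapshot
    comm -13 "$tmp_before" "$tmp_after"
    rm -f "$tmp_before" "$tmp_after"
}

# Usage (on the host):
#   changed_packages /root/pkgs-before.txt /root/pkgs-after.txt
```

Each line printed is a package build that was not present before the update, which gives a starting list to test against the driver.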
-
Have you taken the drivers for XS8, which is the equivalent of XCP-ng 8.3?
-
@olivierlambert Would that mean that the vgpu binary should be taken from XS8 as well when using XCP-ng 8.3?
During testing I tried using that binary in XCP-ng 8.2, and it didn't work (VMs would not boot). I had to use the one from Citrix Hypervisor 8.2. -
@olivierlambert
I have tested the Nvidia XenServer version 17.0. -
I don't know all the versions, but I can tell that:
- XCP-ng 8.2 == XS 8.2
- XCP-ng 8.3 == XS 8
So be sure to use the right/matching binary first
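
The mapping above can be written down as a small lookup, for example to pick the right driver download in a script. This is a rough sketch: `matching_xenserver` is a hypothetical helper name, and reading `PRODUCT_VERSION` from `/etc/xensource-inventory` is an assumption about the host, not something confirmed in this thread.

```shell
#!/bin/sh
# Sketch: map an XCP-ng version to the XenServer release whose Nvidia host
# driver and vgpu binary should match it (mapping taken from the post above).

matching_xenserver() {
    case "$1" in
        8.2|8.2.*) echo "XS 8.2" ;;
        8.3|8.3.*) echo "XS 8" ;;
        *) echo "unknown"; return 1 ;;
    esac
}

# Usage on a host (assumes the inventory file defines PRODUCT_VERSION):
#   . /etc/xensource-inventory
#   matching_xenserver "$PRODUCT_VERSION"
```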
-
@olivierlambert
I have found the solution. I will test the whole thing again tomorrow with a clean installation with rc1. -
Oh great! Keep us posted!!
-
@msupport Please write up all the steps involved, as this would be very useful documentation for anyone else wanting to accomplish this. Many have delayed switching to XCP-ng because of not being able to make use of NVIDIA GPUs.
-
It already works on 8.2, even if it's not official at all.
-
Installation instructions XCP-NG (RC1) Nvidia M10 | A16 GPU
- install XCP-NG 8.3 RC1
- download XenServer Driver Nvidia 17.1 (NVIDIA-GRID-XenServer-8-550.54.16-550.54.15-551.78)
- unzip the driver and copy the host driver (NVIDIA-vGPU-xenserver-8-550.54.16.x86_64.iso) to the host. I used WinSCP to copy the driver to the /tmp directory.
- download the XenServer ISO file (https://www.xenserver.com/downloads | XenServer8_2024-06-03.iso)
- copy the file vgpu-7.4.13-1.xs8.x86_64.rpm from the packages directory of the ISO. Do not use the file vgpu-7.4.8-1.x86_64 from the CitrixHypervisor-8.2.0-install-cd!
- unpack the file vgpu-7.4.13-1.xs8.x86_64.rpm
- copy the file /usr/lib64/xen/bin/vgpu (size 129 KB) from the unpacked package to /usr/lib64/xen/bin/ on your XCP-NG host (chmod 755)
- (putty) in /tmp/, run: xe-install-supplemental-pack NVIDIA-vGPU-xenserver-8-550.54.16.x86_64.iso
- reboot
- install the guest driver on the VM client (551.78_grid_win10_win11_server2022_dch_64bit_international.exe)
- copy the token file from Nvidia to the VM (C:\Program Files\Nvidia Corporation\vGPU Licensing\ClientConfigToken\*.tok)
Nvidia drivers 17.2 and 17.3 do not work yet (the guest driver crashes).
I will stay tuned and inform you about new findings.
Have fun
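
The host-side steps above could be scripted roughly as follows, assuming both the RPM and the driver ISO have already been copied to /tmp on the host. This is an untested sketch, not the author's procedure: the `run` wrapper, the `DRY_RUN` switch, the `install_vgpu_host` name, and the use of `rpm2cpio`/`cpio` to unpack the RPM on the host are assumptions added for illustration. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch of the host-side steps. File names come from the post; the working
# directory, DRY_RUN switch and function names are assumptions.
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

VGPU_RPM="vgpu-7.4.13-1.xs8.x86_64.rpm"
DRIVER_ISO="NVIDIA-vGPU-xenserver-8-550.54.16.x86_64.iso"

run() {
    # Print the command in dry-run mode, otherwise execute it
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

install_vgpu_host() {
    workdir="${1:-/tmp}"
    # Unpack the vgpu RPM into a scratch directory instead of installing it
    run mkdir -p "$workdir/vgpu-extract"
    run sh -c "cd '$workdir/vgpu-extract' && rpm2cpio '$workdir/$VGPU_RPM' | cpio -idm"
    # Copy the vgpu binary into place and make it executable
    run cp "$workdir/vgpu-extract/usr/lib64/xen/bin/vgpu" /usr/lib64/xen/bin/vgpu
    run chmod 755 /usr/lib64/xen/bin/vgpu
    # Install the Nvidia host driver supplemental pack; reboot manually afterwards
    run xe-install-supplemental-pack "$workdir/$DRIVER_ISO"
}

# Usage: review the printed commands first, then set DRY_RUN=0 to execute:
#   install_vgpu_host /tmp
```

The reboot and the guest-side steps (guest driver, token file) are left out on purpose, since they happen outside the host shell.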
-
@olivierlambert
Thanks for the hint, that helped me a lot -
Thank you very much!
-
@msupport Many thanks for your write-up! Have you experienced any issues communicating with the NVIDIA license server?
-
I also used instructions from @msupport
https://xcp-ng.org/forum/topic/8987/vgpu-nvidia-tesla-p4-xcp-ng-8-3-beta-2?_=1721015408249
-
@tjkreidl
Nvidia licence server works perfectly so far -
I can't find where to get this new Nvidia driver (Tesla V100).
Update:
It looks like the only way is the license portal https://nvid.nvidia.com/. Sad.