nVidia Tesla P4 for vgpu and Plex encoding
-
@Dani I made a post yesterday about Nvidia MiG support. vGPU support on XCP-ng is tricky because the proprietary bit of code that makes vGPU work can't be freely distributed. On the other hand, MiG (which is supported by many Ampere cards like the A100) doesn't require licensing the way vGPU does, and it seemingly just creates PCI addresses for the card which could, in theory, be passed through to VMs (a rough sketch of the nvidia-smi side of that is below).
CC: @olivierlambert since we briefly talked about this yesterday in my thread.
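For reference, this is roughly how the MiG partitioning itself is driven from the host with nvidia-smi; whether the resulting instances actually show up as passthrough-able PCI functions on XCP-ng is exactly the open question. A minimal sketch, assuming the NVIDIA host driver is installed, the A100 is GPU index 0, and profile ID 9 is just an example:

```
# Enable MIG mode on GPU 0 (needs a GPU reset and no running workloads)
nvidia-smi -i 0 -mig 1

# List the GPU instance profiles this card offers
nvidia-smi mig -lgip

# Create two GPU instances from profile ID 9 (example) plus their compute instances
nvidia-smi mig -cgi 9,9 -C

# List the resulting MIG devices and their UUIDs
nvidia-smi -L
```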
-
@wyatt-made thanks a lot. What a quick response!!
I'll check your post.
My plan is to install XCP-ng on the A100 server next week and test it. I will post the results in the forum, and maybe they will help @olivierlambert and the rest of the community. Dani
-
I will be interested to understand how MiG works and how far we are from having a solution for it
-
@splastunov is it still not asking for a license?
-
@hani it began asking for a license after one day, but without throttling.
I have switched to AMD GPUs
-
@splastunov Thanks, that's expected
-
@splastunov
Thanks a lot.
This also works with the Nvidia M10 and Nvidia A16 graphics cards.
Driver Version: NVIDIA-vGPU-CitrixHypervisor-8.2-550.54.10.x86_64 (Version 17.0)
-
@msupport can you detail a bit what you did? thanks!
-
- Install XCP-NG version 8.2.1 (8.3 did not work)
- Install all updates: yum update
- Reboot
- Download the NVIDIA vGPU drivers for XenServer 8.2 from the NVIDIA site: NVIDIA-vGPU-CitrixHypervisor-8.2-550.54.10.x86_64 (Version 17.0)
- Unzip and install the rpm from Host-Drivers
- Reboot again
- Download the free CitrixHypervisor-8.2.0-install-cd.iso from the Citrix site
- Open CitrixHypervisor-8.2.0-install-cd.iso with 7-Zip, then extract the vgpu binary from Packages->vgpu....rpm->vgpu....cpio->.->usr->lib64->xen->bin (a command-line alternative is sketched after this list)
- Upload vgpu to the XCP-ng host at /usr/lib64/xen/bin and make it executable: chmod +x /usr/lib64/xen/bin/vgpu
- Deployed a VM with a vGPU and it started without any problems
- Copy the license file (*.tok) from the NVIDIA License Portal to C:\Program Files\NVIDIA Corporation\vGPU Licensing\ClientConfigToken
- Install the Windows NVIDIA driver in the Windows 10 VM (it needs a connection to api.dis.licensing.nvidia.com on TCP port 443; use the NVIDIA Control Panel to set the hostname and port for licensing the card)
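For anyone who wants to do the extraction step without 7-Zip, roughly the same thing can be done from a Linux box. This is only a sketch: it assumes there is a single vgpu-*.rpm under Packages on the ISO and uses xcp-host as a placeholder hostname.

```
# Mount the Citrix installer ISO and pull the vgpu binary out of its rpm
mkdir -p /mnt/citrix
mount -o loop CitrixHypervisor-8.2.0-install-cd.iso /mnt/citrix

mkdir -p /tmp/vgpu-extract && cd /tmp/vgpu-extract
rpm2cpio /mnt/citrix/Packages/vgpu-*.rpm | cpio -idmv ./usr/lib64/xen/bin/vgpu

# Copy it onto the XCP-ng host (placeholder hostname) and make it executable
scp ./usr/lib64/xen/bin/vgpu root@xcp-host:/usr/lib64/xen/bin/vgpu
ssh root@xcp-host chmod 755 /usr/lib64/xen/bin/vgpu

umount /mnt/citrix
```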
Works fine for me
If you want to install the NVIDIA driver on XCP-NG 8.3: edit /etc/xensource-inventory (change the PRODUCT_VERSION line from 8.3.0 to 8.2.0) during the installation, then run xe-install-supplemental-pack NVIDIA-vGPU-CitrixHypervisor-8.2-550.54.16.x86_64.iso.
After installation, change PRODUCT_VERSION back to 8.3.0.
The driver then also works on XCP-NG 8.3.
Do not forget to copy the vgpu file to /usr/lib64/xen/bin/vgpu (and chmod it to 755).
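As shell commands, that workaround looks roughly like this; it assumes PRODUCT_VERSION is stored single-quoted in /etc/xensource-inventory, so check the exact format on your host first:

```
# Check how the version string is written before touching it
grep PRODUCT_VERSION /etc/xensource-inventory

# Pretend to be 8.2 so the supplemental pack installer accepts the driver
sed -i "s/PRODUCT_VERSION='8.3.0'/PRODUCT_VERSION='8.2.0'/" /etc/xensource-inventory

xe-install-supplemental-pack NVIDIA-vGPU-CitrixHypervisor-8.2-550.54.16.x86_64.iso

# Revert the version string afterwards
sed -i "s/PRODUCT_VERSION='8.2.0'/PRODUCT_VERSION='8.3.0'/" /etc/xensource-inventory

# The vgpu helper binary still needs to be in place and executable
chmod 755 /usr/lib64/xen/bin/vgpu
```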
The Nvidia M10 and A16 vGPUs on XCP-NG 8.3 Beta2 only work without XCP-NG updates. After updating, the error message "An emulator required to run this VM failed to start" appears. It must be caused by one of the 76 updates that can be installed; I am trying to find out which update is causing this problem.
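If anyone wants to help narrow that down, one way I'm approaching it is to compare what the update transactions changed in dom0 with what the logs say when the VM fails to start (just a troubleshooting idea, not a fix):

```
# Show the update transactions applied to dom0
yum history list

# Show which packages a given transaction touched (replace 5 with a real transaction ID)
yum history info 5

# Look for the failing emulator / vgpu start after a failed VM boot
grep -iE 'vgpu|emulator|qemu' /var/log/xensource.log | tail -n 50
```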
22.07.2024 [NEW]
Installation of the Nvidia 17.1 vGPU driver on XCP-NG 8.3 RC1
- Install XCP-NG 8.3 RC1
- Download the XenServer Nvidia 17.1 driver (NVIDIA-GRID-XenServer-8-550.54.16-550.54.15-551.78)
- Unzip the driver and copy the host driver (NVIDIA-vGPU-xenserver-8-550.54.16.x86_64.iso) to the host; I used WinSCP to copy it to the /tmp directory
- Download the XenServer ISO file (https://www.xenserver.com/downloads | XenServer8_2024-06-03.iso)
- Take the file vgpu-7.4.13-1.xs8.x86_64.rpm from its Packages directory! Do not use the vgpu-7.4.8-1.x86_64 file from the CitrixHypervisor-8.2.0-install-cd ISO
- Unpack the file vgpu-7.4.13-1.xs8.x86_64.rpm
- Copy the extracted file \usr\lib64\xen\bin\vgpu (size 129 KB) to /usr/lib64/xen/bin/ on your XCP-NG host (chmod 755)
- Via PuTTY, from /tmp run: xe-install-supplemental-pack NVIDIA-vGPU-xenserver-8-550.54.16.x86_64.iso
- Reboot (a quick sanity check for the host side is sketched after this list)
- Install the guest driver in the VM client (551.78_grid_win10_win11_server2022_dch_64bit_international.exe)
- Copy the token file (*.tok) from Nvidia to C:\Program Files\Nvidia Corporation\vGPU Licensing\ClientConfigToken
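After the reboot, a quick way to confirm the host side came up correctly (just a generic sanity check with the standard tooling, nothing specific to 17.1):

```
# The host driver should see the card in dom0
nvidia-smi

# The kernel module should be loaded
lsmod | grep -i nvidia

# XAPI should now list the vGPU types the card offers
xe vgpu-type-list

# The physical GPU should also be visible as a pGPU object
xe pgpu-list
```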
Nvidia drivers 17.2 and 17.3 do not work yet (the guest driver crashes). Tested with Windows 11 23H2.
My environment:
16x Hosts HPE DL380
6x Hosts HPE DL380 with vGPU Nvidia M10 and A16
5x HPE 3PAR Storage and 1x HPE MSA 2050 Storage
2x 96-port fibre channel switches
I have migrated from VMware to XCP-NG with XOA.
-
Okay so the only binary you need to get from Citrix is vgpu, right?
-
Out of curiosity, what are you using this for? What are these VMs doing?
-
@olivierlambert
Yes, download and extract. Don't forget the permission.
- Download the free CitrixHypervisor-8.2.0-install-cd.iso from the Citrix site
- Open CitrixHypervisor-8.2.0-install-cd.iso with 7-Zip, then extract the vgpu binary from Packages->vgpu....rpm->vgpu....cpio->.->usr->lib64->xen->bin
- Upload vgpu to the XCP-ng host at /usr/lib64/xen/bin and make it executable: chmod +x /usr/lib64/xen/bin/vgpu
-
@austinw
I use these Windows 10 clients with the UDS Enterprise VDI system. From October 2025, running Office on a terminal server will no longer be supported, which is why we switched to virtual desktops: https://learn.microsoft.com/de-de/deployoffice/endofsupport/windows-server-support
-
@splastunov Do the AMD GPUs not require a license?
-
@austinw said in nVidia Tesla P4 for vgpu and Plex encoding:
@splastunov Do the AMD GPUs not require a license?
Nope. These work easily out of the box. I installed the GPU on one of our servers yesterday.
-
@austinw no licenses, but a lot of troubles.....
-
@splastunov said in nVidia Tesla P4 for vgpu and Plex encoding:
@austinw no licenses, but a lot of troubles.....
Curious, what troubles?
-
@mohammadm
I'm talking now about vGPU, not passthrough:
- old drivers
- no way to monitor GPU load
- sometimes the GPU on Dom0 stops responding, and the only thing that can be done to solve this is to reboot the entire server with all the virtual machines on it
- etc. ... I don't remember all the troubles I had with it
-
@splastunov said in nVidia Tesla P4 for vgpu and Plex encoding:
@mohammadm
I'm talking now about vGPU, not passthrough:
- old drivers
- no way to monitor GPU load
- sometimes the GPU on Dom0 stops responding, and the only thing that can be done to solve this is to reboot the entire server with all the virtual machines on it
- etc. ... I don't remember all the troubles I had with it
I installed the FirePro S7150x2 yesterday; it's been about 24 hours and so far no issues. I do agree that I miss having an nvidia-smi equivalent for a better overview.
Why is the support regarding vGPU so poor and mostly outdated?
-
I will have the opportunity to discuss more with AMD (on a regular basis, for various reasons); I'll try to see if I can connect with their GPU division