Re: nVidia Tesla P4 for vgpu and Plex encoding
Thanks to various guides, I was able to install the driver on XCP-ng 8.3.
I used NVIDIA-vGPU-CitrixHypervisor-8.2-570.124.03.x86_64.iso
and vgpu-7.4.16-1.xs8.x86_64.rpm (from XenServer).
The driver seems to work and the T4 is detected:
# nvidia-smi
Thu Apr 3 17:18:25 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.124.03             Driver Version: 570.124.03     CUDA Version: N/A      |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  Tesla T4                       Off |   00000000:02:00.0 Off |                    0 |
| N/A   86C    P0             42W /   70W |      13MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
Unfortunately the card doesn't seem to be in vGPU mode:
# nvidia-smi vgpu -q
No supported devices in vGPU mode
I'm not sure whether "Addressing Mode: Unknown Error" is anything to be concerned about; I can't find anything specific about it.
# nvidia-smi -q
==============NVSMI LOG==============
Timestamp                                 : Thu Apr 3 16:53:35 2025
Driver Version                            : 570.124.03
CUDA Version                              : Not Found
Attached GPUs                             : 1
GPU 00000000:02:00.0
    Product Name                          : Tesla T4
    Product Brand                         : NVIDIA
    Product Architecture                  : Turing
    Display Mode                          : Enabled
    Display Active                        : Disabled
    Persistence Mode                      : Disabled
    Addressing Mode                       : Unknown Error
    ...
    GPU Virtualization Mode
        Virtualization Mode               : None
        Host VGPU Mode                    : N/A
        vGPU Heterogeneous Mode           : N/A
    ...
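For anyone who wants to compare just the virtualization fields across hosts or driver versions, here is a minimal sketch that pulls them out of the full `nvidia-smi -q` dump. The function name `virt_mode_fields` is hypothetical, and the parsing assumes the "GPU Virtualization Mode" block looks like the excerpt above (a header line followed by `key : value` lines).

```shell
#!/bin/sh
# Hypothetical helper: read `nvidia-smi -q` output on stdin and print the
# key=value pairs found under the "GPU Virtualization Mode" header.
# Assumes the block ends at the first line that has no "key : value" shape.
virt_mode_fields() {
    awk -F' *: *' '
        /GPU Virtualization Mode/ { in_block = 1; next }   # start of the block
        in_block && NF < 2        { in_block = 0 }         # any non key:value line ends it
        in_block                  { gsub(/^[ \t]+/, "", $1); print $1 "=" $2 }
    '
}
```

Usage would be `nvidia-smi -q | virt_mode_fields`. On the output above it should list the three fields, with Virtualization Mode reported as None.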
I also see that vGPU / SR-IOV is (at least theoretically) supported:
# lspci -v -s 02:00.0
02:00.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1)
	Subsystem: NVIDIA Corporation Device 12a2
	Physical Slot: 6
	Flags: bus master, fast devsel, latency 0, IRQ 32
	Memory at f8000000 (32-bit, non-prefetchable) [size=16M]
	Memory at 383fc0000000 (64-bit, prefetchable) [size=256M]
	Memory at 383ff0000000 (64-bit, prefetchable) [size=32M]
	Capabilities: [60] Power Management version 3
	Capabilities: [68] #00 [0080]
	Capabilities: [78] Express Endpoint, MSI 00
	Capabilities: [c8] MSI-X: Enable- Count=6 Masked-
	Capabilities: [100] Virtual Channel
	Capabilities: [258] L1 PM Substates
	Capabilities: [128] Power Budgeting <?>
	Capabilities: [420] Advanced Error Reporting
	Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
	Capabilities: [900] #19
	Capabilities: [bb0] #15
	Capabilities: [bcc] Single Root I/O Virtualization (SR-IOV)
	Capabilities: [c14] Alternative Routing-ID Interpretation (ARI)
	Kernel driver in use: nvidia
	Kernel modules: nvidia
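The capability line only shows that the device advertises SR-IOV, not that any virtual functions are enabled. A rough sketch for checking the latter through sysfs is below. The function name `sriov_vf_counts` is hypothetical, and it assumes the dom0 kernel exposes the standard Linux `sriov_totalvfs` and `sriov_numvfs` attributes for this device.

```shell
#!/bin/sh
# Hypothetical helper: report SR-IOV virtual function counts for a PCI device
# directory in sysfs. Assumes the standard Linux attributes sriov_totalvfs
# (maximum VFs) and sriov_numvfs (VFs currently enabled) are present.
sriov_vf_counts() {
    dev_dir=$1
    if [ -r "$dev_dir/sriov_totalvfs" ]; then
        echo "total=$(cat "$dev_dir/sriov_totalvfs") enabled=$(cat "$dev_dir/sriov_numvfs")"
    else
        # The kernel did not create the SR-IOV attributes for this device.
        echo "no SR-IOV attributes"
    fi
}
```

For the card above, the call would be `sriov_vf_counts /sys/bus/pci/devices/0000:02:00.0`.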
I do have SR-IOV enabled in the BIOS, but disabling it didn't seem to change anything.
For those running NVIDIA vGPUs: how did you get XCP-ng to enable the feature?