Nvidia Quadro P400 not working on Ubuntu server via GPU/PCIe passthrough
That's indeed the information I need (to tell whether it's a hardware problem or a software one).
I reinstalled XCP-NG on a different machine (a self-built server) and have passed the Quadro P400 through to my Plex VM.
The kernel driver is installed, as shown here:
But when I run nvidia-smi, this happens:
The PCI device was passed through successfully and everything went fine up to this point. I'm still wondering what the issue could be, as I did the same on Proxmox without any modifications.
I also followed this tutorial:
It's from Craftcomputing (credits to him).
The only step from the tutorial that we don't do, and which is required on Proxmox, is the last one: '-Hide VM identifiers from nVidia-'.
Well after some more debugging, this came back:
[ 3.869646] [drm] [nvidia-drm] [GPU ID 0x00000006] Loading driver
[ 3.869648] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:00:06.0 on minor 1
[ 3.875046] ppdev: user-space parallel port driver
[ 3.915702] Adding 4194300k swap on /swap.img. Priority:-2 extents:5 across:4620284k SSFS
[ 4.284521] Decoding supported only on Scalable MCA processors.
[ 4.439623] Decoding supported only on Scalable MCA processors.
[ 4.499267] Decoding supported only on Scalable MCA processors.
[ 8.103242] snd_hda_intel 0000:00:05.0: azx_get_response timeout, switching to polling mode: last cmd=0x004f0015
[ 8.355892] input: HDA NVidia HDMI/DP,pcm=3 as /devices/pci0000:00/0000:00:05.0/sound/card0/input6
[ 8.355940] input: HDA NVidia HDMI/DP,pcm=7 as /devices/pci0000:00/0000:00:05.0/sound/card0/input7
[ 8.355978] input: HDA NVidia HDMI/DP,pcm=8 as /devices/pci0000:00/0000:00:05.0/sound/card0/input8
[ 8.356014] input: HDA NVidia HDMI/DP,pcm=9 as /devices/pci0000:00/0000:00:05.0/sound/card0/input9
[ 8.356049] input: HDA NVidia HDMI/DP,pcm=10 as /devices/pci0000:00/0000:00:05.0/sound/card0/input10
[ 8.356085] input: HDA NVidia HDMI/DP,pcm=11 as /devices/pci0000:00/0000:00:05.0/sound/card0/input11
[ 8.427430] alua: device handler registered
[ 8.428989] emc: device handler registered
[ 8.430986] rdac: device handler registered
[ 8.506988] EXT4-fs (xvda2): mounted filesystem with ordered data mode. Opts: (null)
[ 25.789595] NVRM: GPU 0000:00:06.0: RmInitAdapter failed! (0x22:0x56:667)
[ 25.789622] NVRM: GPU 0000:00:06.0: rm_init_adapter failed, device minor number 0
[ 25.794920] NVRM: GPU 0000:00:06.0: RmInitAdapter failed! (0x22:0x56:667)
[ 25.794943] NVRM: GPU 0000:00:06.0: rm_init_adapter failed, device minor number 0
[ 38.508573] NVRM: GPU 0000:00:06.0: RmInitAdapter failed! (0x22:0x56:667)
[ 38.508614] NVRM: GPU 0000:00:06.0: rm_init_adapter failed, device minor number 0
[ 38.514513] NVRM: GPU 0000:00:06.0: RmInitAdapter failed! (0x22:0x56:667)
[ 38.514542] NVRM: GPU 0000:00:06.0: rm_init_adapter failed, device minor number 0
[ 1254.128346] NVRM: GPU 0000:00:06.0: RmInitAdapter failed! (0x22:0x56:667)
[ 1254.128378] NVRM: GPU 0000:00:06.0: rm_init_adapter failed, device minor number 0
[ 1254.134808] NVRM: GPU 0000:00:06.0: RmInitAdapter failed! (0x22:0x56:667)
[ 1254.134834] NVRM: GPU 0000:00:06.0: rm_init_adapter failed, device minor number 0
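For anyone comparing logs between hosts or driver versions, here is a minimal Python sketch that counts the RmInitAdapter failures in dmesg output, grouped by PCI address and error code. The function name and the sample text are made up for illustration; the regular expression is based only on the line format seen in the log above and may need adjusting for other driver versions.

```python
import re
from collections import Counter

# Matches lines such as:
# [ 25.789595] NVRM: GPU 0000:00:06.0: RmInitAdapter failed! (0x22:0x56:667)
# The pattern is an assumption derived from the log in this thread.
NVRM_FAILURE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+NVRM: GPU (?P<bdf>[0-9a-fA-F:.]+): "
    r"RmInitAdapter failed!\s*\((?P<code>[^)]+)\)"
)


def summarize_nvrm_failures(dmesg_text):
    """Count RmInitAdapter failures per (PCI address, error code)."""
    counts = Counter()
    for match in NVRM_FAILURE.finditer(dmesg_text):
        counts[(match.group("bdf"), match.group("code"))] += 1
    return counts


# Hypothetical sample, shortened from the log in this thread.
SAMPLE = """\
[ 8.427430] alua: device handler registered
[ 25.789595] NVRM: GPU 0000:00:06.0: RmInitAdapter failed! (0x22:0x56:667)
[ 25.789622] NVRM: GPU 0000:00:06.0: rm_init_adapter failed, device minor number 0
[ 25.794920] NVRM: GPU 0000:00:06.0: RmInitAdapter failed! (0x22:0x56:667)
"""

for (bdf, code), n in summarize_nvrm_failures(SAMPLE).items():
    print(f"{bdf}: {n} x RmInitAdapter failed ({code})")
```

To use it on a real system, save the output of dmesg to a file and pass the file's contents to summarize_nvrm_failures instead of SAMPLE.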
I think something goes wrong between the VM and XCP-NG, maybe in hiding the GPU from dom0?
This topic on the forum has the same issue: https://xcp-ng.org/forum/topic/4406/is-nvidia-now-allows-geforce-gpu-pass-through-for-windows-vms-on-linux/7?_=1640695038223
I also made a thread on the Nvidia forums to see if they know anything:
FYI, I just ordered a card for our lab, so we'll be able to try to reproduce the issue on our side
@olivierlambert Great, let me know how it goes! I bought a WX2100, but unfortunately it can't be used for transcoding in Plex, so I have to go back to Proxmox with the Quadro P400.
I'm now back on Proxmox. I also had some RMINIT errors there, but these were related to the hypervisor and were resolved pretty quickly.
Also, on Proxmox I have to change some GRUB parameters and such. Isn't this something that has to be done on Xen as well, at the hypervisor level?
Which parameter exactly? I think Xen doesn't yet support hiding the hypervisor information.
Parameters such as these in /etc/default/grub:
GRUB_CMDLINE_LINUX="textonly video=astdrmfb video=efifb:off"
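To compare these parameters between two hosts, a small Python sketch like the one below can pull the individual kernel parameters out of the contents of /etc/default/grub. The function name is hypothetical and the parsing is deliberately simple: it assumes the variable is set on a single line with ordinary shell quoting, as in the line above.

```python
import shlex


def kernel_cmdline_params(grub_text, variable="GRUB_CMDLINE_LINUX"):
    """Return the kernel parameters set in `variable` as a list of strings."""
    for line in grub_text.splitlines():
        line = line.strip()
        # Requiring the "=" keeps GRUB_CMDLINE_LINUX_DEFAULT from matching.
        if line.startswith(variable + "="):
            value = line.split("=", 1)[1]
            # shlex.split strips the shell quoting around the value.
            parts = shlex.split(value)
            return parts[0].split() if parts else []
    return []


# The line quoted in this thread, used as sample input.
SAMPLE = 'GRUB_CMDLINE_LINUX="textonly video=astdrmfb video=efifb:off"'

print(kernel_cmdline_params(SAMPLE))
```

Note that this only reads the configuration file. It says nothing about whether the running kernel was actually booted with these parameters.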
Also someone replied to the topic I created on Nvidia forums:
That answer is incorrect. Passing through an entire PCIe device shouldn't make a difference.
Maybe it's a problem on the IOMMU side, I don't know. It will be easier to work on it with an actual card.
@olivierlambert In that case, let's hope you can resolve it once the P400 is delivered. I'm curious what you find and whether there is a way to resolve the issue...
I have no idea, and I don't have high hopes given our other priorities, but we'll see.
@olivierlambert I think the conclusion we can make is that I need to hide the Hypervisor from the Nvidia driver which is also mentioned here: https://www.reddit.com/r/XenServer/comments/r12p0q/pci_passthrough_quadro_p400_to_ubuntucentos_vm/
So it looks like an Error 43. I think this is plausible, since on Proxmox I work around it by adding this to the VM's .conf file:
This hides the hypervisor from the VM on the KVM side.
Is there an equivalent for XCP-NG?
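As a quick check of what the guest itself can see, here is a minimal Python sketch that looks for the hypervisor CPU flag in the contents of /proc/cpuinfo. Linux guests on x86 list this flag when the CPUID hypervisor bit is set. Whether the Nvidia driver relies on this bit (or on something else) is not confirmed anywhere in this thread, so treat this as a diagnostic aid only. The function name and the sample text are made up for illustration.

```python
def guest_sees_hypervisor(cpuinfo_text):
    """Return True if the first 'flags' line lists the 'hypervisor' flag."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return "hypervisor" in value.split()
    return False


# Hypothetical, heavily shortened /proc/cpuinfo excerpts.
SAMPLE_VM = "processor\t: 0\nflags\t\t: fpu vme de pse hypervisor lahf_lm\n"
SAMPLE_BARE_METAL = "processor\t: 0\nflags\t\t: fpu vme de pse lahf_lm\n"

print(guest_sees_hypervisor(SAMPLE_VM))
print(guest_sees_hypervisor(SAMPLE_BARE_METAL))
```

Inside the VM, the same function can be fed the real file contents, for example open("/proc/cpuinfo").read(), to compare what the guest sees on Proxmox versus on XCP-NG.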
- No equivalent in Xen yet
- Nvidia recently changed its policy and no longer blocks virtualization in its drivers, so the problem should not be there.
@olivierlambert Could it be something with the 'VFIO' modules in KVM, maybe? I honestly have no clue anymore... So I think my best bet is to wait for your research on this...
I just got the card, but my agenda is very, very busy at the moment. I'll try to do the PCI passthrough in my spare time (which doesn't come often either).
@olivierlambert No problem, take your time. I will check in regularly to see if there has been an update of some sort...
Okay so doing some tests now, I can reproduce the issue. So the questions are:
- Did it work before? (older versions of XCP-ng?) -> removing regressions from the equation
- Is the P400 limited for PCI passthrough by the NV driver? It's still not clear. If that's the problem, it would require a code change in upstream Xen to be able to hide the hypervisor.
I have no idea if it worked on earlier versions of XCP-NG. I don't think so, as I believe I tested this on XCP-NG 8.0 and 8.1, if I remember correctly. (I also created forum posts about this in 2019/2020.)
I don't think it is limited for PCI passthrough, as I am using an Nvidia driver on the VM within Proxmox without any issues.
I am pasting a screenshot of the Nvidia driver I am currently using on the VM inside proxmox:
That might be because Proxmox is hiding the hypervisor underneath. Hard to tell because of these fracking drivers.
Hmm, yeah, it's quite a hassle with these drivers for some reason... If you need some extra information that could help, let me know. I can send other details from the Proxmox host and the Ubuntu VM.
How about the VFIO modules which I also mentioned earlier? Is this something that has to be added to XCP-NG, maybe? I also have a topic on Reddit, and this person has the same problem but with a T400.
If it's hypervisor detection, the "only" thing needed is a Xen modification, but this is not trivial (if it's really that). I can assume it's the case.
In the meantime, can you double check if XCP-ng 7.6 is affected too? (last hope to check if it's not a regression).