@olivierlambert Haha, yes, and I thank him for it. But I will be back on Linux once all of this has been resolved on Nvidia's side.
At least I can migrate everything to XCP-NG now, so I will have access to the Xen ecosystem.
@Pyroteq Currently I am running my Plex server on TrueNAS SCALE with HW transcoding, so I no longer need it with XCP-NG... But for the people who do need it, this can be useful.
I believe we have something here... In May 2022 Nvidia announced open-source GPU kernel modules for Linux.
As far as I can tell, it is not much, but it is a start.
I found this article: https://developer.nvidia.com/blog/nvidia-releases-open-source-gpu-kernel-modules/
Maybe this gives us (the consumers) and the XCP-NG team more opportunities to make use of Nvidia GPUs?
@olivierlambert Ah, that's a real bummer indeed... Let's hope Nvidia does the same for Linux as it did for Windows. Actually, they should; otherwise they are effectively favoring one OS.
@olivierlambert But still, if Nvidia has no plans to do for Linux what it did for Windows, how will XCP-NG/the Xen project react? Would a kernel parameter for Xen that simply 'hides' the hypervisor be possible?
Well, I just installed Windows Server 2022 with the Quadro P400 and everything works fine now... So for the time being I will keep Plex on Windows, even though it has a larger footprint than Linux...
@warriorcookie I believe they are still sore about Linus saying 'F You NVidia'. Also, I think the Quadro M4000 should not have any problems with passthrough, as it uses different drivers than the Quadro P400 I also have. Otherwise all Quadro cards would be unusable with Xen, and Xen's footprint would be a lot smaller, I believe...
The driver does seem to work, as I sometimes get information from nvidia-smi (when freshly installed, or on occasion).
The GPU, on the other hand, just falls off the bus for some strange reason (see also the message above about this).
@olivierlambert @XCP-ng-JustGreat For some reason I got the M4000 working again in XCP-NG.
Let's hope it does not leave me again.
EDIT:
Well, I tested an emergency-reboot scenario in which I rebooted the whole host: the card has fallen off the bus and nvidia-smi does not work anymore.
[ 19.111279] NVRM: GPU 0000:00:05.0: RmInitAdapter failed! (0x25:0x40:1250)
[ 19.111325] NVRM: GPU 0000:00:05.0: rm_init_adapter failed, device minor number 0
[ 20.089528] NVRM: GPU 0000:00:05.0: GPU has fallen off the bus.
[ 20.292170] NVRM: A GPU crash dump has been created. If possible, please run
NVRM: nvidia-bug-report.sh as root to collect this data before
NVRM: the NVIDIA kernel module is unloaded.
[ 20.495046] NVRM: GPU 0000:00:05.0: RmInitAdapter failed! (0x24:0xffff:1220)
[ 20.495093] NVRM: GPU 0000:00:05.0: rm_init_adapter failed, device minor number 0
[ 21.468090] NVRM: GPU 0000:00:05.0: RmInitAdapter failed! (0x22:0x56:667)
[ 21.468109] NVRM: GPU 0000:00:05.0: rm_init_adapter failed, device minor number 0
[ 22.077205] NVRM: GPU 0000:00:05.0: RmInitAdapter failed! (0x22:0x56:667)
[ 22.077257] NVRM: GPU 0000:00:05.0: rm_init_adapter failed, device minor number 0
[ 22.686023] NVRM: GPU 0000:00:05.0: RmInitAdapter failed! (0x22:0x56:667)
[ 22.686061] NVRM: GPU 0000:00:05.0: rm_init_adapter failed, device minor number 0
[ 23.294321] NVRM: GPU 0000:00:05.0: RmInitAdapter failed! (0x22:0x56:667)
[ 23.294381] NVRM: GPU 0000:00:05.0: rm_init_adapter failed, device minor number 0
[ 24.479743] rfkill: input handler disabled
[ 30.633389] NVRM: GPU 0000:00:05.0: RmInitAdapter failed! (0x22:0x56:667)
[ 30.633444] NVRM: GPU 0000:00:05.0: rm_init_adapter failed, device minor number 0
[ 31.242314] NVRM: GPU 0000:00:05.0: RmInitAdapter failed! (0x22:0x56:667)
[ 31.242353] NVRM: GPU 0000:00:05.0: rm_init_adapter failed, device minor number 0
I believe this has nothing to do with code 43?
And it makes my whole XCP-NG host crash when I try to reboot the Plex VM again...
So I assume the card works but falls off the bus for some reason, and I have no idea why.
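For anyone who wants to compare boots, here is a minimal sketch of how the NVRM lines above could be summarized. It is only an illustration: the function name `summarize_nvrm` and the result layout are made up for this post, and it assumes the kernel log lines look exactly like the ones pasted above.

```python
import re
from collections import Counter

# Matches kernel log lines shaped like the ones in this thread, e.g.
# [ 19.111279] NVRM: GPU 0000:00:05.0: RmInitAdapter failed! (0x25:0x40:1250)
NVRM_LINE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+NVRM: GPU "
    r"(?P<bdf>[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]): "
    r"(?P<msg>.*)$"
)
INIT_FAIL = re.compile(r"RmInitAdapter failed! \((?P<code>[^)]+)\)")


def summarize_nvrm(log_text):
    """Group NVRM messages by PCI address (hypothetical helper).

    For each address, record the timestamp of the first
    "fallen off the bus" message and count each RmInitAdapter
    failure code that appears.
    """
    summary = {}
    for line in log_text.splitlines():
        m = NVRM_LINE.match(line.strip())
        if not m:
            continue  # not an NVRM per-GPU line (e.g. rfkill, loop devices)
        entry = summary.setdefault(
            m["bdf"], {"fell_off_at": None, "init_fail_codes": Counter()}
        )
        msg = m["msg"]
        if "fallen off the bus" in msg and entry["fell_off_at"] is None:
            entry["fell_off_at"] = float(m["ts"])
        fail = INIT_FAIL.search(msg)
        if fail:
            entry["init_fail_codes"][fail["code"]] += 1
    return summary
```

Feeding it saved `dmesg` output from a good boot and a bad boot makes it easier to see whether the failure codes and the PCI address change between them.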
EDIT 2:
I just did a fresh install of Ubuntu 21.10, and it works there as well.
It also keeps working after rebooting the Ubuntu 21.10 VM:
But when I reboot the host itself, I get this:
And the GPU falls off the bus, resulting in this error:
[ 24.428255] loop3: detected capacity change from 0 to 8
[ 42.582836] NVRM: GPU 0000:00:06.0: RmInitAdapter failed! (0x25:0x40:1250)
[ 42.582888] NVRM: GPU 0000:00:06.0: rm_init_adapter failed, device minor number 0
[ 43.562201] NVRM: GPU 0000:00:06.0: GPU has fallen off the bus.
[ 43.764639] NVRM: A GPU crash dump has been created. If possible, please run
NVRM: nvidia-bug-report.sh as root to collect this data before
NVRM: the NVIDIA kernel module is unloaded.
[ 43.968873] NVRM: GPU 0000:00:06.0: RmInitAdapter failed! (0x24:0xffff:1220)
[ 43.968930] NVRM: GPU 0000:00:06.0: rm_init_adapter failed, device minor number 0
So I believe something is going wrong on the XCP-NG side here?
EDIT 3:
Looking further in the logs I find this:
[ 10.315135] [drm:nv_drm_load [nvidia_drm]] ERROR [nvidia-drm] [GPU ID 0x00000006] Failed to allocate NvKmsKapiDevice
[ 10.315943] [drm:nv_drm_probe_devices [nvidia_drm]] ERROR [nvidia-drm] [GPU ID 0x00000006] Failed to register device
EDIT 4: So sometimes it works and sometimes it doesn't; it is very sporadic. I now have the feeling that it could be an error on XCP-NG's side, but I am not 100% sure, as the card itself apparently works. I also suspect it could be a power supply issue, as I use a 2x Molex to 6-pin PCI-E adapter for this graphics card.
(Sorry for the long message)...
It is strange, as I had it working this afternoon on a different VM... But shouldn't Quadro cards be free of these particular issues (code 43), since they are 'Quadro'?
@xcp-ng-justgreat Unfortunately this did not work on Ubuntu 20.04.3 LTS... Even my recently bought Quadro M4000 won't work.
I am now getting this error:
[ 50.123425] NVRM: GPU 0000:00:06.0: RmInitAdapter failed! (0x25:0x40:1250)
[ 50.123475] NVRM: GPU 0000:00:06.0: rm_init_adapter failed, device minor number 0
[ 51.102646] NVRM: GPU 0000:00:06.0: GPU has fallen off the bus.
[ 51.304888] NVRM: A GPU crash dump has been created. If possible, please run
NVRM: nvidia-bug-report.sh as root to collect this data before
NVRM: the NVIDIA kernel module is unloaded.
[ 51.507717] NVRM: GPU 0000:00:06.0: RmInitAdapter failed! (0x24:0xffff:1220)
[ 51.507755] NVRM: GPU 0000:00:06.0: rm_init_adapter failed, device minor number 0