Gpu passthrough on Asrock rack B650D4U3-2L2Q will not work
-
@tuxen That didn't work either.
Probably not that interesting, but I also tried XCP-ng 8.2.1 and got the same error there.
-
@tuxen I got hold of a custom BIOS with the 4G option available now. But disabling 4G didn't help. It was hidden but always enabled.
But I see that in Proxmox the PCI slot does not change: it stays at slot 1, while with XCP-ng it changes to slot 8.
The Proxmox user uses a hex PCI ID for GPU passthrough. Does this matter?
-
It shouldn't. Any idea here, @Teddy-Astie?
-
@steff22 said in Gpu passthrough on Asrock rack B650D4U3-2L2Q will not work:
@tuxen I got hold of a custom BIOS with the 4G option available now. But disabling 4G didn't help. It was hidden but always enabled.
But I see that in Proxmox the PCI slot does not change: it stays at slot 1, while with XCP-ng it changes to slot 8.
Is this slot change in the guest or in XCP-ng itself?
I would not be surprised to see it changing in the guest (it's part of what QEMU PCI passthrough can do), but that would be weird for Dom0.
@steff22 said in Gpu passthrough on Asrock rack B650D4U3-2L2Q will not work:
The Proxmox user uses a hex PCI ID for GPU passthrough. Does this matter?
I don't think it's guaranteed that you see the same BDF in guest and host in Proxmox (more likely a coincidence). But that's a behavior that would change in XCP-ng with Q35/proper PCIe support.
Aside from that, it "appears" to work in a Linux VM (albeit with no display). Have you managed to get CUDA running there (to see if the device actually works)? And does the BAR window message still appear?
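On the hex question: a BDF as printed by lspci is simply domain:bus:device.function written in hexadecimal, so "slot 8" in the guest and device 08 in 0000:00:08.0 are the same thing. A minimal Python sketch of that decoding follows; the `parse_bdf` helper and the `Bdf` type are hypothetical names for illustration, not part of XCP-ng, Xen, or Proxmox tooling.

```python
import re
from typing import NamedTuple


class Bdf(NamedTuple):
    """Decoded PCI address (all fields as plain integers)."""
    domain: int
    bus: int
    device: int
    function: int


# Accepts "0000:00:08.0" (with domain) or "00:08.0" (lspci short form).
_BDF_RE = re.compile(
    r"^(?:(?P<domain>[0-9a-fA-F]{4}):)?"
    r"(?P<bus>[0-9a-fA-F]{2}):"
    r"(?P<device>[0-9a-fA-F]{2})\."
    r"(?P<function>[0-7])$"
)


def parse_bdf(text: str) -> Bdf:
    """Parse a hexadecimal PCI address string into its numeric parts."""
    match = _BDF_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a PCI address: {text!r}")
    # A missing domain is treated as domain 0, as lspci does by default.
    domain = int(match.group("domain") or "0", 16)
    bus = int(match.group("bus"), 16)
    device = int(match.group("device"), 16)
    function = int(match.group("function"), 16)
    if device > 0x1F:
        # The device number is a 5-bit field.
        raise ValueError(f"device number out of range: {text!r}")
    return Bdf(domain, bus, device, function)


if __name__ == "__main__":
    print(parse_bdf("0000:00:08.0"))
```

Whether a tool takes the address in hex or shows a decimal slot number, it refers to the same device; only the notation differs.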
-
@Teddy-Astie No, only in the guest.
I haven't tried anything more with Pop!_OS than trying to extend the screen.
I can try to look at it with CUDA running.
-
@Teddy-Astie I have installed the NVIDIA drivers and the CUDA Toolkit. The drivers seem to be running properly and I can see the GPU in Pop!_OS, but I ran dmesg again and got an error:
ation="profile_load" profile="unconfined" name="libreoffice-xpdfimport" pid=516 comm="apparmor_parser"
[ 5.425026] Adding 4193784k swap on /dev/mapper/cryptswap. Priority:-2 extents:1 across:4193784k SS
[ 5.427939] nvidia: module license 'NVIDIA' taints kernel.
[ 5.427941] Disabling lock debugging due to kernel taint
[ 5.427943] nvidia: module license taints kernel.
[ 5.473953] nvidia-nvlink: Nvlink Core is being initialized, major device number 238
[ 5.475202] xen: --> pirq=24 -> irq=17 (gsi=17)
[ 5.475401] nvidia 0000:00:08.0: vgaarb: VGA decodes changed: olddecodes=io+mem,decodes=none:owns=io+mem
[ 5.476296] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 560.35.03 Fri Aug 16 21:39:15 UTC 2024
[ 5.502521] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms 560.35.03 Fri Aug 16 21:21:48 UTC 2024
[ 5.505900] [drm] [nvidia-drm] [GPU ID 0x00000008] Loading driver
[ 5.829042] zram: Added device: zram0
[ 6.103268] kauditd_printk_skb: 18 callbacks suppressed
[ 6.103273] audit: type=1400 audit(1731496933.838:29): apparmor="DENIED" operation="capable" class="cap" profile="/usr/sbin/cupsd" pid=671 comm="cupsd" capability=12 capname="net_admin"
[ 6.288396] snd_hda_intel 0000:00:09.0: azx_get_response timeout, switching to polling mode: last cmd=0x000f0000
[ 6.385067] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:00:08.0 on minor 1
[ 6.422276] nvidia_uvm: module uses symbols nvUvmInterfaceDisableAccessCntr from proprietary module nvidia, inheriting taint.
[ 6.452377] nvidia-uvm: Loaded the UVM driver, major device number 236.
[ 6.844962] zram0: detected capacity change from 0 to 32735232
[ 7.215079] input: HDA NVidia HDMI/DP,pcm=3 as /devices/pci0000:00/0000:00:09.0/sound/card0/input6
[ 7.216764] input: HDA NVidia HDMI/DP,pcm=7 as /devices/pci0000:00/0000:00:09.0/sound/card0/input7
[ 7.217068] input: HDA NVidia HDMI/DP,pcm=8 as /devices/pci0000:00/0000:00:09.0/sound/card0/input8
[ 7.217363] input: HDA NVidia HDMI/DP,pcm=9 as /devices/pci0000:00/0000:00:09.0/sound/card0/input9
[ 7.332815] Adding 16367612k swap on /dev/zram0. Priority:1000 extents:1 across:16367612k SS
[ 10.684565] rfkill: input handler disabled
[ 20.569595] rfkill: input handler enabled
[ 23.259676] rfkill: input handler disabled
[ 23.572353] [drm:nv_drm_master_set [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000008] Failed to grab modeset ownership
steff@pop-os:~$ lspci | grep VGA
00:02.0 VGA compatible controller: Device 1234:1111 (rev 02)
00:08.0 VGA compatible controller: NVIDIA Corporation GP104 [GeForce GTX 1070 Ti] (rev a1)
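When dmesg output gets pasted as one long run like this, the interesting entries are easy to miss. As a rough sketch (not an official tool), the Python below splits such a blob back into one entry per kernel timestamp and keeps only entries containing a marker string. The function names and the default marker list are assumptions chosen for this log, not a standard.

```python
import re

# Zero-width split point in front of every "[ seconds.microseconds] " stamp.
# Bracketed tags such as "[drm]" or "[GPU ID 0x00000008]" do not match,
# because they do not start with digits followed by a six-digit fraction.
_STAMP = re.compile(r"(?=\[\s*\d+\.\d{6}\] )")


def split_dmesg(blob):
    """Split a flattened dmesg paste into individual log entries."""
    return [part.strip() for part in _STAMP.split(blob) if part.strip()]


def find_suspicious(entries, markers=("*ERROR*", "DENIED", "timeout")):
    """Return the entries that contain any marker (markers are an assumption)."""
    return [entry for entry in entries if any(m in entry for m in markers)]


if __name__ == "__main__":
    sample = (
        "[ 5.505900] [drm] [nvidia-drm] [GPU ID 0x00000008] Loading driver "
        "[ 5.829042] zram: Added device: zram0 "
        "[ 23.572353] [drm:nv_drm_master_set [nvidia_drm]] *ERROR* "
        "[nvidia-drm] [GPU ID 0x00000008] Failed to grab modeset ownership"
    )
    for entry in find_suspicious(split_dmesg(sample)):
        print(entry)
```

Text that follows the last timestamp without a stamp of its own (for example a shell prompt pasted after the log) stays attached to the final entry, so trim that by hand first.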
-
@steff22 This was a known bug that had no significance.
((The warning message is expected. When a client (such as the modesetting driver) attempts to open our DRM device node while modesetting permission is already acquired by something else (like the NVIDIA X driver), it has to fail, but the kernel won't let us return a failure after v5.9-rc1, so we print this message. It won't impact functionality of the NVIDIA X driver that already has modesetting permission. Safe to ignore as long as you didn't need the other client to actually get modesetting permission. If you want to suppress the error, you would need to find which client is attempting to open the NVIDIA DRM device node and prevent it from doing so.))
-
@steff22 I have some questions:
- Is the host being powered up with a monitor or a dummy plug (headless) already attached to the dGPU?
- Without rebooting the VM and right after the driver installation succeeds (showing that the device is working OK), what happens if you click the [Detect] button in the display settings window?
- Instead of a reboot, did you try a VM shutdown/start cycle for the first time after the driver installation?
Nonetheless, if the same dGPU card works normally on another XCP-ng host, a possible Xen passthrough incompatibility with that AM5 board should be considered. For example:
- CSM/UEFI GPU compat issues, as referenced by @Teddy-Astie
- Beta BIOS/IOMMU broken or lacking features (e.g. ACS support for PCI/PCIe isolation)
Tux
-
@tuxen No. 1: Yes, there is an HDMI screen connected to the dGPU.
There is also an IPMI with a dedicated VGA output connected. But there is a bug that keeps the primary screen from working completely.
The BIOS disables the internal IPMI when an external GPU card is connected, even though the internal GPU is selected as the primary GPU in the BIOS. So I only see the XCP-ng startup on screen, no xsconsole. I have tried without a screen connected to the external GPU: same error.
No. 2: I have tried pressing Detect, only to be told that there is no other screen. I have only tried a reboot.
At first I thought there was something wrong with the BIOS. But this works with VMware ESXi and Proxmox.
-
@steff22 I think I remember that some others also struggled with the same GPU passthrough problem on the ASRock Rack B650D4U; not sure if it was with AM5.
-
@steff22 said in Gpu passthrough on Asrock rack B650D4U3-2L2Q will not work:
The BIOS disables the internal IPMI when an external GPU card is connected, even though the internal GPU is selected as the primary GPU in the BIOS. So I only see the XCP-ng startup on screen, no xsconsole. I have tried without a screen connected to the external GPU: same error.
I suggest contacting ASRock support and explaining this behavior.
@steff22 said in Gpu passthrough on Asrock rack B650D4U3-2L2Q will not work:
No. 2: I have tried pressing Detect, only to be told that there is no other screen. I have only tried a reboot.
Could you try the shutdown/start after the driver installation?
@steff22 said in Gpu passthrough on Asrock rack B650D4U3-2L2Q will not work:
At first I thought there was something wrong with the BIOS. But this works with VMware ESXi and Proxmox.
Considering it worked with the same XCP-ng version but on different hardware, I'm more inclined to suspect a Xen incompatibility with the combination of NVIDIA and some AMD motherboards. If you search the forum, you'll find mixed results about that.
-
@tuxen I tried shutdown/start now: same error.
Yes, I share that opinion; when it comes to NVIDIA, the drivers are a bit picky.
I got it to work with the ASRock motherboard and an old Radeon HD 5450, so I think I'll sell the 1070 Ti, which is starting to get old anyway, and I'd rather invest in a new Radeon GPU. Then I'll have to bet that this works on the ASRock motherboard.
Does vGPU only work like that on enterprise GPUs such as NVIDIA Tesla?
-
Thank you for the time you spent on the troubleshooting process
-
What kind of magic have you put in the last 7 patches?
Now everything works, with no errors whether I let Windows install the drivers or install them myself directly from NVIDIA.
I double-checked this by installing a clean XCP-ng from ISO without updates, then doing a clean Windows 10 install and installing the drivers: error 43 appeared. I then installed the patches, started Windows 10, and everything works. It also works with Windows 11.
-
@steff22 Wow, great news! Kudos to the Xen & XCP-ng dev teams
-
@tuxen Yes, the Xen & XCP-ng dev teams have been really good.
I had started to get a little nervous that the big investment in hardware was wasted. I never would have needed such a powerful CPU without GPU passthrough. So thank you very much, Xen & XCP-ng dev teams.
-
@steff22 said in Gpu passthrough on Asrock rack B650D4U3-2L2Q will not work:
What kind of magic have you put in the last 7 patches?
Now everything works, with no errors whether I let Windows install the drivers or install them myself directly from NVIDIA.
I double-checked this by installing a clean XCP-ng from ISO without updates, then doing a clean Windows 10 install and installing the drivers: error 43 appeared. I then installed the patches, started Windows 10, and everything works. It also works with Windows 11.
Thank you very much for your feedback. Frankly, I don't personally know the detailed content of the patches or why they fixed your issue, but I will pass the word to the team so we can try to pinpoint the cause. Also keeping @Teddy-Astie in the loop.