XCP-ng

    Gpu passthrough on Asrock rack B650D4U3-2L2Q will not work

      tuxen Top contributor

      @steff22 After reading this Blue Iris topic, I wonder if it's related. As of Xen 4.15, there was a change in MSR handling that can crash a guest if it tries to access those registers. XCP-ng 8.3 ships Xen 4.17. The issue also seems to depend on the CPU vendor and model.

      https://xcp-ng.org/forum/topic/8873/windows-blue-iris-xcp-ng-8-3

      It's worth testing the solution provided there (a VM shutdown/start cycle is required for it to take effect):

      xe vm-param-add uuid=<VM-UUID> param-name=platform msr-relaxed=true
      

      Replace <VM-UUID> with the UUID of your Windows 10 VM.
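For anyone who wants the whole cycle in one place, here is a minimal sketch of the steps above: add the platform key, then do a full shutdown/start. The `xe` subcommands are the standard ones; the function name `apply_msr_relaxed` and the `XE` override variable are only illustrative and not part of XCP-ng.

```shell
#!/bin/sh
# Sketch only: apply msr-relaxed to one VM and power-cycle it.
# XE is a hypothetical override so the commands can be previewed (XE=echo).
XE="${XE:-xe}"

apply_msr_relaxed() {
    vm_uuid="$1"
    [ -n "$vm_uuid" ] || { echo "usage: apply_msr_relaxed <VM-UUID>" >&2; return 1; }

    # Same command as in the post: add msr-relaxed=true to the platform map.
    "$XE" vm-param-add uuid="$vm_uuid" param-name=platform msr-relaxed=true || return 1

    # A reboot from inside the guest is not enough; do a full shutdown/start.
    "$XE" vm-shutdown uuid="$vm_uuid" || return 1
    "$XE" vm-start uuid="$vm_uuid"
}
```

Run it in dom0 as `apply_msr_relaxed <VM-UUID>`; setting `XE=echo` beforehand only prints the commands. If the key was already added earlier, `vm-param-add` will probably refuse to add it again, in which case only the shutdown/start part is needed.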

      Tux

        steff22 @tuxen

        @tuxen that didn't work either.

        Probably not that interesting, but I also tried XCP-ng 8.2.1 and got the same error there.

          steff22 @tuxen

          @tuxen I got hold of a custom BIOS with the 4G option available now. But disabling 4G didn't help. It was hidden before, but always enabled.

          But I see that in Proxmox the PCI slot does not change: it stays at slot 1, whereas with XCP-ng it changes to slot 8.

          Proxmox uses a hex PCI ID for GPU passthrough. Does this matter?

            olivierlambert Vates 🪐 Co-Founder CEO

            It shouldn't. Any idea here @Teddy-Astie ?

              TeddyAstie Vates 🪐 XCP-ng Team Xen Guru @steff22

              @steff22 said in Gpu passthrough on Asrock rack B650D4U3-2L2Q will not work:

              @tuxen I got hold of a custom BIOS with the 4G option available now. But disabling 4G didn't help. It was hidden before, but always enabled.

              But I see that in Proxmox the PCI slot does not change: it stays at slot 1, whereas with XCP-ng it changes to slot 8.

              Is this slot change in the guest or in XCP-ng itself?
              I would not be surprised to see it changing in the guest (it's part of what QEMU PCI passthrough can do), but that would be weird for Dom0.

              Proxmox uses a hex PCI ID for GPU passthrough. Does this matter?

              I don't think there is any guarantee that you see the same BDF in guest and host on Proxmox (more likely a coincidence). But that behavior would change in XCP-ng with Q35/proper PCIe support.

              Aside from that, it "appears" to work in a Linux VM (albeit with no display). Have you managed to get CUDA running there (to see if the device actually works)? And does the BAR window message still appear?

                steff22 @TeddyAstie

                @Teddy-Astie No, only in the guest.

                I haven't tried anything more with Pop!_OS than trying to extend the screen.
                I can try to look at it with CUDA running.

                  steff22 @TeddyAstie

                  @Teddy-Astie I have installed the NVIDIA drivers and the CUDA Toolkit. The drivers seem to be running properly and I can see the GPU in Pop!_OS, but I ran dmesg again and got an error:

                  ation="profile_load" profile="unconfined" name="libreoffice-xpdfimport" pid=516 comm="apparmor_parser"
                  [    5.425026] Adding 4193784k swap on /dev/mapper/cryptswap.  Priority:-2 extents:1 across:4193784k SS
                  [    5.427939] nvidia: module license 'NVIDIA' taints kernel.
                  [    5.427941] Disabling lock debugging due to kernel taint
                  [    5.427943] nvidia: module license taints kernel.
                  [    5.473953] nvidia-nvlink: Nvlink Core is being initialized, major device number 238
                  
                  [    5.475202] xen: --> pirq=24 -> irq=17 (gsi=17)
                  [    5.475401] nvidia 0000:00:08.0: vgaarb: VGA decodes changed: olddecodes=io+mem,decodes=none:owns=io+mem
                  [    5.476296] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  560.35.03  Fri Aug 16 21:39:15 UTC 2024
                  [    5.502521] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  560.35.03  Fri Aug 16 21:21:48 UTC 2024
                  [    5.505900] [drm] [nvidia-drm] [GPU ID 0x00000008] Loading driver
                  [    5.829042] zram: Added device: zram0
                  [    6.103268] kauditd_printk_skb: 18 callbacks suppressed
                  [    6.103273] audit: type=1400 audit(1731496933.838:29): apparmor="DENIED" operation="capable" class="cap" profile="/usr/sbin/cupsd" pid=671 comm="cupsd" capability=12  capname="net_admin"
                  [    6.288396] snd_hda_intel 0000:00:09.0: azx_get_response timeout, switching to polling mode: last cmd=0x000f0000
                  [    6.385067] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:00:08.0 on minor 1
                  [    6.422276] nvidia_uvm: module uses symbols nvUvmInterfaceDisableAccessCntr from proprietary module nvidia, inheriting taint.
                  [    6.452377] nvidia-uvm: Loaded the UVM driver, major device number 236.
                  [    6.844962] zram0: detected capacity change from 0 to 32735232
                  [    7.215079] input: HDA NVidia HDMI/DP,pcm=3 as /devices/pci0000:00/0000:00:09.0/sound/card0/input6
                  [    7.216764] input: HDA NVidia HDMI/DP,pcm=7 as /devices/pci0000:00/0000:00:09.0/sound/card0/input7
                  [    7.217068] input: HDA NVidia HDMI/DP,pcm=8 as /devices/pci0000:00/0000:00:09.0/sound/card0/input8
                  [    7.217363] input: HDA NVidia HDMI/DP,pcm=9 as /devices/pci0000:00/0000:00:09.0/sound/card0/input9
                  [    7.332815] Adding 16367612k swap on /dev/zram0.  Priority:1000 extents:1 across:16367612k SS
                  [   10.684565] rfkill: input handler disabled
                  [   20.569595] rfkill: input handler enabled
                  [   23.259676] rfkill: input handler disabled
                  [   23.572353] [drm:nv_drm_master_set [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000008] Failed to grab modeset ownership
                  steff@pop-os:~$ lspci | grep VGA
                  00:02.0 VGA compatible controller: Device 1234:1111 (rev 02)
                  00:08.0 VGA compatible controller: NVIDIA Corporation GP104 [GeForce GTX 1070 Ti] (rev a1)
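Since the guest-side address came up earlier in the thread, here is a small sketch for pulling the NVIDIA VGA function's BDF out of `lspci` output like the above. The helper name `nvidia_vga_bdf` is made up for illustration; it only filters text and assumes the usual `lspci` line layout (BDF in the first column).

```shell
#!/bin/sh
# Sketch only: read lspci output on stdin and print the BDF
# (bus:device.function) of each NVIDIA VGA controller found.
nvidia_vga_bdf() {
    # Match lines that are VGA controllers AND mention NVIDIA; print column 1.
    awk '/VGA compatible controller/ && /NVIDIA/ { print $1 }'
}
```

In the guest this would be used as `lspci | nvidia_vga_bdf`; with the two VGA lines above it prints `00:08.0`, which can then be compared with the address dom0 reports for the same card.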
                  
                    steff22 @steff22

                    @steff22 This is a known issue with no practical significance.

                    ((The warning message is expected. When a client (such as the modesetting driver) attempts to open our DRM device node while modesetting permission is already acquired by something else (like the NVIDIA X driver), it has to fail, but the kernel won't let us return a failure after v5.9-rc1, so we print this message. It won't impact functionality of the NVIDIA X driver that already has modesetting permission. Safe to ignore as long as you didn't need the other client to actually get modesetting permission. If you want to suppress the error, you would need to find which client is attempting to open the NVIDIA DRM device node and prevent it from doing so.))
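In case it helps anyone reading along, a rough sketch for checking whether the modeset message is the only nvidia `*ERROR*` line in the kernel log. The helper name `nvidia_unexpected_errors` is hypothetical, and the match strings are simply taken from the dmesg excerpt earlier in this thread, so other driver versions may word things differently.

```shell
#!/bin/sh
# Sketch only: read dmesg output on stdin and print nvidia *ERROR* lines
# other than the benign "Failed to grab modeset ownership" message.
nvidia_unexpected_errors() {
    awk '/\*ERROR\*/ && /nvidia/ && !/Failed to grab modeset ownership/'
}
```

Used as `sudo dmesg | nvidia_unexpected_errors` in the guest; no output means the benign message was the only nvidia error logged.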

                      tuxen Top contributor

                      @steff22 I have some questions:

                      1. Is the host being powered up with a monitor or a dummy plug (headless) already attached to the dGPU?
                      2. Without rebooting the VM and right after the driver installation succeeds (showing that the device is working OK), what happens if you click the [Detect] button at the display settings window?
                      3. Instead of a reboot, did you try a VM shutdown/start cycle for the 1st time after the driver installation?

                      Nonetheless, if the same dGPU card works normally on another XCP-ng host, a possible Xen passthrough incompatibility with that AM5 board should be considered. For example:

                      • CSM/UEFI GPU compat issues, as referenced by @Teddy-Astie
                      • Beta BIOS/IOMMU broken or lacking features (e.g. ACS support for PCI/PCIe isolation)

                      Tux

                        steff22 @tuxen

                        @tuxen No. 1: yes, there is an HDMI screen connected to the dGPU.
                        There is also an IPMI with dedicated VGA connected.

                        But there is an error that keeps the primary screen from working completely.

                        The BIOS disables the internal IPMI when an external GPU card is connected, even though the internal GPU is selected as primary GPU in the BIOS. So I only see the XCP-ng startup on screen, no xsconsole. I have tried without a screen connected to the external GPU; same error then.

                        No. 2: I tried pressing Detect, only to be told that there is no other screen. So far I have only tried a reboot.

                        At first I thought there was something wrong with the BIOS. But this works with VMware ESXi and Proxmox.

                          steff22 @steff22

                          @steff22 I think I remember that some others also struggled with the same GPU passthrough problem on the ASRock Rack B650D4U; not sure if it was with AM5.

                            tuxen Top contributor @steff22

                            @steff22 said in Gpu passthrough on Asrock rack B650D4U3-2L2Q will not work:

                            The BIOS disables the internal IPMI when an external GPU card is connected, even though the internal GPU is selected as primary GPU in the BIOS. So I only see the XCP-ng startup on screen, no xsconsole. I have tried without a screen connected to the external GPU; same error then.

                            I suggest contacting ASRock support and explaining this behavior.

                            @steff22 said in Gpu passthrough on Asrock rack B650D4U3-2L2Q will not work:

                            No. 2: I tried pressing Detect, only to be told that there is no other screen. So far I have only tried a reboot.

                            Could you try the shutdown/start after the driver installation?

                            @steff22 said in Gpu passthrough on Asrock rack B650D4U3-2L2Q will not work:

                            At first I thought there was something wrong with the BIOS. But this works with VMware ESXi and Proxmox.

                            Considering it worked with the same XCP-ng version on different hardware, I'm more inclined to suspect a Xen incompatibility with the combination of Nvidia and some AMD motherboards. If you search the forum, the results on that are mixed.

                              steff22 @tuxen

                              @tuxen I tried shutdown/start now; same error.

                              Yes, I share that opinion: when it comes to Nvidia, the drivers are a bit picky.
                              I got it to work with the ASRock motherboard and an old Radeon HD 5450, so I think I'll sell the 1070 Ti; the card is starting to get old anyway. I'd rather invest in a new Radeon GPU. Then I'll have to bet that it works on the ASRock motherboard.

                              Does vGPU only work on enterprise GPUs like Nvidia Tesla?

                                steff22 @steff22

                                Thank you for the time you spent on the troubleshooting process 🙂

                                  steff22 @olivierlambert

                                  @olivierlambert

                                  What kind of magic have you put in the last 7 patches?

                                  Now everything works: no errors, whether I let Windows install the drivers or install them myself directly from Nvidia 🙂 🙂

                                  I double-checked this by installing a clean XCP-ng from ISO without updates, followed by a clean Windows 10 install. After installing the drivers, error 43 appeared. I then installed the patches and started Windows 10, and everything works. It also works with Windows 11.

                                    tuxen Top contributor @steff22

                                    @steff22 Wow, great news! Kudos to the Xen & XCP-ng dev teams 👍

                                      steff22 @tuxen

                                      @tuxen Yes, the Xen & XCP-ng dev teams have been really good 👍 😀
                                      I started to get a little nervous that the big investment in hardware was wasted. I never would have needed such a powerful CPU without GPU passthrough.

                                      So thank you very much, Xen & XCP-ng dev teams 👍 😀

                                        olivierlambert Vates 🪐 Co-Founder CEO @steff22

                                        @steff22 said in Gpu passthrough on Asrock rack B650D4U3-2L2Q will not work:

                                        @olivierlambert

                                        What kind of magic have you put in the last 7 patches?

                                        Now everything works: no errors, whether I let Windows install the drivers or install them myself directly from Nvidia 🙂 🙂

                                        I double-checked this by installing a clean XCP-ng from ISO without updates, followed by a clean Windows 10 install. After installing the drivers, error 43 appeared. I then installed the patches and started Windows 10, and everything works. It also works with Windows 11.

                                        Thank you very much for your feedback. Frankly, IDK personally about the detailed content of the patch and why it fixed your issue, but I will pass the word to the team so we can guess/pinpoint the issue. Also keeping @Teddy-Astie in the loop

                                          andyhhp Xen Guru @steff22

                                          @steff22 said in Gpu passthrough on Asrock rack B650D4U3-2L2Q will not work:

                                          what kind of magic have you put in the last 7 patches?

                                          You've got a very recent AMD processor, so it's probably this fix https://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=86001b3970fea4536048607ea6e12541736c48e1 from upstream.

                                            steff22 @andyhhp

                                            @andyhhp That was something like what I was hoping for. Thank you very much for the fix 👍 😀 😀

                                            But it still doesn't work on Pop!_OS 22.04. Does the guest tools version matter? It seems to be using V6.6.80-0.
