XCP-ng

    Gpu passthrough on Asrock rack B650D4U3-2L2Q will not work

    99 Posts 7 Posters 16.0k Views 5 Watching
    • TeddyAstie Vates 🪐 XCP-ng Team Xen Guru

      Hello steff22,

      Tested with another PC with XCP-ng 8.3 and everything worked immediately. So I think it must be related to the ASRock Rack B650D4U3-2L2Q BIOS, or to the IPMI trying to use the same GPU.

      What are the specs of the hardware where it works?

      • steff22 @TeddyAstie

        @Teddy-Astie It's an Asus Prime H370F with an old 8-core Intel Core i7-9700K.

        • ravenet

          I have GPU PCIe passthrough working on ASRock Rack WRX80D8-2T (Threadripper) and X470D4U2 (Ryzen) boards, passing through multiple Radeon Pro 7000-series GPUs.

          Rodney

          • steff22 @ravenet

            ravenet Do you remember if you only enabled SVM and IOMMU? Or did you enable the whole bunch: SR-IOV, ACS, AER, ARI support, and Resizable BAR support?

            • ravenet @steff22

              steff22
              As far as I know, SVM, IOMMU, SR-IOV and POSSIBLY Resizable BAR were enabled.
              I can't check the Threadripper system until the weekend, as it runs production loads.

              I'll reboot one of the Ryzen boxes tonight to verify what I enabled there, but it was basically defaults, other than making sure the 3 or 4 items above were enabled.

              • steff22 @ravenet

                ravenet OK, thanks.

                Didn't work here.
                But there may be some error in the BIOS, since it is a beta and the first BIOS that supports the AMD Ryzen 9000 series.

                But the strange thing is that it works stably with Proxmox without any errors of any kind.

                • olivierlambert Vates 🪐 Co-Founder CEO

                  It's not strange, those are two VERY different systems in how they work.

                  • steff22 @olivierlambert

                    olivierlambert There is a slight software difference, yes, but both are hypervisors.

                    So the fact that one of them works means that it should be possible to make it work on my hardware.

                    There isn't much to do with Proxmox beyond enabling IOMMU in GRUB, blacklisting the NVIDIA drivers, and setting disable_vga on the GPU card.

                    Is it possible to blacklist the NVIDIA drivers in XCP-ng?
                    And is there no difference between enabling PCI passthrough in Xen Orchestra and doing it from the command line?

                    I think I have tried both, but I'm not sure whether I had everything enabled in the BIOS at that point.

                    • olivierlambert Vates 🪐 Co-Founder CEO

                      No, it's not a slight difference: it's a completely different design. In XCP-ng, you first boot Xen, a kind of microkernel. Only then do you boot a specific VM, the Dom0, which is a PV guest. From this guest, you get the API etc. But even though the Dom0 has access to the hardware for I/O (NICs, disks, GPUs…), it's still a VM, and Dom0 doesn't have access to all the CPUs and memory of the physical machine.

                      This means Xen will always be "in the middle" to control who gets what (in terms of cores and RAM). This is a great design from a security perspective, because you always have a small piece of software (Xen) controlling what's going on.

                      In KVM, it's vastly different. You boot a full Linux kernel, and then you load a small module (KVM). The "host" really is the host, accessing all the memory and CPUs. There's a lot less isolation (and therefore security); the upside is that there's nobody in the middle to deal with.

                      I can assure you that PCI passthrough is really complex and depends on many factors, including how the BIOS is configured and what the motherboard manufacturer has done.

                      So knowing it works with Proxmox is like saying "it works on Windows": it's that different. It doesn't mean it's not possible in Xen, but if it is, the way to do it will be vastly different.
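
One way to see the "Dom0 is still a VM" point on a running host is to compare what Xen reports for the physical machine with what Dom0 itself sees. The sketch below is only an illustration: `field_of` is a hypothetical helper name, and it assumes the usual `key : value` layout of `xl info` output.

```shell
#!/bin/sh
# Hypothetical helper: print the value of one "key : value" field read from
# stdin (the layout `xl info` normally uses). Only suitable for simple
# fields; a value that itself contains ':' would be cut at the first colon.
field_of() {
    awk -F':' -v key="$1" '
        {
            name = $1
            gsub(/[ \t]/, "", name)               # strip padding around the key
        }
        name == key {
            value = $2
            gsub(/^[ \t]+|[ \t]+$/, "", value)    # trim whitespace around the value
            print value
        }
    '
}

# Usage on an XCP-ng host (as root in Dom0), left commented out here:
#   xl info | field_of nr_cpus        # physical CPUs known to Xen
#   xl info | field_of total_memory   # host RAM in MiB, as seen by Xen
#   nproc                             # vCPUs given to Dom0
#   xl list Domain-0                  # Dom0's own memory and vCPU allocation
```

If the Dom0 numbers come out smaller than the host numbers, that gap is the part of the machine Xen keeps under its own control.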

                      • steff22 @olivierlambert

                        olivierlambert Ok, I didn't realize there was such a big difference. I actually like everything better with XCP-ng and Xen Orchestra.

                        All testing has been done with UEFI BIOS. Is it just a waste of time to try legacy BIOS for XCP-ng and the VM OS?

                        Does it still make sense to blacklist the NVIDIA drivers in the XCP-ng Dom0 to try to isolate the GPU even more?

                        But is XCP-ng more like VMware ESXi in the way everything is handled? It doesn't work with VMware ESXi either.

                        • olivierlambert Vates 🪐 Co-Founder CEO

                          Yes, you could say XCP-ng is a lot closer to VMware than to KVM 🙂 (even if VMware is more advanced in some aspects, the architecture is roughly similar).

                          Anyway, that doesn't mean it will never work; however, it might be less straightforward than with KVM until we find exactly what's causing the problem.

                          • tuxen Top contributor @steff22

                            steff22 Assuming xen-pciback.hide was previously set, could you try this workaround (no guarantee that it'll work, since every motherboard and BIOS has its quirks):

                            /opt/xensource/libexec/xen-cmdline --set-dom0 pci=realloc
                            reboot
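
For context, the `xen-pciback.hide` setting assumed above takes a list of PCI addresses, each wrapped in parentheses, on the Dom0 kernel command line. The sketch below only builds and prints the commands instead of running them. `build_hide_arg` is a hypothetical helper, and the two addresses are the ones shown by `xl pci-assignable-list` later in this thread; substitute whatever `lspci` reports on your own host.

```shell
#!/bin/sh
# Hypothetical helper: turn PCI addresses into a xen-pciback.hide argument,
# e.g. "xen-pciback.hide=(0000:01:00.0)(0000:01:00.1)".
build_hide_arg() {
    arg="xen-pciback.hide="
    for bdf in "$@"; do
        arg="${arg}(${bdf})"
    done
    printf '%s\n' "$arg"
}

# Dry run: print the commands rather than executing them.
hide_arg=$(build_hide_arg 0000:01:00.0 0000:01:00.1)
echo "/opt/xensource/libexec/xen-cmdline --set-dom0 \"$hide_arg\""
echo "/opt/xensource/libexec/xen-cmdline --set-dom0 pci=realloc"
echo "reboot"
```

Printing first makes it easy to check the address list before touching the boot configuration.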
                            
                            • steff22 @tuxen

                              tuxen Didn't work, unfortunately. I used Xen Orchestra for pciback.hide the last time, but I tried the old-fashioned way first, with the command line.

                              • steff22 @olivierlambert

                                olivierlambert

                                I got GPU passthrough working with VMware ESXi 8.0.3 now.

                                There was an option to attach it as a dynamic PCI device.

                                Isn't this an indication that the error lies in how XCP-ng handles PCI passthrough?

                                Both VMware ESXi 8.0.3 and Proxmox have two different ways to attach PCI devices to a VM; only one of them works, and the other method causes error 43 in Windows.

                                But as I said, I don't know anything about what the different choices do.

                                • olivierlambert Vates 🪐 Co-Founder CEO

                                  Do you have more information on that VMware option?

                                  • steff22 @olivierlambert

                                    olivierlambert No, unfortunately, nothing but these pictures: Screenshot from 2024-11-07 22-21-11.png, Screenshot from 2024-11-07 22-22-47.png

                                    • olivierlambert Vates 🪐 Co-Founder CEO

                                      @Teddy-Astie do you have any idea what it could be and why it makes the device work?

                                      • steff22 @olivierlambert

                                        olivierlambert I found this article, which explains a bit about dynamic PCI devices. From what I understood, dynamic PCI devices are newer and more accurate, and use DRS.

                                        https://frankdenneman.nl/2023/06/06/vsphere-ml-accelerator-spectrum-deep-dive-using-dynamic-directpath-io-passthrough-with-vms/

                                        Screenshot from 2024-11-08 06-31-43.png Screenshot from 2024-11-08 06-31-52.png

                                        • tuxen Top contributor

                                          steff22 What's the output of lspci -k and xl pci-assignable-list?

                                          Also, the system log output regarding GPU and IOMMU initialization would be very useful:

                                          egrep -i '(nvidia|vga|video|pciback)' /var/log/kern.log
                                          xl dmesg
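
To narrow the hypervisor log down to the IOMMU part, the `xl dmesg` output can be filtered the same way as `kern.log`. This is a minimal sketch, assuming the relevant lines mention `IOMMU` or `AMD-Vi`; the exact wording varies between Xen versions, so treat the pattern as a starting point. `iommu_lines` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: keep only lines from stdin that look IOMMU-related.
# The pattern is an assumption, not an exhaustive list of Xen messages.
iommu_lines() {
    grep -i -E 'iommu|amd-vi'
}

# Usage on the host, left commented out here:
#   xl dmesg | iommu_lines
#   grep -i -E '(nvidia|vga|video|pciback)' /var/log/kern.log
```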
                                          

                                          Tux

                                          • steff22 @tuxen

                                            tuxen said in Gpu passthrough on Asrock rack B650D4U3-2L2Q will not work:

                                            lspci -k and xl pci-assignable-list

                                            xcp-ng-Asrock ~]# lspci -k
                                            00:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 14d8
                                            	Subsystem: Advanced Micro Devices, Inc. [AMD] Device 14d8
                                            00:00.2 IOMMU: Advanced Micro Devices, Inc. [AMD] Device 14d9
                                            	Subsystem: Advanced Micro Devices, Inc. [AMD] Device 14d9
                                            00:01.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 14da
                                            00:01.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 14db
                                            	Kernel driver in use: pcieport
                                            00:01.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 14db
                                            	Kernel driver in use: pcieport
                                            00:02.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 14da
                                            00:02.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 14db
                                            	Kernel driver in use: pcieport
                                            00:02.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 14db
                                            	Kernel driver in use: pcieport
                                            00:03.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 14da
                                            00:04.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 14da
                                            00:08.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 14da
                                            00:08.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 14dd
                                            	Kernel driver in use: pcieport
                                            00:08.3 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 14dd
                                            	Kernel driver in use: pcieport
                                            00:14.0 SMBus: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller (rev 71)
                                            	Subsystem: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller
                                            	Kernel driver in use: piix4_smbus
                                            	Kernel modules: i2c_piix4
                                            00:14.3 ISA bridge: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge (rev 51)
                                            	Subsystem: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge
                                            00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 14e0
                                            00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 14e1
                                            00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 14e2
                                            00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 14e3
                                            00:18.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 14e4
                                            00:18.5 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 14e5
                                            00:18.6 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 14e6
                                            00:18.7 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 14e7
                                            01:00.0 VGA compatible controller: NVIDIA Corporation GP104 [GeForce GTX 1070 Ti] (rev a1)
                                            	Subsystem: ZOTAC International (MCO) Ltd. Device 2445
                                            	Kernel driver in use: pciback
                                            01:00.1 Audio device: NVIDIA Corporation GP104 High Definition Audio Controller (rev a1)
                                            	Subsystem: ZOTAC International (MCO) Ltd. Device 2445
                                            	Kernel driver in use: pciback
                                            02:00.0 Ethernet controller: Broadcom Inc. and subsidiaries BCM57502 NetXtreme-E 10Gb/25Gb/40Gb/50Gb Ethernet (rev 12)
                                            	Subsystem: ASRock Incorporation Device 1752
                                            	Kernel driver in use: bnxt_en
                                            	Kernel modules: bnxt_en
                                            02:00.1 Ethernet controller: Broadcom Inc. and subsidiaries BCM57502 NetXtreme-E 10Gb/25Gb/40Gb/50Gb Ethernet (rev 12)
                                            	Subsystem: ASRock Incorporation Device 1752
                                            	Kernel driver in use: bnxt_en
                                            	Kernel modules: bnxt_en
                                            03:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] 600 Series Chipset PCIe Switch Upstream Port (rev 01)
                                            	Kernel driver in use: pcieport
                                            04:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] 600 Series Chipset PCIe Switch Downstream Port (rev 01)
                                            	Kernel driver in use: pcieport
                                            04:01.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] 600 Series Chipset PCIe Switch Downstream Port (rev 01)
                                            	Kernel driver in use: pcieport
                                            04:02.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] 600 Series Chipset PCIe Switch Downstream Port (rev 01)
                                            	Kernel driver in use: pcieport
                                            04:03.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] 600 Series Chipset PCIe Switch Downstream Port (rev 01)
                                            	Kernel driver in use: pcieport
                                            04:04.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] 600 Series Chipset PCIe Switch Downstream Port (rev 01)
                                            	Kernel driver in use: pcieport
                                            04:08.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] 600 Series Chipset PCIe Switch Downstream Port (rev 01)
                                            	Kernel driver in use: pcieport
                                            04:0c.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] 600 Series Chipset PCIe Switch Downstream Port (rev 01)
                                            	Kernel driver in use: pcieport
                                            04:0d.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] 600 Series Chipset PCIe Switch Downstream Port (rev 01)
                                            	Kernel driver in use: pcieport
                                            05:00.0 USB controller: Renesas Technology Corp. uPD720201 USB 3.0 Host Controller (rev 03)
                                            	Subsystem: Renesas Technology Corp. uPD720201 USB 3.0 Host Controller
                                            	Kernel driver in use: xhci_hcd
                                            	Kernel modules: xhci_pci
                                            06:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03)
                                            	Subsystem: ASRock Incorporation Device 1533
                                            	Kernel driver in use: igb
                                            	Kernel modules: igb
                                            07:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03)
                                            	Subsystem: ASRock Incorporation Device 1533
                                            	Kernel driver in use: igb
                                            	Kernel modules: igb
                                            08:00.0 PCI bridge: ASRock Incorporation Device 1150 (rev 06)
                                            09:00.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED Graphics Family (rev 52)
                                            	Subsystem: ASRock Incorporation Device 2000
                                            0c:00.0 USB controller: Advanced Micro Devices, Inc. [AMD] 600 Series Chipset USB 3.2 Controller (rev 01)
                                            	Subsystem: ASMedia Technology Inc. Device 1142
                                            	Kernel driver in use: xhci_hcd
                                            	Kernel modules: xhci_pci
                                            0d:00.0 SATA controller: Advanced Micro Devices, Inc. [AMD] 600 Series Chipset SATA Controller (rev 01)
                                            	Subsystem: ASMedia Technology Inc. Device 1062
                                            	Kernel driver in use: ahci
                                            	Kernel modules: ahci
                                            0e:00.0 Non-Volatile memory controller: Kingston Technology Company, Inc. FURY Renegade NVMe SSD with heatsink (rev 01)
                                            	Subsystem: Kingston Technology Company, Inc. FURY Renegade NVMe SSD with heatsink
                                            	Kernel driver in use: nvme
                                            	Kernel modules: nvme
                                            0f:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Device 13c0 (rev c1)
                                            	Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Device 13c0
                                            0f:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Rembrandt Radeon High Definition Audio Controller
                                            	Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Rembrandt Radeon High Definition Audio Controller
                                            0f:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Family 19h PSP/CCP
                                            	Subsystem: Advanced Micro Devices, Inc. [AMD] Family 19h PSP/CCP
                                            0f:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Device 15b6
                                            	Subsystem: Advanced Micro Devices, Inc. [AMD] Device 15b6
                                            	Kernel driver in use: xhci_hcd
                                            	Kernel modules: xhci_pci
                                            0f:00.4 USB controller: Advanced Micro Devices, Inc. [AMD] Device 15b7
                                            	Subsystem: Advanced Micro Devices, Inc. [AMD] Device 15b6
                                            	Kernel driver in use: xhci_hcd
                                            	Kernel modules: xhci_pci
                                            0f:00.5 Multimedia controller: Advanced Micro Devices, Inc. [AMD] ACP/ACP3X/ACP6x Audio Coprocessor (rev 62)
                                            	Subsystem: Advanced Micro Devices, Inc. [AMD] ACP/ACP3X/ACP6x Audio Coprocessor
                                            0f:00.6 Audio device: Advanced Micro Devices, Inc. [AMD] Family 17h/19h HD Audio Controller
                                            	Subsystem: Advanced Micro Devices, Inc. [AMD] Device d601
                                            10:00.0 USB controller: Advanced Micro Devices, Inc. [AMD] Device 15b8
                                            	Subsystem: Advanced Micro Devices, Inc. [AMD] Device 15b6
                                            	Kernel driver in use: xhci_hcd
                                            	Kernel modules: xhci_pci
                                            
                                            [17:29 xcp-ng-Asrock ~]# xl pci-assignable-list
                                            0000:01:00.0 
                                            0000:01:00.1 
                                            [17:29 xcp-ng-Asrock ~]#
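
The two outputs above agree: the only devices bound to `pciback` in the `lspci -k` listing are `01:00.0` and `01:00.1`, the same two that `xl pci-assignable-list` reports. That cross-check can be scripted. The sketch below uses a hypothetical helper, `pciback_devices`, and assumes the default `lspci -k` layout (address at the start of each device line, indented detail lines underneath).

```shell
#!/bin/sh
# Hypothetical helper: read `lspci -k` output on stdin and print the address
# of every device whose kernel driver is pciback.
pciback_devices() {
    awk '
        /^[0-9a-f]/ { bdf = $1 }                        # device line: remember its address
        /Kernel driver in use: pciback/ { print bdf }   # detail line: device is held by pciback
    '
}

# Usage on the host, left commented out here. lspci -D prints the PCI domain
# so the addresses match the 0000:01:00.0 form used by xl:
#   lspci -D -k | pciback_devices
#   xl pci-assignable-list
```

If the two lists differ, the device was either not hidden from Dom0 at boot or is still claimed by another driver.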
                                            