XCP-ng

    Firepro S7150x2 SR-IOV Errors

    • tbluml

      This is the driver you are talking about, correct?
      25c8af3e-2c77-449e-9326-07196f9b01cc-image.png

      Unfortunately, after installing the new update, I am greeted with the same error.
      43ae5f45-c7b5-44a9-b230-7c9b3c7c6e7d-image.png
      What could the bottom prompt ("The selected virtual GPU type does not support multiple instances.") mean? Could it be a clue that the driver or XCP-ng is not behaving as it should?
      a8139806-bc74-420b-97b9-5a3dbe1d18a6-image.png
      I installed the driver the way the wiki article advises. Am I doing something wrong, or could this be an issue with the driver or the hardware?

      I greatly appreciate your assistance, olivierlambert! Thank you again!

      • olivierlambert (Vates, Co-Founder and CEO)

        I have no idea. r1, does this ring a bell?

        • r1 (XCP-ng Team)

          tbluml, can you share the output of # lspci -k and # dmesg? Also, is SR-IOV enabled in the BIOS, and is any count specified there?
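When reading the requested dmesg output, the lines of interest are the ones written by the gim module. The following is a minimal sketch for pulling out the options the module echoes at load time and any warning or error lines. It assumes the line format seen in the log posted further down in this thread (`gim info:(parse_config_file:NNN) AMD GIM key = value`); the function names and the `SAMPLE` text are illustrative and not part of any XCP-ng or AMD tool.

```python
import re

# Assumed format, taken from the log in this thread, e.g.:
# [ 1123.880824]        gim info:(parse_config_file:295) AMD GIM vf_num = 0
CONFIG_RE = re.compile(
    r"gim info:\(parse_config_file:\d+\)\s+AMD GIM\s+(\w+)\s*=\s*(\d+)"
)
# Any gim line: captures severity, reporting function, and message text.
LEVEL_RE = re.compile(r"gim (info|warning|error):\((\w+):\d+\)\s*(.*)")


def gim_config(dmesg_text):
    """Return the options echoed by the gim module as {name: int}."""
    return {m.group(1): int(m.group(2)) for m in CONFIG_RE.finditer(dmesg_text)}


def gim_problems(dmesg_text):
    """Return (level, function, message) for each gim warning or error line."""
    problems = []
    for line in dmesg_text.splitlines():
        m = LEVEL_RE.search(line)
        if m and m.group(1) in ("warning", "error"):
            problems.append((m.group(1), m.group(2), m.group(3).strip()))
    return problems


# Four lines copied from the log in this thread, used as sample input.
SAMPLE = """\
[ 1123.880824]        gim info:(parse_config_file:295) AMD GIM vf_num = 0
[ 1123.880827]        gim info:(parse_config_file:295) AMD GIM sched_interval = 7
[ 1123.890809]        gim error:(gim_probe:123) gim_probe(06:00.0)
[ 1125.412977]        gim warning:(firmware_requires_update:473) SMU option ROM version 0x111700 versus patch version 0x111a00
"""

print(gim_config(SAMPLE))
for level, func, message in gim_problems(SAMPLE):
    print(level, func, message)
```

To run it against a live host, the text from `dmesg` would be passed in place of `SAMPLE`. Note that the `gim_probe` line is logged at error level although its text reads like a plain trace message, so the severity is best treated as a hint rather than a verdict.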

          1 Reply Last reply Reply Quote 1
          • T Offline
            tbluml
            last edited by

            SR-IOV is on, but I wasn't able to find any count. Which menu would it be under?
            bab74e13-4cf6-41a7-85fa-eab0e60450d3-image.png

            lspci -k:

            
            00:00.0 Host bridge: Intel Corporation Xeon E5/Core i7 DMI2 (rev 07)
                    Subsystem: Dell Device 048c
            00:01.0 PCI bridge: Intel Corporation Xeon E5/Core i7 IIO PCI Express Root Port 1a (rev 07)
                    Kernel driver in use: pcieport
            00:02.0 PCI bridge: Intel Corporation Xeon E5/Core i7 IIO PCI Express Root Port 2a (rev 07)
                    Kernel driver in use: pcieport
            00:02.2 PCI bridge: Intel Corporation Xeon E5/Core i7 IIO PCI Express Root Port 2c (rev 07)
                    Kernel driver in use: pcieport
            00:03.0 PCI bridge: Intel Corporation Xeon E5/Core i7 IIO PCI Express Root Port 3a in PCI Express Mode (rev 07)
                    Kernel driver in use: pcieport
            00:03.2 PCI bridge: Intel Corporation Xeon E5/Core i7 IIO PCI Express Root Port 3c (rev 07)
                    Kernel driver in use: pcieport
            00:04.0 System peripheral: Intel Corporation Xeon E5/Core i7 DMA Channel 0 (rev 07)
                    Subsystem: Dell Device 048c
                    Kernel driver in use: ioatdma
            00:04.1 System peripheral: Intel Corporation Xeon E5/Core i7 DMA Channel 1 (rev 07)
                    Subsystem: Dell Device 048c
                    Kernel driver in use: ioatdma
            00:04.2 System peripheral: Intel Corporation Xeon E5/Core i7 DMA Channel 2 (rev 07)
                    Subsystem: Dell Device 048c
                    Kernel driver in use: ioatdma
            00:04.3 System peripheral: Intel Corporation Xeon E5/Core i7 DMA Channel 3 (rev 07)
                    Subsystem: Dell Device 048c
                    Kernel driver in use: ioatdma
            00:04.4 System peripheral: Intel Corporation Xeon E5/Core i7 DMA Channel 4 (rev 07)
                    Subsystem: Dell Device 048c
                    Kernel driver in use: ioatdma
            00:04.5 System peripheral: Intel Corporation Xeon E5/Core i7 DMA Channel 5 (rev 07)
                    Subsystem: Dell Device 048c
                    Kernel driver in use: ioatdma
            00:04.6 System peripheral: Intel Corporation Xeon E5/Core i7 DMA Channel 6 (rev 07)
                    Subsystem: Dell Device 048c
                    Kernel driver in use: ioatdma
            00:04.7 System peripheral: Intel Corporation Xeon E5/Core i7 DMA Channel 7 (rev 07)
                    Subsystem: Dell Device 048c
                    Kernel driver in use: ioatdma
            00:05.0 System peripheral: Intel Corporation Xeon E5/Core i7 Address Map, VTd_Misc, System Management (rev 07)
                    Subsystem: Dell Device 048c
            00:05.2 System peripheral: Intel Corporation Xeon E5/Core i7 Control Status and Global Errors (rev 07)
                    Subsystem: Dell Device 048c
            00:11.0 PCI bridge: Intel Corporation C600/X79 series chipset PCI Express Virtual Root Port (rev 05)
                    Kernel driver in use: pcieport
            00:16.0 Communication controller: Intel Corporation C600/X79 series chipset MEI Controller #1 (rev 05)
                    Subsystem: Dell Device 048c
                    Kernel modules: mei_me
            00:16.1 Communication controller: Intel Corporation C600/X79 series chipset MEI Controller #2 (rev 05)
                    Subsystem: Dell Device 048c
            00:1a.0 USB controller: Intel Corporation C600/X79 series chipset USB2 Enhanced Host Controller #2 (rev 05)
                    Subsystem: Dell Device 048c
                    Kernel driver in use: ehci-pci
                    Kernel modules: ehci_pci
            00:1c.0 PCI bridge: Intel Corporation C600/X79 series chipset PCI Express Root Port 1 (rev b5)
                    Kernel driver in use: pcieport
            00:1c.7 PCI bridge: Intel Corporation C600/X79 series chipset PCI Express Root Port 8 (rev b5)
                    Kernel driver in use: pcieport
            00:1d.0 USB controller: Intel Corporation C600/X79 series chipset USB2 Enhanced Host Controller #1 (rev 05)
                    Subsystem: Dell Device 048c
                    Kernel driver in use: ehci-pci
                    Kernel modules: ehci_pci
            00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev a5)
            00:1f.0 ISA bridge: Intel Corporation C600/X79 series chipset LPC Controller (rev 05)
                    Subsystem: Dell Device 048c
                    Kernel modules: lpc_ich
            00:1f.2 SATA controller: Intel Corporation C600/X79 series chipset 6-Port SATA AHCI Controller (rev 05)
                    Subsystem: Dell Device 048c
                    Kernel driver in use: ahci
                    Kernel modules: ahci
            01:00.0 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
                    Subsystem: Dell Gigabit 4P I350-t rNDC
                    Kernel driver in use: igb
                    Kernel modules: igb
            01:00.1 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
                    Subsystem: Dell Gigabit 4P I350-t rNDC
                    Kernel driver in use: igb
                    Kernel modules: igb
            01:00.2 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
                    Subsystem: Dell Gigabit 4P I350-t rNDC
                    Kernel driver in use: igb
                    Kernel modules: igb
            01:00.3 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
                    Subsystem: Dell Gigabit 4P I350-t rNDC
                    Kernel driver in use: igb
                    Kernel modules: igb
            02:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 2208 [Thunderbolt] (rev 05)
                    Subsystem: Dell PERC H710P Mini (for monolithics)
                    Kernel driver in use: megaraid_sas
                    Kernel modules: megaraid_sas
            04:00.0 PCI bridge: PLX Technology, Inc. PEX 8747 48-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ca)
                    Kernel driver in use: pcieport
            05:08.0 PCI bridge: PLX Technology, Inc. PEX 8747 48-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ca)
                    Kernel driver in use: pcieport
            05:10.0 PCI bridge: PLX Technology, Inc. PEX 8747 48-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ca)
                    Kernel driver in use: pcieport
            06:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Tonga XT GL [FirePro S7150]
                    Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Device 0334
            07:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Tonga XT GL [FirePro S7150]
                    Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Device 0334
            0b:00.0 PCI bridge: Renesas Technology Corp. SH7757 PCIe Switch [PS]
                    Kernel driver in use: pcieport
            0c:00.0 PCI bridge: Renesas Technology Corp. SH7757 PCIe Switch [PS]
                    Kernel driver in use: pcieport
            0c:01.0 PCI bridge: Renesas Technology Corp. SH7757 PCIe Switch [PS]
                    Kernel driver in use: pcieport
            0d:00.0 PCI bridge: Renesas Technology Corp. SH7757 PCIe-PCI Bridge [PPB]
            0e:00.0 VGA compatible controller: Matrox Electronics Systems Ltd. G200eR2
                    Subsystem: Dell Device 048c
            3f:08.0 System peripheral: Intel Corporation Xeon E5/Core i7 QPI Link 0 (rev 07)
                    Subsystem: Intel Corporation Device 0000
            3f:09.0 System peripheral: Intel Corporation Xeon E5/Core i7 QPI Link 1 (rev 07)
                    Subsystem: Intel Corporation Device 0000
            3f:0a.0 System peripheral: Intel Corporation Xeon E5/Core i7 Power Control Unit 0 (rev 07)
                    Subsystem: Intel Corporation Device 0000
            3f:0a.1 System peripheral: Intel Corporation Xeon E5/Core i7 Power Control Unit 1 (rev 07)
                    Subsystem: Intel Corporation Device 0000
            3f:0a.2 System peripheral: Intel Corporation Xeon E5/Core i7 Power Control Unit 2 (rev 07)
                    Subsystem: Intel Corporation Device 0000
            3f:0a.3 System peripheral: Intel Corporation Xeon E5/Core i7 Power Control Unit 3 (rev 07)
                    Subsystem: Intel Corporation Device 0000
            3f:0b.0 System peripheral: Intel Corporation Xeon E5/Core i7 Interrupt Control Registers (rev 07)
                    Subsystem: Intel Corporation Device 0000
            3f:0b.3 System peripheral: Intel Corporation Xeon E5/Core i7 Semaphore and Scratchpad Configuration Registers (rev 07)
                    Subsystem: Intel Corporation Device 0000
            3f:0c.0 System peripheral: Intel Corporation Xeon E5/Core i7 Unicast Register 0 (rev 07)
                    Subsystem: Intel Corporation Device 0000
            3f:0c.1 System peripheral: Intel Corporation Xeon E5/Core i7 Unicast Register 0 (rev 07)
                    Subsystem: Intel Corporation Device 0000
            3f:0c.2 System peripheral: Intel Corporation Xeon E5/Core i7 Unicast Register 0 (rev 07)
                    Subsystem: Intel Corporation Device 0000
            3f:0c.3 System peripheral: Intel Corporation Xeon E5/Core i7 Unicast Register 0 (rev 07)
                    Subsystem: Intel Corporation Device 0000
            3f:0c.6 System peripheral: Intel Corporation Xeon E5/Core i7 Integrated Memory Controller System Address Decoder 0 (rev 07)
                    Subsystem: Intel Corporation Device 0000
            3f:0c.7 System peripheral: Intel Corporation Xeon E5/Core i7 System Address Decoder (rev 07)
                    Subsystem: Intel Corporation Device 0000
            3f:0d.0 System peripheral: Intel Corporation Xeon E5/Core i7 Unicast Register 0 (rev 07)
                    Subsystem: Intel Corporation Device 0000
            3f:0d.1 System peripheral: Intel Corporation Xeon E5/Core i7 Unicast Register 0 (rev 07)
                    Subsystem: Intel Corporation Device 0000
            3f:0d.2 System peripheral: Intel Corporation Xeon E5/Core i7 Unicast Register 0 (rev 07)
                    Subsystem: Intel Corporation Device 0000
            3f:0d.3 System peripheral: Intel Corporation Xeon E5/Core i7 Unicast Register 0 (rev 07)
                    Subsystem: Intel Corporation Device 0000
            3f:0d.6 System peripheral: Intel Corporation Xeon E5/Core i7 Integrated Memory Controller System Address Decoder 1 (rev 07)
                    Subsystem: Intel Corporation Device 0000
            3f:0e.0 System peripheral: Intel Corporation Xeon E5/Core i7 Processor Home Agent (rev 07)
                    Subsystem: Dell Device 048c
            3f:0e.1 Performance counters: Intel Corporation Xeon E5/Core i7 Processor Home Agent Performance Monitoring (rev 07)
                    Subsystem: Dell Device 048c
            3f:0f.0 System peripheral: Intel Corporation Xeon E5/Core i7 Integrated Memory Controller Registers (rev 07)
                    Subsystem: Intel Corporation Device 0000
            3f:0f.1 System peripheral: Intel Corporation Xeon E5/Core i7 Integrated Memory Controller RAS Registers (rev 07)
                    Subsystem: Intel Corporation Device 0000
            3f:0f.2 System peripheral: Intel Corporation Xeon E5/Core i7 Integrated Memory Controller Target Address Decoder 0 (rev 07)
                    Subsystem: Intel Corporation Device 0000
            3f:0f.3 System peripheral: Intel Corporation Xeon E5/Core i7 Integrated Memory Controller Target Address Decoder 1 (rev 07)
                    Subsystem: Intel Corporation Device 0000
            3f:0f.4 System peripheral: Intel Corporation Xeon E5/Core i7 Integrated Memory Controller Target Address Decoder 2 (rev 07)
                    Subsystem: Intel Corporation Device 0000
            3f:0f.5 System peripheral: Intel Corporation Xeon E5/Core i7 Integrated Memory Controller Target Address Decoder 3 (rev 07)
                    Subsystem: Intel Corporation Device 0000
            3f:0f.6 System peripheral: Intel Corporation Xeon E5/Core i7 Integrated Memory Controller Target Address Decoder 4 (rev 07)
                    Subsystem: Intel Corporation Device 0000
            3f:10.0 System peripheral: Intel Corporation Xeon E5/Core i7 Integrated Memory Controller Channel 0-3 Thermal Control 0 (rev 07)
                    Subsystem: Intel Corporation Device 0000
            3f:10.1 System peripheral: Intel Corporation Xeon E5/Core i7 Integrated Memory Controller Channel 0-3 Thermal Control 1 (rev 07)
                    Subsystem: Intel Corporation Device 0000
            3f:10.2 System peripheral: Intel Corporation Xeon E5/Core i7 Integrated Memory Controller ERROR Registers 0 (rev 07)
                    Subsystem: Intel Corporation Device 0000
            3f:10.3 System peripheral: Intel Corporation Xeon E5/Core i7 Integrated Memory Controller ERROR Registers 1 (rev 07)
                    Subsystem: Intel Corporation Device 0000
            3f:10.4 System peripheral: Intel Corporation Xeon E5/Core i7 Integrated Memory Controller Channel 0-3 Thermal Control 2 (rev 07)
                    Subsystem: Intel Corporation Device 0000
            3f:10.5 System peripheral: Intel Corporation Xeon E5/Core i7 Integrated Memory Controller Channel 0-3 Thermal Control 3 (rev 07)
                    Subsystem: Intel Corporation Device 0000
            3f:10.6 System peripheral: Intel Corporation Xeon E5/Core i7 Integrated Memory Controller ERROR Registers 2 (rev 07)
                    Subsystem: Intel Corporation Device 0000
            3f:10.7 System peripheral: Intel Corporation Xeon E5/Core i7 Integrated Memory Controller ERROR Registers 3 (rev 07)
                    Subsystem: Intel Corporation Device 0000
            3f:11.0 System peripheral: Intel Corporation Xeon E5/Core i7 DDRIO (rev 07)
                    Subsystem: Intel Corporation Device 0000
            3f:13.0 System peripheral: Intel Corporation Xeon E5/Core i7 R2PCIe (rev 07)
                    Subsystem: Intel Corporation Device 0000
            3f:13.1 Performance counters: Intel Corporation Xeon E5/Core i7 Ring to PCI Express Performance Monitor (rev 07)
                    Subsystem: Dell Device 048c
            3f:13.4 Performance counters: Intel Corporation Xeon E5/Core i7 QuickPath Interconnect Agent Ring Registers (rev 07)
                    Subsystem: Dell Device 048c
            3f:13.5 Performance counters: Intel Corporation Xeon E5/Core i7 Ring to QuickPath Interconnect Link 0 Performance Monitor (rev 07)
                    Subsystem: Dell Device 048c
            3f:13.6 System peripheral: Intel Corporation Xeon E5/Core i7 Ring to QuickPath Interconnect Link 1 Performance Monitor (rev 07)
                    Subsystem: Dell Device 048c
            40:01.0 PCI bridge: Intel Corporation Xeon E5/Core i7 IIO PCI Express Root Port 1a (rev 07)
                    Kernel driver in use: pcieport
            40:02.0 PCI bridge: Intel Corporation Xeon E5/Core i7 IIO PCI Express Root Port 2a (rev 07)
                    Kernel driver in use: pcieport
            40:03.0 PCI bridge: Intel Corporation Xeon E5/Core i7 IIO PCI Express Root Port 3a in PCI Express Mode (rev 07)
                    Kernel driver in use: pcieport
            40:03.2 PCI bridge: Intel Corporation Xeon E5/Core i7 IIO PCI Express Root Port 3c (rev 07)
                    Kernel driver in use: pcieport
            40:04.0 System peripheral: Intel Corporation Xeon E5/Core i7 DMA Channel 0 (rev 07)
                    Subsystem: Intel Corporation Device 0000
                    Kernel driver in use: ioatdma
            40:04.1 System peripheral: Intel Corporation Xeon E5/Core i7 DMA Channel 1 (rev 07)
                    Subsystem: Intel Corporation Device 0000
                    Kernel driver in use: ioatdma
            40:04.2 System peripheral: Intel Corporation Xeon E5/Core i7 DMA Channel 2 (rev 07)
                    Subsystem: Intel Corporation Device 0000
                    Kernel driver in use: ioatdma
            40:04.3 System peripheral: Intel Corporation Xeon E5/Core i7 DMA Channel 3 (rev 07)
                    Subsystem: Intel Corporation Device 0000
                    Kernel driver in use: ioatdma
            40:04.4 System peripheral: Intel Corporation Xeon E5/Core i7 DMA Channel 4 (rev 07)
                    Subsystem: Intel Corporation Device 0000
                    Kernel driver in use: ioatdma
            40:04.5 System peripheral: Intel Corporation Xeon E5/Core i7 DMA Channel 5 (rev 07)
                    Subsystem: Intel Corporation Device 0000
                    Kernel driver in use: ioatdma
            40:04.6 System peripheral: Intel Corporation Xeon E5/Core i7 DMA Channel 6 (rev 07)
                    Subsystem: Intel Corporation Device 0000
                    Kernel driver in use: ioatdma
            40:04.7 System peripheral: Intel Corporation Xeon E5/Core i7 DMA Channel 7 (rev 07)
                    Subsystem: Intel Corporation Device 0000
                    Kernel driver in use: ioatdma
            40:05.0 System peripheral: Intel Corporation Xeon E5/Core i7 Address Map, VTd_Misc, System Management (rev 07)
                    Subsystem: Dell Device 048c
            40:05.2 System peripheral: Intel Corporation Xeon E5/Core i7 Control Status and Global Errors (rev 07)
                    Subsystem: Dell Device 048c
            43:00.0 Ethernet controller: Chelsio Communications Inc T320 10GbE Dual Port Adapter
                    Subsystem: Chelsio Communications Inc Device 0001
                    Kernel driver in use: cxgb3
                    Kernel modules: cxgb3
            44:00.0 Ethernet controller: Intel Corporation Ethernet 10G 2P X520 Adapter (rev 01)
                    Subsystem: Intel Corporation 10GbE 2P X520 Adapter
                    Kernel driver in use: ixgbe
                    Kernel modules: ixgbe
            44:00.1 Ethernet controller: Intel Corporation Ethernet 10G 2P X520 Adapter (rev 01)
                    Subsystem: Intel Corporation 10GbE 2P X520 Adapter
                    Kernel driver in use: ixgbe
                    Kernel modules: ixgbe
            7f:08.0 System peripheral: Intel Corporation Xeon E5/Core i7 QPI Link 0 (rev 07)
                    Subsystem: Intel Corporation Device 0000
            7f:09.0 System peripheral: Intel Corporation Xeon E5/Core i7 QPI Link 1 (rev 07)
                    Subsystem: Intel Corporation Device 0000
            7f:0a.0 System peripheral: Intel Corporation Xeon E5/Core i7 Power Control Unit 0 (rev 07)
                    Subsystem: Intel Corporation Device 0000
            7f:0a.1 System peripheral: Intel Corporation Xeon E5/Core i7 Power Control Unit 1 (rev 07)
                    Subsystem: Intel Corporation Device 0000
            7f:0a.2 System peripheral: Intel Corporation Xeon E5/Core i7 Power Control Unit 2 (rev 07)
                    Subsystem: Intel Corporation Device 0000
            7f:0a.3 System peripheral: Intel Corporation Xeon E5/Core i7 Power Control Unit 3 (rev 07)
                    Subsystem: Intel Corporation Device 0000
            7f:0b.0 System peripheral: Intel Corporation Xeon E5/Core i7 Interrupt Control Registers (rev 07)
                    Subsystem: Intel Corporation Device 0000
            7f:0b.3 System peripheral: Intel Corporation Xeon E5/Core i7 Semaphore and Scratchpad Configuration Registers (rev 07)
                    Subsystem: Intel Corporation Device 0000
            7f:0c.0 System peripheral: Intel Corporation Xeon E5/Core i7 Unicast Register 0 (rev 07)
                    Subsystem: Intel Corporation Device 0000
            7f:0c.1 System peripheral: Intel Corporation Xeon E5/Core i7 Unicast Register 0 (rev 07)
                    Subsystem: Intel Corporation Device 0000
            7f:0c.2 System peripheral: Intel Corporation Xeon E5/Core i7 Unicast Register 0 (rev 07)
                    Subsystem: Intel Corporation Device 0000
            7f:0c.3 System peripheral: Intel Corporation Xeon E5/Core i7 Unicast Register 0 (rev 07)
                    Subsystem: Intel Corporation Device 0000
            7f:0c.6 System peripheral: Intel Corporation Xeon E5/Core i7 Integrated Memory Controller System Address Decoder 0 (rev 07)
                    Subsystem: Intel Corporation Device 0000
            7f:0c.7 System peripheral: Intel Corporation Xeon E5/Core i7 System Address Decoder (rev 07)
                    Subsystem: Intel Corporation Device 0000
            7f:0d.0 System peripheral: Intel Corporation Xeon E5/Core i7 Unicast Register 0 (rev 07)
                    Subsystem: Intel Corporation Device 0000
            7f:0d.1 System peripheral: Intel Corporation Xeon E5/Core i7 Unicast Register 0 (rev 07)
                    Subsystem: Intel Corporation Device 0000
            7f:0d.2 System peripheral: Intel Corporation Xeon E5/Core i7 Unicast Register 0 (rev 07)
                    Subsystem: Intel Corporation Device 0000
            7f:0d.3 System peripheral: Intel Corporation Xeon E5/Core i7 Unicast Register 0 (rev 07)
                    Subsystem: Intel Corporation Device 0000
            7f:0d.6 System peripheral: Intel Corporation Xeon E5/Core i7 Integrated Memory Controller System Address Decoder 1 (rev 07)
                    Subsystem: Intel Corporation Device 0000
            7f:0e.0 System peripheral: Intel Corporation Xeon E5/Core i7 Processor Home Agent (rev 07)
                    Subsystem: Dell Device 048c
            7f:0e.1 Performance counters: Intel Corporation Xeon E5/Core i7 Processor Home Agent Performance Monitoring (rev 07)
                    Subsystem: Dell Device 048c
            7f:0f.0 System peripheral: Intel Corporation Xeon E5/Core i7 Integrated Memory Controller Registers (rev 07)
                    Subsystem: Intel Corporation Device 0000
            7f:0f.1 System peripheral: Intel Corporation Xeon E5/Core i7 Integrated Memory Controller RAS Registers (rev 07)
                    Subsystem: Intel Corporation Device 0000
            7f:0f.2 System peripheral: Intel Corporation Xeon E5/Core i7 Integrated Memory Controller Target Address Decoder 0 (rev 07)
                    Subsystem: Intel Corporation Device 0000
            7f:0f.3 System peripheral: Intel Corporation Xeon E5/Core i7 Integrated Memory Controller Target Address Decoder 1 (rev 07)
                    Subsystem: Intel Corporation Device 0000
            7f:0f.4 System peripheral: Intel Corporation Xeon E5/Core i7 Integrated Memory Controller Target Address Decoder 2 (rev 07)
                    Subsystem: Intel Corporation Device 0000
            7f:0f.5 System peripheral: Intel Corporation Xeon E5/Core i7 Integrated Memory Controller Target Address Decoder 3 (rev 07)
                    Subsystem: Intel Corporation Device 0000
            7f:0f.6 System peripheral: Intel Corporation Xeon E5/Core i7 Integrated Memory Controller Target Address Decoder 4 (rev 07)
                    Subsystem: Intel Corporation Device 0000
            7f:10.0 System peripheral: Intel Corporation Xeon E5/Core i7 Integrated Memory Controller Channel 0-3 Thermal Control 0 (rev 07)
                    Subsystem: Intel Corporation Device 0000
            7f:10.1 System peripheral: Intel Corporation Xeon E5/Core i7 Integrated Memory Controller Channel 0-3 Thermal Control 1 (rev 07)
                    Subsystem: Intel Corporation Device 0000
            7f:10.2 System peripheral: Intel Corporation Xeon E5/Core i7 Integrated Memory Controller ERROR Registers 0 (rev 07)
                    Subsystem: Intel Corporation Device 0000
            7f:10.3 System peripheral: Intel Corporation Xeon E5/Core i7 Integrated Memory Controller ERROR Registers 1 (rev 07)
                    Subsystem: Intel Corporation Device 0000
            7f:10.4 System peripheral: Intel Corporation Xeon E5/Core i7 Integrated Memory Controller Channel 0-3 Thermal Control 2 (rev 07)
                    Subsystem: Intel Corporation Device 0000
            7f:10.5 System peripheral: Intel Corporation Xeon E5/Core i7 Integrated Memory Controller Channel 0-3 Thermal Control 3 (rev 07)
                    Subsystem: Intel Corporation Device 0000
            7f:10.6 System peripheral: Intel Corporation Xeon E5/Core i7 Integrated Memory Controller ERROR Registers 2 (rev 07)
                    Subsystem: Intel Corporation Device 0000
            7f:10.7 System peripheral: Intel Corporation Xeon E5/Core i7 Integrated Memory Controller ERROR Registers 3 (rev 07)
                    Subsystem: Intel Corporation Device 0000
            7f:11.0 System peripheral: Intel Corporation Xeon E5/Core i7 DDRIO (rev 07)
                    Subsystem: Intel Corporation Device 0000
            7f:13.0 System peripheral: Intel Corporation Xeon E5/Core i7 R2PCIe (rev 07)
                    Subsystem: Intel Corporation Device 0000
            7f:13.1 Performance counters: Intel Corporation Xeon E5/Core i7 Ring to PCI Express Performance Monitor (rev 07)
                    Subsystem: Dell Device 048c
            7f:13.4 Performance counters: Intel Corporation Xeon E5/Core i7 QuickPath Interconnect Agent Ring Registers (rev 07)
                    Subsystem: Dell Device 048c
            7f:13.5 Performance counters: Intel Corporation Xeon E5/Core i7 Ring to QuickPath Interconnect Link 0 Performance Monitor (rev 07)
                    Subsystem: Dell Device 048c
            7f:13.6 System peripheral: Intel Corporation Xeon E5/Core i7 Ring to QuickPath Interconnect Link 1 Performance Monitor (rev 07)
                    Subsystem: Dell Device 048c
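In a listing this long, the useful detail is which devices have a kernel driver bound. In the output above, the two FirePro S7150 entries (06:00.0 and 07:00.0) show a Subsystem line but no "Kernel driver in use" line. The thread does not establish whether that is expected at this point, so the sketch below only makes the check repeatable. It assumes the default `lspci -k` layout without PCI domain prefixes; the function names and the `SAMPLE` text are illustrative.

```python
import re

# A device line looks like "06:00.0 VGA compatible controller: <name>".
# Assumption: no PCI domain prefix (lspci run without -D).
DEVICE_RE = re.compile(r"^([0-9a-f]{2}:[0-9a-f]{2}\.[0-7]) ([^:]+): (.*)$")


def parse_lspci_k(text):
    """Parse `lspci -k` text into a list of dicts, one per device."""
    devices = []
    for raw in text.splitlines():
        line = raw.strip()  # tolerate the extra indentation of pasted output
        m = DEVICE_RE.match(line)
        if m:
            devices.append({
                "address": m.group(1),
                "class": m.group(2),
                "name": m.group(3),
                "driver": None,  # stays None if no driver line follows
            })
        elif line.startswith("Kernel driver in use:") and devices:
            devices[-1]["driver"] = line.split(":", 1)[1].strip()
    return devices


def devices_without_driver(devices, name_contains):
    """Return addresses of matching devices that have no bound driver."""
    return [d["address"] for d in devices
            if name_contains in d["name"] and d["driver"] is None]


# A few entries copied from the listing above, used as sample input.
SAMPLE = """\
04:00.0 PCI bridge: PLX Technology, Inc. PEX 8747 48-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ca)
        Kernel driver in use: pcieport
06:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Tonga XT GL [FirePro S7150]
        Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Device 0334
07:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Tonga XT GL [FirePro S7150]
        Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Device 0334
0e:00.0 VGA compatible controller: Matrox Electronics Systems Ltd. G200eR2
        Subsystem: Dell Device 048c
"""

devices = parse_lspci_k(SAMPLE)
print(devices_without_driver(devices, "FirePro"))
```

Rerunning the same check after the gim module has been loaded would show whether the binding state of the two FirePro functions changes.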
            
            
            • tbluml

              dmesg:

              [ 1123.874023] AMD Virt GIM API
              [ 1123.877477] gim: module license 'Proprietary' taints kernel.
              [ 1123.877478] Disabling lock debugging due to kernel taint
              [ 1123.880580]        gim info:(gim_init:197) *******AMD GIM init
              [ 1123.880582]        gim info:(print_gim_version:62) GPU IOV MODULE (GIM) - version 2.00.0000
              [ 1123.880582]        gim info:(gim_init:200) Copyright (c) 2014-2016 AMD Corporation.
              [ 1123.880822]        gim info:(parse_config_file:295) AMD GIM fb_option = 0
              [ 1123.880823]        gim info:(parse_config_file:295) AMD GIM sched_option = 0
              [ 1123.880824]        gim info:(parse_config_file:295) AMD GIM vf_num = 0
              [ 1123.880825]        gim info:(parse_config_file:295) AMD GIM pf_fb = 0
              [ 1123.880826]        gim info:(parse_config_file:295) AMD GIM vf_fb = 0
              [ 1123.880827]        gim info:(parse_config_file:295) AMD GIM sched_interval = 7
              [ 1123.880828]        gim info:(parse_config_file:295) AMD GIM fb_clear = 1
              [ 1123.880829]        gim info:(parse_config_file:295) AMD GIM hang_detect_timeout = 100
              [ 1123.880831]        gim info:(parse_config_file:295) AMD GIM max_quanta = 1000
              [ 1123.880832]        gim info:(parse_config_file:295) AMD GIM self_switch = 500
              [ 1123.880833]        gim info:(parse_config_file:295) AMD GIM exclusive = 1600
              [ 1123.880834]        gim info:(parse_config_file:295) AMD GIM fair_scheduling = 0
              [ 1123.880835]        gim info:(parse_config_file:295) AMD GIM debug_level = 3
              [ 1123.880837]        gim info:(parse_config_file:295) AMD GIM clear_fb_on_flr = 0
              [ 1123.880838]        gim info:(parse_config_file:295) AMD GIM clear_fb_on_free_vf = 1
              [ 1123.880839]        gim info:(init_config:445) INIT CONFIG
              [ 1123.890809]        gim error:(gim_probe:123) gim_probe(06:00.0)
              [ 1123.890822]        gim info:(alloc_adapter:454) allocate adapter for PF 0x0600
              [ 1123.890823]        gim info:(alloc_adapter:457) Found free adapter at index 0
              [ 1123.890829] PF0    gim info:(SetNewAdapter:1096) curr allocated at 00000000d08994a5
              [ 1123.890830] PF0    gim info:(SetNewAdapter:1102) Can't disable ATS --> Not enabled in the first place
              [ 1123.890831] PF0    gim info:(SetNewAdapter:1113) SRIOV is supported
              [ 1123.890831] PF0    gim info:(SetNewAdapter:1121) found PCI bridge device
              [ 1123.890832] PF0    gim info:(SetNewAdapter:1124) found: 05:8.0
              [ 1123.890979] PF0    gim info:(SetNewAdapter:1147) mmio_base = 000000007da02c16
              [ 1123.891597] PF0    gim info:(SetNewAdapter:1149) doorbell = 00000000c74e7207
              [ 1123.981443] PF0    gim info:(SetNewAdapter:1151) pf.fb_va = 000000009a0d3e40
              [ 1123.981469]        gim info:(sriov_is_ari_enabled:180) PCI_SRIOV_CAP = 0x00000002
              [ 1123.981470]        gim info:(sriov_is_ari_enabled:190) PCI_SRIOV_CTRL = 0x00000010
              [ 1123.981471]        gim info:(sriov_is_ari_enabled:194) PCI_SRIOV_CTRL_ARI is set --> ARI is supported
              [ 1123.981474] PF0    gim info:(program_ari_mode:957) Read bif_strap8 = 0x00200004
              [ 1123.981474] PF0    gim info:(program_ari_mode:963) program_ari_mode - Set ARI_Mode = PF_BUS
              [ 1123.981475] PF0    gim info:(program_ari_mode:978) Write bif_strap8 = 0x00000004
              [ 1123.981476] PF0    gim info:(gim_read_rom_from_reg:634) Reading VBios from ROM
              [ 1123.981594] PF0    gim info:(gim_read_VBIOS:695) VBIOS starts:  0x55, 0xaa
              [ 1123.981595] PF0    gim info:(gim_read_VBIOS:698) VBios size is 0x10000
              [ 1123.981601] PF0    gim info:(gim_read_VBIOS:708) pVBIOS allocated at 00000000e96a70db for size of 0x80000
              [ 1123.981602] PF0    gim info:(gim_read_rom_from_reg:634) Reading VBios from ROM
              [ 1125.088134] PF0    gim info:(gim_read_VBIOS:718) BIOS Version Major 0xF Minor 0x31
              [ 1125.088210] PF0    gim info:(gim_read_VBIOS:729) VBios Checksum = 0x541c00
              [ 1125.088211] PF0    gim info:(gim_read_VBIOS:738) Valid video BIOS image, size = 0x10000, check sum is 0x541c00
              [ 1125.088212] PF0    gim info:(gim_read_VBIOS:739) Read in full Vbios image of size = 0x80000
              [ 1125.088266] PF0    gim info:(gim_post_VBIOS:776) Init Parser passed!, continue
              [ 1125.088269] <1>ATOM_CheckAsicStatus - BIOS_SCRATCH_7 = 0x00000000
              [ 1125.088269] <1>  Isolate ATOM_S7_ASIC_INIT_COMPLETE_MASK bit(s) = 0x00000000
              [ 1125.088271] <1>  RLC_CNTL = 0x00000000
              [ 1125.088271] <1>  Isolate RLC_CNTL__RLC_ENABLE_F32_MASK = 0x00000000
              [ 1125.088272] <1>ATOM_ASIC_NEED_POST
              [ 1125.088273] PF0    gim info:(gim_post_VBIOS:795) Asic needs a VBios post
              [ 1125.088274]        gim info:(ATOM_PostVBIOS:215) ATOM_PostVBIOS: FirmwareInfo passed
              [ 1125.088275]        gim info:(ATOM_PostVBIOS:261) ATOM_PostVBIOS: ASIC_Init before, engine clock = 7530, memory clock =1e848
              [ 1125.412963]        gim info:(ATOM_PostVBIOS:263) ATOM_PostVBIOS: ASIC_Init after
              [ 1125.412964]        gim info:(ATOM_PostVBIOS:273) ATOM_PostVBIOS: ATOM_InitFanCntl before
              [ 1125.412965]        gim info:(ATOM_PostVBIOS:275) ATOM_PostVBIOS: ATOM_InitFanCntl after
              [ 1125.412965] PF0    gim info:(gim_post_VBIOS:801) Post INIT_ASIC successfully!
              [ 1125.412977]        gim warning:(firmware_requires_update:473) SMU option ROM version 0x111700 versus patch version 0x111a00
              [ 1125.412989]        gim warning:(firmware_requires_update:486) RLCV option ROM version 113. Patch version 1
              [ 1125.412989]        gim info:(firmware_requires_update:495) TOC found, update it
              [ 1125.412990]        gim info:(patch_firmware:549) Update SMC_Init table
              [ 1125.414871]        gim warning:(patch_firmware:574) Update smu firmware
              [ 1125.416014]        gim warning:(patch_firmware:582) Update RLCV firmware
              [ 1125.416083]        gim warning:(patch_firmware:590) Update TOC
              [ 1125.416520]        gim info:(func_recalc_checksum:518) func_recalc_checksum original= 56
              [ 1125.416550]        gim info:(func_recalc_checksum:522) func_recalc_checksum new= 89
              [ 1125.416551] PF0    gim info:(gim_post_VBIOS:811) Asic needs firmware loaded
              [ 1125.416551]        gim info:(ATOM_PostVBIOS:215) ATOM_PostVBIOS: FirmwareInfo passed
              [ 1125.416552]        gim info:(ATOM_PostVBIOS:250) just load uCode
              [ 1125.416553]        gim info:(ATOM_PostVBIOS:261) ATOM_PostVBIOS: ASIC_Init before, engine clock = 7530, memory clock =1e848
              [ 1127.081802]        gim info:(ATOM_PostVBIOS:263) ATOM_PostVBIOS: ASIC_Init after
              [ 1127.081803]        gim info:(ATOM_PostVBIOS:273) ATOM_PostVBIOS: ATOM_InitFanCntl before
              [ 1127.081803]        gim info:(ATOM_PostVBIOS:275) ATOM_PostVBIOS: ATOM_InitFanCntl after
              [ 1127.081805] PF0    gim info:(gim_post_VBIOS:817) Post LOAD_FW successfully!
              [ 1127.081805] PF0    gim info:(gim_post_VBIOS:818) Post VBIOS successfully!
              [ 1127.082653]        gim info:(enable_thermal_control:643) Thermal Control Enable
              [ 1127.082655] PF0    gim info:(SetNewAdapter:1207) gim_post_VBIOS done
              [ 1127.082656] PF0    gim info:(SetNewAdapter:1248) Scheduler Time interval set to 7 msec
              [ 1127.082659]        gim info:(EnableSriov:398) Enable SRIOV
              [ 1127.082659]        gim info:(EnableSriov:399) Enable SRIOV vfs count = 16
              [ 1127.082689] gim 0000:06:00.0: not enough MMIO resources for SR-IOV
              [ 1127.082702]        gim error:(EnableSriov:410) Fail to enable sriov, status = fffffff4
              [ 1127.082711]        gim error:(SetNewAdapter:1263) Failed to properly enable SRIOV(map_image) !!!!
              [ 1127.186274]        gim error:(gim_probe:126) Failed to create new adapter
              [ 1127.186297] gim: probe of 0000:06:00.0 failed with error -1
              [ 1127.186312]        gim error:(gim_probe:123) gim_probe(07:00.0)
              [ 1127.186319]        gim info:(alloc_adapter:454) allocate adapter for PF 0x0700
              [ 1127.186320]        gim info:(alloc_adapter:457) Found free adapter at index 0
              [ 1127.186325] PF0    gim info:(SetNewAdapter:1096) curr allocated at 00000000d08994a5
              [ 1127.186326] PF0    gim info:(SetNewAdapter:1102) Can't disable ATS --> Not enabled in the first place
              [ 1127.186327] PF0    gim info:(SetNewAdapter:1113) SRIOV is supported
              [ 1127.186327] PF0    gim info:(SetNewAdapter:1121) found PCI bridge device
              [ 1127.186328] PF0    gim info:(SetNewAdapter:1124) found: 05:10.0
              [ 1127.186544] PF0    gim info:(SetNewAdapter:1147) mmio_base = 00000000de310274
              [ 1127.187275] PF0    gim info:(SetNewAdapter:1149) doorbell = 00000000c74e7207
              [ 1127.266496] PF0    gim info:(SetNewAdapter:1151) pf.fb_va = 000000009a0d3e40
              [ 1127.266520]        gim info:(sriov_is_ari_enabled:180) PCI_SRIOV_CAP = 0x00000002
              [ 1127.266521]        gim info:(sriov_is_ari_enabled:190) PCI_SRIOV_CTRL = 0x00000010
              [ 1127.266522]        gim info:(sriov_is_ari_enabled:194) PCI_SRIOV_CTRL_ARI is set --> ARI is supported
              [ 1127.266524] PF0    gim info:(program_ari_mode:957) Read bif_strap8 = 0x00200004
              [ 1127.266525] PF0    gim info:(program_ari_mode:963) program_ari_mode - Set ARI_Mode = PF_BUS
              [ 1127.266526] PF0    gim info:(program_ari_mode:978) Write bif_strap8 = 0x00000004
              [ 1127.266526] PF0    gim info:(gim_read_rom_from_reg:634) Reading VBios from ROM
              [ 1127.266655] PF0    gim info:(gim_read_VBIOS:695) VBIOS starts:  0x55, 0xaa
              [ 1127.266658] PF0    gim info:(gim_read_VBIOS:698) VBios size is 0x10000
              [ 1127.266729] PF0    gim info:(gim_read_VBIOS:708) pVBIOS allocated at 00000000e96a70db for size of 0x80000
              [ 1127.266730] PF0    gim info:(gim_read_rom_from_reg:634) Reading VBios from ROM
              [ 1128.371441] PF0    gim info:(gim_read_VBIOS:718) BIOS Version Major 0xF Minor 0x31
              [ 1128.371519] PF0    gim info:(gim_read_VBIOS:729) VBios Checksum = 0x541c00
              [ 1128.371520] PF0    gim info:(gim_read_VBIOS:738) Valid video BIOS image, size = 0x10000, check sum is 0x541c00
              [ 1128.371520] PF0    gim info:(gim_read_VBIOS:739) Read in full Vbios image of size = 0x80000
              [ 1128.371575] PF0    gim info:(gim_post_VBIOS:776) Init Parser passed!, continue
              [ 1128.371579] <1>ATOM_CheckAsicStatus - BIOS_SCRATCH_7 = 0x00000000
              [ 1128.371579] <1>  Isolate ATOM_S7_ASIC_INIT_COMPLETE_MASK bit(s) = 0x00000000
              [ 1128.371581] <1>  RLC_CNTL = 0x00000000
              [ 1128.371581] <1>  Isolate RLC_CNTL__RLC_ENABLE_F32_MASK = 0x00000000
              [ 1128.371582] <1>ATOM_ASIC_NEED_POST
              [ 1128.371582] PF0    gim info:(gim_post_VBIOS:795) Asic needs a VBios post
              [ 1128.371583]        gim info:(ATOM_PostVBIOS:215) ATOM_PostVBIOS: FirmwareInfo passed
              [ 1128.371584]        gim info:(ATOM_PostVBIOS:261) ATOM_PostVBIOS: ASIC_Init before, engine clock = 7530, memory clock =1e848
              [ 1128.696397]        gim info:(ATOM_PostVBIOS:263) ATOM_PostVBIOS: ASIC_Init after
              [ 1128.696398]        gim info:(ATOM_PostVBIOS:273) ATOM_PostVBIOS: ATOM_InitFanCntl before
              [ 1128.696399]        gim info:(ATOM_PostVBIOS:275) ATOM_PostVBIOS: ATOM_InitFanCntl after
              [ 1128.696400] PF0    gim info:(gim_post_VBIOS:801) Post INIT_ASIC successfully!
              [ 1128.696412]        gim warning:(firmware_requires_update:473) SMU option ROM version 0x111700 versus patch version 0x111a00
              [ 1128.696424]        gim warning:(firmware_requires_update:486) RLCV option ROM version 113. Patch version 1
              [ 1128.696424]        gim info:(firmware_requires_update:495) TOC found, update it
              [ 1128.696425]        gim info:(patch_firmware:549) Update SMC_Init table
              [ 1128.698220]        gim warning:(patch_firmware:574) Update smu firmware
              [ 1128.699381]        gim warning:(patch_firmware:582) Update RLCV firmware
              [ 1128.699450]        gim warning:(patch_firmware:590) Update TOC
              [ 1128.699882]        gim info:(func_recalc_checksum:518) func_recalc_checksum original= 56
              [ 1128.699911]        gim info:(func_recalc_checksum:522) func_recalc_checksum new= 89
              [ 1128.699912] PF0    gim info:(gim_post_VBIOS:811) Asic needs firmware loaded
              [ 1128.699912]        gim info:(ATOM_PostVBIOS:215) ATOM_PostVBIOS: FirmwareInfo passed
              [ 1128.699913]        gim info:(ATOM_PostVBIOS:250) just load uCode
              [ 1128.699914]        gim info:(ATOM_PostVBIOS:261) ATOM_PostVBIOS: ASIC_Init before, engine clock = 7530, memory clock =1e848
              [ 1130.370214]        gim info:(ATOM_PostVBIOS:263) ATOM_PostVBIOS: ASIC_Init after
              [ 1130.370215]        gim info:(ATOM_PostVBIOS:273) ATOM_PostVBIOS: ATOM_InitFanCntl before
              [ 1130.370216]        gim info:(ATOM_PostVBIOS:275) ATOM_PostVBIOS: ATOM_InitFanCntl after
              [ 1130.370217] PF0    gim info:(gim_post_VBIOS:817) Post LOAD_FW successfully!
              [ 1130.370217] PF0    gim info:(gim_post_VBIOS:818) Post VBIOS successfully!
              [ 1130.370976]        gim info:(enable_thermal_control:643) Thermal Control Enable
              [ 1130.370977] PF0    gim info:(SetNewAdapter:1207) gim_post_VBIOS done
              [ 1130.370978] PF0    gim info:(SetNewAdapter:1248) Scheduler Time interval set to 7 msec
              [ 1130.370980]        gim info:(EnableSriov:398) Enable SRIOV
              [ 1130.370981]        gim info:(EnableSriov:399) Enable SRIOV vfs count = 16
              [ 1130.370987] gim 0000:07:00.0: not enough MMIO resources for SR-IOV
              [ 1130.370997]        gim error:(EnableSriov:410) Fail to enable sriov, status = fffffff4
              [ 1130.371005]        gim error:(SetNewAdapter:1263) Failed to properly enable SRIOV(map_image) !!!!
              [ 1130.474928]        gim error:(gim_probe:126) Failed to create new adapter
              [ 1130.474958] gim: probe of 0000:07:00.0 failed with error -1
              [ 1130.475210]        gim info:(gim_ioctl_init:567) IOCTL device created and ready for use
              [ 1130.475211] Running Kaveri version of GIM
              
              
              1 Reply Last reply Reply Quote 0
              • T Offline
                tbluml
                last edited by

                I posted only the GIM-related portion of dmesg, as the full output was quite long. Let me know if you need anything else. Thank you both again for your help!
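
A note on the two failure lines in the log above: `status = fffffff4` is a 32-bit value that reads as -12 when treated as signed, which on Linux is ENOMEM. That matches the kernel message printed just before it, "not enough MMIO resources for SR-IOV". A minimal sketch of the conversion follows; the helper name `decode_status` is made up for illustration and is not part of GIM.

```shell
# Interpret a 32-bit hex status (as printed by the GIM log) as a signed integer.
# decode_status is a hypothetical helper, not something shipped with the driver.
decode_status() {
    raw=$(( 0x$1 ))
    # Values with the top bit set are negative in two's complement.
    if [ "$raw" -ge $(( 1 << 31 )) ]; then
        raw=$(( raw - (1 << 32) ))
    fi
    echo "$raw"
}

decode_status fffffff4   # prints -12, i.e. -ENOMEM
```

So the driver itself appears to initialise fine (VBIOS post and firmware load succeed); the probe fails only at the point where the kernel cannot find address space for the 16 VFs.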

                1 Reply Last reply Reply Quote 0
                • T Offline
                  tbluml
                  last edited by

                  Could a BIOS update potentially fix this issue?
                  https://access.redhat.com/solutions/37376

                  1 Reply Last reply Reply Quote 0
                  • T Offline
                    tbluml
                    last edited by

                    Update: BIOS updated to most recent (2.9). Still having the same issue.

                    T 1 Reply Last reply Reply Quote 0
                    • T Offline
                      tuxen Top contributor @tbluml
                      last edited by tuxen

                      tbluml did you try the pci=realloc workaround, as stated in the RHEL link?

                      # /opt/xensource/libexec/xen-cmdline --set-dom0 pci=realloc
                      

                      Edit: reboot the host after applying the change.

                      T 1 Reply Last reply Reply Quote 0
                      • T Offline
                        tbluml @tuxen
                        last edited by

                        tuxen Just tried it (from the terminal), and rebooted with the same result unfortunately. Does the command need to be appended to a file, or should it work just from the terminal?

                        83d2c07f-180f-42a1-aa12-e44d72e6d8a0-image.png

                        T 1 Reply Last reply Reply Quote 0
                        • T Offline
                          tuxen Top contributor @tbluml
                          last edited by

                          It's from the terminal/CLI. Alternatively, you can verify/change the boot options in /boot/grub/grub.cfg (for dom0 boot, see module2 /boot/vmlinuz entries).
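
To confirm the option actually reached the dom0 kernel after the reboot, one possible check is to look for it in /proc/cmdline. This is only a sketch: the function name `cmdline_has` is invented here, and it does nothing more than match whole words in a string.

```shell
# Return 0 only if every given option appears as a whole word in the command line.
# cmdline_has is a hypothetical helper written for this example.
cmdline_has() {
    line=" $1 "
    shift
    for opt in "$@"; do
        case "$line" in
            *" $opt "*) ;;          # option present, keep checking
            *) return 1 ;;          # option missing
        esac
    done
    return 0
}

# Run in dom0 after the reboot:
if cmdline_has "$(cat /proc/cmdline)" pci=realloc; then
    echo "pci=realloc is active"
else
    echo "pci=realloc is NOT on the kernel command line"
fi
```

If the option is missing, the change was probably not written to the boot configuration; if it is present and the error persists, the option alone is not enough on this machine.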

                          Found this Citrix KB adding one more pci option, take a look:
                          https://support.citrix.com/article/CTX250121

                          1 Reply Last reply Reply Quote 0
                          • T Offline
                            tbluml
                            last edited by

                            For the moment, I took the S7150x2 out of the R720 and put it in a Supermicro X10DRH-CT-O with E5-2620v3's for testing. After everything was set up, (BIOS, OS, and driver), I found that MxGPU did work. (Good to know that if all else fails, I have a machine that will work for what I need!)

                            I will take a look at that, tuxen! Thank you!

                            1 Reply Last reply Reply Quote 0
                            • R Offline
                              r1 XCP-ng Team
                              last edited by

                              tbluml Would you like to try the open-source GIM driver on your Dell machine? We may learn more from it.

                              1 Reply Last reply Reply Quote 0
                              • T Offline
                                tbluml
                                last edited by

                                I had the chance to try the rest of the commands linked by tuxen today, and now I can successfully run a VM with MxGPU enabled and started! It looks like adding "pci=assign-busses" to this command did it.

                                /opt/xensource/libexec/xen-cmdline --set-dom0 "pci=realloc pci=assign-busses" 
                                

                                Thank you all for your assistance!
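
For anyone wanting to double-check the result from dom0, the standard sysfs attribute `sriov_numvfs` reports how many VFs are enabled on a physical function. The sketch below assumes the two PF addresses from the dmesg output earlier in the thread (0000:06:00.0 and 0000:07:00.0); since pci=assign-busses lets the kernel renumber buses, the addresses may well differ after the change, so check `lspci` first. The function name `report_vfs` and the `SYSFS_ROOT` override are made up for this example.

```shell
# Print the number of enabled VFs for each PCI address given.
# SYSFS_ROOT is a hypothetical override so the function can be tried on a mock tree.
report_vfs() {
    root="${SYSFS_ROOT:-/sys}"
    for addr in "$@"; do
        f="$root/bus/pci/devices/$addr/sriov_numvfs"
        if [ -r "$f" ]; then
            echo "$addr: $(cat "$f") VFs enabled"
        else
            echo "$addr: no sriov_numvfs attribute"
        fi
    done
}

# Addresses taken from the earlier log; adjust to whatever lspci shows now.
report_vfs 0000:06:00.0 0000:07:00.0
```

A non-zero count on both functions would be consistent with the "Enable SRIOV vfs count = 16" step in the log now succeeding.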

                                P 1 Reply Last reply Reply Quote 2
                                • olivierlambertO Online
                                  olivierlambert Vates 🪐 Co-Founder CEO
                                  last edited by

                                  That's interesting! Maybe you can add this to the documentation?

                                  T 1 Reply Last reply Reply Quote 0
                                  • T Offline
                                    tbluml @olivierlambert
                                    last edited by

                                    olivierlambert I would be happy to. Is there a post or link to posting guidelines? (So I can make sure that what I write is in line with what has already been written?)

                                    1 Reply Last reply Reply Quote 0
                                    • olivierlambertO Online
                                      olivierlambert Vates 🪐 Co-Founder CEO
                                      last edited by olivierlambert

                                      Here: https://xcp-ng.org/docs/compute.html#mxgpu-amd-vgpu

                                      There's a link at the bottom of the page (called "Help us to improve this page!") to contribute to it and add what you did 🙂

                                      1 Reply Last reply Reply Quote 0
                                      • P Offline
                                        pigeon @tbluml
                                        last edited by pigeon

                                        tbluml I'm trying to set up MxGPU on similar hardware (Dell R720, 2x E5-2650).
                                        I got the same SR-IOV errors as you, so I added the pci=realloc pci=assign-busses params.
                                        Unfortunately, the system does not manage to boot once pci=assign-busses is added.
                                        The root disk is not discovered and the dracut shell is started.
                                        Did you run into the same issue, and if so, how did you fix it?

                                        Edit:
                                        In case anyone else stumbles upon this:
                                        I reinstalled on a (USB) disk that is not connected to the RAID controller, and it seems to work now.
                                        I speculate that, since the RAID controller is a PCI device and pci=assign-busses allows the kernel to override the PCI bus numbers, the RAID device can no longer be found using the predetermined data in the initramfs. But that might be complete nonsense (I'm no expert in these matters).

                                        1 Reply Last reply Reply Quote 0
                                        • M Offline
                                          m6
                                          last edited by m6

                                          I have the same problem with XCP-ng 8.2. I'm trying to get MxGPU working on an HPE ML380p Gen8 with an E5-2620 v2. With pci=realloc pci=assign-busses added, the server cannot boot; the screenshot below shows the point where it crashes.
                                          The log in the image seems to point to a known issue --> "choose an explicit smt=(bool) setting. See XSA-297"
                                          It is pci=assign-busses that prevents booting, but without it "modprobe gim" fails to load the module. Installing to a USB disk instead of the PCI-attached disk did not help either: the system still crashes during startup. The BIOS firmware is the latest available (2019). Has anyone resolved this issue?
                                          crash-assign-busses.jpg

                                          1 Reply Last reply Reply Quote 0