XCP-ng

    XCP-ng 8.3 & AMD Firepro S7150x2

    15 Posts 5 Posters 507 Views 4 Watching
    • ohajek

      Hello,
      I am trying in vain to get an "AMD Firepro S7150x2" to run on XCP-ng 8.3 (latest release).

      My hardware configuration:

      Motherboard: ASUS TUF x670E-PLUS (firmware version 3042 from 2024/10/22)
      BIOS settings:

      • CSM -> disabled
      • SR-IOV -> enabled
      • IOMMU -> changed from auto to enabled

      CPU: AMD Ryzen 9 7950X3D 16-Core Processor

      Steps I have already taken:

      1. Installed the supplemental pack "mxgpu-2.0.0.amd.iso"

      2. Excluded modules from the initramfs:
        echo "blacklist radeon" >> /etc/modprobe.d/blacklist.conf
        echo "blacklist amdgpu" >> /etc/modprobe.d/blacklist.conf
        echo "blacklist amdkfd" >> /etc/modprobe.d/blacklist.conf
        --> rebuilt the initramfs

      3. Set boot parameters in "grub2-efi.cfg":
        /opt/xensource/libexec/xen-cmdline --set-dom0 "pci=realloc pci=assign-busses"

      4. Loaded the "gim" module
        -> kern.log output with errors (attached as a file):
        AMD_FIREPRO_gim_kern.txt
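
Step 2 above could be scripted roughly as follows. This is a minimal sketch, not a vetted procedure: the helper name blacklist_modules is hypothetical, and "dracut -f" as the rebuild command is an assumption, since the post only says "rebuild initramfs".

```shell
#!/bin/sh
# Append "blacklist <module>" lines to a modprobe config file, skipping
# entries that are already present so repeated runs do not add duplicates.
# (blacklist_modules is a hypothetical helper name.)
blacklist_modules() {
    conf="$1"; shift
    touch "$conf"
    for mod in "$@"; do
        grep -qxF "blacklist $mod" "$conf" || echo "blacklist $mod" >> "$conf"
    done
}

# Usage on the host (as root), followed by an initramfs rebuild.
# "dracut -f" is an assumption; the original post does not name the command.
#   blacklist_modules /etc/modprobe.d/blacklist.conf radeon amdgpu amdkfd
#   dracut -f
```

Unlike the plain "echo ... >>" lines, running this twice leaves a single entry per module in the file.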

      Does anyone have an idea where the problem could be?
      Would be grateful for any help.

      • olivierlambert Vates 🪐 Co-Founder CEO

        Hi!

        Have you read the release notes before going to 8.3?

        https://docs.xcp-ng.org/releases/release-8-3/#amd-mxgpu-driver

        • ohajek @olivierlambert

          @olivierlambert said in XCP-ng 8.3 & AMD Firepro S7150x2:

          Have you read the release notes before going to 8.3?

          Hello Olivier,
          unfortunately not - do you think it makes sense to try the XCP-ng 8.2 version?

          • olivierlambert Vates 🪐 Co-Founder CEO

            Yes, I would try that first.

            • ohajek

              I have installed XCP-ng version 8.2 ... same problem, unfortunately no success.

              As a next step, I compiled the GIM module and loaded it using "modprobe gim".
              Looks better now:
              kern_gim_compiled.txt

              ... and the devices are now also displayed:

              #> lspci |grep Tonga
              03:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Tonga XT GL [FirePro S7150]
              03:02.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Tonga XTV GL [FirePro S7150V]
              03:02.1 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Tonga XTV GL [FirePro S7150V]
              03:02.2 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Tonga XTV GL [FirePro S7150V]
              03:02.3 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Tonga XTV GL [FirePro S7150V]
              03:02.4 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Tonga XTV GL [FirePro S7150V]
              03:02.5 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Tonga XTV GL [FirePro S7150V]
              03:02.6 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Tonga XTV GL [FirePro S7150V]
              03:02.7 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Tonga XTV GL [FirePro S7150V]
              03:03.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Tonga XTV GL [FirePro S7150V]
              03:03.1 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Tonga XTV GL [FirePro S7150V]
              03:03.2 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Tonga XTV GL [FirePro S7150V]
              03:03.3 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Tonga XTV GL [FirePro S7150V]
              03:03.4 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Tonga XTV GL [FirePro S7150V]
              03:03.5 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Tonga XTV GL [FirePro S7150V]
              03:03.6 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Tonga XTV GL [FirePro S7150V]
              03:03.7 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Tonga XTV GL [FirePro S7150V]
              05:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Tonga XT GL [FirePro S7150]
              05:02.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Tonga XTV GL [FirePro S7150V]
              05:02.1 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Tonga XTV GL [FirePro S7150V]
              05:02.2 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Tonga XTV GL [FirePro S7150V]
              05:02.3 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Tonga XTV GL [FirePro S7150V]
              05:02.4 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Tonga XTV GL [FirePro S7150V]
              05:02.5 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Tonga XTV GL [FirePro S7150V]
              05:02.6 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Tonga XTV GL [FirePro S7150V]
              05:02.7 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Tonga XTV GL [FirePro S7150V]
              05:03.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Tonga XTV GL [FirePro S7150V]
              05:03.1 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Tonga XTV GL [FirePro S7150V]
              05:03.2 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Tonga XTV GL [FirePro S7150V]
              05:03.3 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Tonga XTV GL [FirePro S7150V]
              05:03.4 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Tonga XTV GL [FirePro S7150V]
              05:03.5 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Tonga XTV GL [FirePro S7150V]
              05:03.6 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Tonga XTV GL [FirePro S7150V]
              05:03.7 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Tonga XTV GL [FirePro S7150V]
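
To tally the virtual functions in a listing like the one above, a sketch along these lines could be used. The function name count_vfs_per_bus is made up for illustration; it relies only on the "FirePro S7150V" string that lspci prints for the virtual functions. The listing above shows 16 virtual functions on each of buses 03 and 05.

```shell
#!/bin/sh
# Read lspci output on stdin and print "<bus> <count>" for every PCI bus
# that carries FirePro S7150V virtual functions.
# (count_vfs_per_bus is a hypothetical helper name.)
count_vfs_per_bus() {
    grep -F 'FirePro S7150V' | cut -d: -f1 | sort | uniq -c | awk '{print $2, $1}'
}

# Usage on the host:
#   lspci | count_vfs_per_bus
```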

              But assigning the vGPU to a VM ends with errors:

              message "HANDLE_INVALID(PCI, OpaqueRef:906ee333-c899-fbc4-ee85-ef7da8ead9a2)"
              name "XapiError"
              stack "XapiError: HANDLE_INVALID(PCI, OpaqueRef:906ee333-c899-fbc4-ee85-ef7da8ead9a2)\n at Function.wrap (file:///usr/local/lib/node_modules/xo-server/node_modules/xen-api/_XapiError.mjs:16:12)\n at file:///usr/local/lib/node_modules/xo-server/node_modules/xen-api/transports/json-rpc.mjs:38:21\n at runNextTicks (node:internal/process/task_queues:60:5)\n at processImmediate (node:internal/timers:447:9)\n at process.callbackTrampoline (node:internal/async_hooks:128:17)"

              XEN Source Logfile:
              xensource.txt

              I think the problem is with the initialization of the IOMMU interface:

              Nov 13 11:30:21 xen03 kernel: [10188.720655] AMD IOMMUv2 driver by Joerg Roedel <jroedel@suse.de>
              Nov 13 11:30:21 xen03 kernel: [10188.720656] AMD IOMMUv2 functionality not available on this system

              As a quick and dirty test, I installed AlmaLinux 8 (kernel 4.18) to see if it behaves differently.
              On AlmaLinux, the "amd_iommu_v2" module is loaded and initialized correctly:

              messages:Nov 12 12:04:24 localhost kernel: iommu: Default domain type: Passthrough
              messages:Nov 12 12:04:24 localhost kernel: pci 0000:00:00.2: AMD-Vi: IOMMU performance counters supported
              messages:Nov 12 12:04:24 localhost kernel: pci 0000:00:01.0: Adding to iommu group 0
              messages:Nov 12 12:04:24 localhost kernel: pci 0000:00:01.2: Adding to iommu group 1
              ...
              ...
              messages:Nov 12 12:04:24 localhost kernel: pci 0000:11:00.6: Adding to iommu group 24
              messages:Nov 12 12:04:24 localhost kernel: pci 0000:12:00.0: Adding to iommu group 25
              messages:Nov 12 12:04:24 localhost kernel: pci 0000:00:00.2: AMD-Vi: Found IOMMU cap 0x40
              messages:Nov 12 12:10:16 localhost kernel: perf/amd_iommu: Detected AMD IOMMU #0 (2 banks, 4 counters/bank).
              messages:Nov 12 12:10:16 localhost kernel: AMD-Vi: AMD IOMMUv2 loaded and initialized

              Does anyone have an idea what the problem could be?
              I am very grateful for any advice!

              • olivierlambert Vates 🪐 Co-Founder CEO

                Ping @Teddy-Astie

                • TeddyAstie Vates 🪐 XCP-ng Team Xen Guru @ohajek

                  @ohajek

                  Nov 13 11:30:21 xen03 kernel: [10188.720655] AMD IOMMUv2 driver by Joerg Roedel <jroedel@suse.de>
                  Nov 13 11:30:21 xen03 kernel: [10188.720656] AMD IOMMUv2 functionality not available on this system
                  

                  This is expected: the Dom0 kernel (Linux) is not supposed to access the IOMMU when it is already used by Xen. To check whether AMD-Vi is working, you need to check "xl dmesg" instead.

                  I took a quick look at kern_gim_compiled.txt, and it looks like it timed out somewhere:

                  Oct 23 20:49:32 xen03 kernel: [   80.657394]        gim error:(wait_cmd_complete:2387)  wait_cmd_complete -- time out after 0.003004460 sec
                  Oct 23 20:49:32 xen03 kernel: [   80.657408]        gim error:(wait_cmd_complete:2390)   Cmd = 0x17, Status = 0x0, cmd_Complete=0
                  

                  3 ms looks like a short timeout to me, but aside from that, it looks like a driver (gim) or hardware issue.
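
To see how often and with which durations gim reports these timeouts, the log could be filtered with a small sketch like this one. The function name gim_timeouts and the /var/log/kern.log path in the usage comment are assumptions; the pattern is taken from the wait_cmd_complete line quoted above.

```shell
#!/bin/sh
# Print the timeout durations (in seconds) reported by gim's
# wait_cmd_complete, one per line, from kernel log text on stdin.
# (gim_timeouts is a hypothetical helper name.)
gim_timeouts() {
    sed -n 's/.*wait_cmd_complete -- time out after \([0-9.]*\) sec.*/\1/p'
}

# Usage on the host (the log path is an assumption):
#   gim_timeouts < /var/log/kern.log
```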

                  • TeddyAstie Vates 🪐 XCP-ng Team Xen Guru

                    @tuxen said (https://xcp-ng.org/forum/topic/3652/no-free-virtual-function-found-vgpu-s7150/4?_=1731502751059)

                    After some digging, could be the case of a GPU firmware being incompatible with UEFI. Do you have any spare server for testing XCP-ng boot in legacy/BIOS with this GPU?

                    Perhaps that is the issue?

                    • ohajek

                      Hi Teddy,
                      thanks for the analysis and your brief explanation.
                      IOMMU should be properly activated:

                      (XEN) [ 0.221843] AMD-Vi: IOMMU Extended Features:
                      (XEN) [ 0.222606] - Peripheral Page Service Request
                      (XEN) [ 0.223366] - NX bit
                      (XEN) [ 0.224123] - Guest APIC Physical Processor Interrupt
                      (XEN) [ 0.224889] - Invalidate All Command
                      (XEN) [ 0.225649] - Guest APIC
                      (XEN) [ 0.226412] - Performance Counters
                      (XEN) [ 0.227178] - Host Address Translation Size: 0x2
                      (XEN) [ 0.227940] - Guest Address Translation Size: 0
                      (XEN) [ 0.228681] - Guest CR3 Root Table Level: 0x1
                      (XEN) [ 0.229416] - Maximum PASID: 0xf
                      (XEN) [ 0.230140] - SMI Filter Register: 0x1
                      (XEN) [ 0.230867] - SMI Filter Register Count: 0x1
                      (XEN) [ 0.231596] - Guest Virtual APIC Modes: 0x1
                      (XEN) [ 0.232316] - Dual PPR Log: 0x2
                      (XEN) [ 0.233024] - Dual Event Log: 0x2
                      (XEN) [ 0.233727] - Secure ATS
                      (XEN) [ 0.234424] - User / Supervisor Page Protection
                      (XEN) [ 0.235126] - Device Table Segmentation: 0x3
                      (XEN) [ 0.235826] - PPR Log Overflow Early Warning
                      (XEN) [ 0.236514] - PPR Automatic Response
                      (XEN) [ 0.237198] - Memory Access Routing and Control: 0x1
                      (XEN) [ 0.237881] - Block StopMark Message
                      (XEN) [ 0.238558] - Performance Optimization
                      (XEN) [ 0.239234] - MSI Capability MMIO Access
                      (XEN) [ 0.239906] - Guest I/O Protection
                      (XEN) [ 0.240570] - Enhanced PPR Handling
                      (XEN) [ 0.241231] - Invalidate IOTLB Type
                      (XEN) [ 0.241886] - VM Table Size: 0x2
                      (XEN) [ 0.242537] - Guest Access Bit Update Disable
                      (XEN) [ 0.252988] AMD-Vi: IOMMU 0 Enabled.
                      (XEN) [ 0.253820] I/O virtualisation enabled
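
The check described earlier (reading the Xen log rather than Dom0's kernel log) could be scripted as in this sketch. The function name xen_iommu_enabled is hypothetical, and the pattern is based only on the "AMD-Vi: IOMMU 0 Enabled." line shown above, so it may need adjusting for other Xen versions.

```shell
#!/bin/sh
# Succeed (exit status 0) when the Xen boot log on stdin reports an
# enabled AMD-Vi IOMMU, e.g. "(XEN) [...] AMD-Vi: IOMMU 0 Enabled."
# (xen_iommu_enabled is a hypothetical helper name.)
xen_iommu_enabled() {
    grep -Eq 'AMD-Vi: IOMMU [0-9]+ Enabled'
}

# Usage on the host:
#   xl dmesg | xen_iommu_enabled && echo "AMD-Vi is active in Xen"
```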

                      I have also read about the problem with UEFI several times.
                      Now I am going to try to find a system with "legacy" boot mode.

                      • rtjdamen @ohajek

                        @ohajek did you ever manage to get this to work? We upgraded from 8.2, where it worked, to 8.3. The card is detected, but when you start the VM you get an error that the virtual function is not available.

                        • ohajek @rtjdamen

                          @rtjdamen No, unfortunately not! I tried everything possible and finally gave up ...
                          I wish you more luck getting it to work!

                          • rtjdamen @ohajek

                            @ohajek I am afraid it's a lost cause with this one then 😂. @olivierlambert do you know whether this is going to work on XCP-ng?

                            • olivierlambert Vates 🪐 Co-Founder CEO

                              I'm completely E_BUSY at the moment (at the Xen Winter Meetup), so I can't do much on long/complex problems in non-urgent situations 😓

                              • mohammadm

                                Too bad, so at this point we have 4/5 GPUs that are pretty much useless. Are there any other alternatives for GPU?

                                When using dGPU (PCIe passthrough to the VM itself), it does work, but after a while the whole screen turns black. Users have to disconnect and reconnect to the RDP session to see the screen again.

                                • TeddyAstie Vates 🪐 XCP-ng Team Xen Guru @mohammadm

                                  @mohammadm said in XCP-ng 8.3 & AMD Firepro S7150x2:

                                  Too bad, so at this point we have 4/5 GPUs that are pretty much useless. Are there any other alternatives for GPU?

                                  When using dGPU (PCIe passthrough to the VM itself), it does work, but after a while the whole screen turns black. Users have to disconnect and reconnect to the RDP session to see the screen again.

                                  Looks like a screen suspend issue. Have you tried disabling it?
