XCP-ng 8.3 & AMD Firepro S7150x2
-
Hello,
I am trying, so far without success, to get an AMD FirePro S7150x2 running on XCP-ng 8.3 (latest release).

My hardware configuration:
- Motherboard: ASUS TUF X670E-PLUS (firmware version 3042, dated 2024/10/22)
- BIOS settings:
  - CSM: disabled
  - SR-IOV: enabled
  - IOMMU: changed from Auto to Enabled
- CPU: AMD Ryzen 9 7950X3D 16-Core Processor
Steps I have already taken:
- Installed the supplemental pack "mxgpu-2.0.0.amd.iso".
- Blacklisted the GPU modules so they are excluded from the initramfs:
  echo "blacklist radeon" >> /etc/modprobe.d/blacklist.conf
  echo "blacklist amdgpu" >> /etc/modprobe.d/blacklist.conf
  echo "blacklist amdkfd" >> /etc/modprobe.d/blacklist.conf
  Then rebuilt the initramfs.
- Set the dom0 boot parameters in "grub2-efi.cfg":
  /opt/xensource/libexec/xen-cmdline --set-dom0 "pci=realloc pci=assign-busses"
- Loaded the "gim" module. The kern.log output with errors is attached: AMD_FIREPRO_gim_kern.txt
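The blacklisting and boot-parameter steps can be sketched as two small shell helpers. This is only a sketch under assumptions: the helper names `blacklist_module` and `cmdline_has_params` are hypothetical, the duplicate check (so re-running the steps does not append the same line twice, as plain `echo >>` would) is an addition, and the `dracut` command in the comments assumes the XCP-ng dom0 initrd is rebuilt with dracut; check the XCP-ng documentation for your release.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of XCP-ng.

# Append "blacklist <module>" to a modprobe config file only if that exact
# line is not already present (plain `echo >>` duplicates it on re-runs).
blacklist_module() {
    conf_file=$1
    module=$2
    line="blacklist $module"
    # -q quiet, -x whole-line match, -F fixed string; a missing file counts
    # as "not present", and the append below creates it.
    if ! grep -qxF "$line" "$conf_file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$conf_file"
    fi
}

# Succeed only if every parameter after the first argument appears as a
# separate word in the command-line string given as the first argument.
cmdline_has_params() {
    cmdline=$1
    shift
    for param in "$@"; do
        case " $cmdline " in
            *" $param "*) ;;      # found as a whole word
            *) return 1 ;;        # missing
        esac
    done
    return 0
}

# Example usage in dom0 (as root):
#   for m in radeon amdgpu amdkfd; do
#       blacklist_module /etc/modprobe.d/blacklist.conf "$m"
#   done
#   # Assumption: dracut rebuilds the dom0 initrd on XCP-ng.
#   dracut -f /boot/initrd-"$(uname -r)".img "$(uname -r)"
#   # After rebooting, confirm dom0 actually received the parameters:
#   cmdline_has_params "$(cat /proc/cmdline)" pci=realloc pci=assign-busses \
#       && echo "dom0 parameters present"
```

Checking `/proc/cmdline` after the reboot confirms that `xen-cmdline` really wrote the parameters into the boot entry that was used, which matters when both BIOS and UEFI GRUB configs exist.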
Does anyone have an idea where the problem could be?
I would be grateful for any help.
-
Hi!
Have you read the release notes before moving to 8.3?
https://docs.xcp-ng.org/releases/release-8-3/#amd-mxgpu-driver
-
@olivierlambert said in XCP-ng 8.3 & AMD Firepro S7150x2:
Have you read the release notes before moving to 8.3?
Hello Olivier,
Unfortunately not. Do you think it makes sense to try XCP-ng 8.2 instead?
-
Yes, I would try that first.
-
I have installed XCP-ng 8.2: same problem, unfortunately no success.
As a next step, I compiled the GIM module and loaded it with "modprobe gim".
That looks better now (kern_gim_compiled.txt), and the devices are now also listed:
#> lspci |grep Tonga
03:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Tonga XT GL [FirePro S7150]
03:02.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Tonga XTV GL [FirePro S7150V]
03:02.1 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Tonga XTV GL [FirePro S7150V]
03:02.2 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Tonga XTV GL [FirePro S7150V]
03:02.3 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Tonga XTV GL [FirePro S7150V]
03:02.4 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Tonga XTV GL [FirePro S7150V]
03:02.5 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Tonga XTV GL [FirePro S7150V]
03:02.6 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Tonga XTV GL [FirePro S7150V]
03:02.7 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Tonga XTV GL [FirePro S7150V]
03:03.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Tonga XTV GL [FirePro S7150V]
03:03.1 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Tonga XTV GL [FirePro S7150V]
03:03.2 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Tonga XTV GL [FirePro S7150V]
03:03.3 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Tonga XTV GL [FirePro S7150V]
03:03.4 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Tonga XTV GL [FirePro S7150V]
03:03.5 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Tonga XTV GL [FirePro S7150V]
03:03.6 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Tonga XTV GL [FirePro S7150V]
03:03.7 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Tonga XTV GL [FirePro S7150V]
05:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Tonga XT GL [FirePro S7150]
05:02.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Tonga XTV GL [FirePro S7150V]
05:02.1 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Tonga XTV GL [FirePro S7150V]
05:02.2 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Tonga XTV GL [FirePro S7150V]
05:02.3 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Tonga XTV GL [FirePro S7150V]
05:02.4 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Tonga XTV GL [FirePro S7150V]
05:02.5 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Tonga XTV GL [FirePro S7150V]
05:02.6 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Tonga XTV GL [FirePro S7150V]
05:02.7 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Tonga XTV GL [FirePro S7150V]
05:03.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Tonga XTV GL [FirePro S7150V]
05:03.1 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Tonga XTV GL [FirePro S7150V]
05:03.2 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Tonga XTV GL [FirePro S7150V]
05:03.3 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Tonga XTV GL [FirePro S7150V]
05:03.4 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Tonga XTV GL [FirePro S7150V]
05:03.5 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Tonga XTV GL [FirePro S7150V]
05:03.6 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Tonga XTV GL [FirePro S7150V]
05:03.7 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Tonga XTV GL [FirePro S7150V]

But assigning the vGPU to a VM fails with this error:
message "HANDLE_INVALID(PCI, OpaqueRef:906ee333-c899-fbc4-ee85-ef7da8ead9a2)"
name "XapiError"
stack "XapiError: HANDLE_INVALID(PCI, OpaqueRef:906ee333-c899-fbc4-ee85-ef7da8ead9a2)\n at Function.wrap (file:///usr/local/lib/node_modules/xo-server/node_modules/xen-api/_XapiError.mjs:16:12)\n at file:///usr/local/lib/node_modules/xo-server/node_modules/xen-api/transports/json-rpc.mjs:38:21\n at runNextTicks (node:internal/process/task_queues:60:5)\n at processImmediate (node:internal/timers:447:9)\n at process.callbackTrampoline (node:internal/async_hooks:128:17)"

xensource log file: xensource.txt

I think the problem is with the initialization of the IOMMU interface:
Nov 13 11:30:21 xen03 kernel: [10188.720655] AMD IOMMUv2 driver by Joerg Roedel jroedel@suse.de
Nov 13 11:30:21 xen03 kernel: [10188.720656] AMD IOMMUv2 functionality not available on this system

As a quick and dirty test, I installed AlmaLinux 8 (kernel 4.18) to see whether it behaves differently.
On AlmaLinux, the "amd_iommu_v2" module is loaded and initialized correctly:
messages:Nov 12 12:04:24 localhost kernel: iommu: Default domain type: Passthrough
messages:Nov 12 12:04:24 localhost kernel: pci 0000:00:00.2: AMD-Vi: IOMMU performance counters supported
messages:Nov 12 12:04:24 localhost kernel: pci 0000:00:01.0: Adding to iommu group 0
messages:Nov 12 12:04:24 localhost kernel: pci 0000:00:01.2: Adding to iommu group 1
...
...
messages:Nov 12 12:04:24 localhost kernel: pci 0000:11:00.6: Adding to iommu group 24
messages:Nov 12 12:04:24 localhost kernel: pci 0000:12:00.0: Adding to iommu group 25
messages:Nov 12 12:04:24 localhost kernel: pci 0000:00:00.2: AMD-Vi: Found IOMMU cap 0x40
messages:Nov 12 12:10:16 localhost kernel: perf/amd_iommu: Detected AMD IOMMU #0 (2 banks, 4 counters/bank).
messages:Nov 12 12:10:16 localhost kernel: AMD-Vi: AMD IOMMUv2 loaded and initialized

Does anyone have an idea what the problem could be?
I would be very grateful for any advice!
-
Ping @Teddy-Astie
-
Nov 13 11:30:21 xen03 kernel: [10188.720655] AMD IOMMUv2 driver by Joerg Roedel jroedel@suse.de
Nov 13 11:30:21 xen03 kernel: [10188.720656] AMD IOMMUv2 functionality not available on this system
This is expected: the dom0 kernel (Linux) is not supposed to access the IOMMU when Xen is already using it. To check whether AMD-Vi is working, you need to look at
xl dmesg
instead.

I took a quick look at kern_gim_compiled.txt, and it looks like it timed out somewhere:
Oct 23 20:49:32 xen03 kernel: [ 80.657394] gim error:(wait_cmd_complete:2387) wait_cmd_complete -- time out after 0.003004460 sec
Oct 23 20:49:32 xen03 kernel: [ 80.657408] gim error:(wait_cmd_complete:2390) Cmd = 0x17, Status = 0x0, cmd_Complete=0
3 ms looks like a short timeout to me, but apart from that, it looks like a driver (gim) or hardware issue.
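To list every such timeout in a log, the gim messages can be filtered and the reported durations converted to milliseconds. A minimal sketch, assuming the log lines keep the `wait_cmd_complete -- time out after <seconds> sec` wording quoted above; the function name `gim_timeouts_ms` is hypothetical.

```shell
# Read a kernel log on stdin and print each gim wait_cmd_complete timeout,
# converted from seconds to milliseconds, one value per line.
gim_timeouts_ms() {
    # Keep only the timeout lines, then take the token that follows "after".
    grep 'wait_cmd_complete -- time out after' |
        awk '{
            for (i = 1; i < NF; i++)
                if ($i == "after") { printf "%.3f\n", $(i + 1) * 1000; break }
        }'
}

# Example (on the XCP-ng host):
#   gim_timeouts_ms < /var/log/kern.log
```

For the two lines quoted above, only the first one matches and the function prints `3.004`, which is the roughly 3 ms window mentioned in the post.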
-
@tuxen said (https://xcp-ng.org/forum/topic/3652/no-free-virtual-function-found-vgpu-s7150/4)
After some digging, this could be a case of the GPU firmware being incompatible with UEFI. Do you have a spare server for testing XCP-ng booted in legacy/BIOS mode with this GPU?
Perhaps that is the issue?
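When comparing UEFI and legacy/BIOS boots, it helps to confirm which mode the host actually booted in. A common Linux check is whether `/sys/firmware/efi` exists, since the kernel creates it only on UEFI boots. That this directory is also exposed in an XCP-ng dom0 running on Xen is an assumption worth verifying on your host; the function name `boot_mode` and its optional root-directory argument (there only so the check can be tested on a fake tree) are hypothetical.

```shell
# Print "uefi" if the system booted via UEFI, otherwise "legacy".
# Assumption: /sys/firmware/efi is present in dom0 only on UEFI boots.
# The optional first argument is an alternate root directory (default "/").
boot_mode() {
    root=${1:-/}
    if [ -d "${root%/}/sys/firmware/efi" ]; then
        echo uefi
    else
        echo legacy
    fi
}

# Example (in dom0):
#   boot_mode    # prints "uefi" or "legacy"
```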
-
Hi Teddy,
thanks for the analysis and your brief explanation.
The IOMMU should be properly activated:
(XEN) [ 0.221843] AMD-Vi: IOMMU Extended Features:
(XEN) [ 0.222606] - Peripheral Page Service Request
(XEN) [ 0.223366] - NX bit
(XEN) [ 0.224123] - Guest APIC Physical Processor Interrupt
(XEN) [ 0.224889] - Invalidate All Command
(XEN) [ 0.225649] - Guest APIC
(XEN) [ 0.226412] - Performance Counters
(XEN) [ 0.227178] - Host Address Translation Size: 0x2
(XEN) [ 0.227940] - Guest Address Translation Size: 0
(XEN) [ 0.228681] - Guest CR3 Root Table Level: 0x1
(XEN) [ 0.229416] - Maximum PASID: 0xf
(XEN) [ 0.230140] - SMI Filter Register: 0x1
(XEN) [ 0.230867] - SMI Filter Register Count: 0x1
(XEN) [ 0.231596] - Guest Virtual APIC Modes: 0x1
(XEN) [ 0.232316] - Dual PPR Log: 0x2
(XEN) [ 0.233024] - Dual Event Log: 0x2
(XEN) [ 0.233727] - Secure ATS
(XEN) [ 0.234424] - User / Supervisor Page Protection
(XEN) [ 0.235126] - Device Table Segmentation: 0x3
(XEN) [ 0.235826] - PPR Log Overflow Early Warning
(XEN) [ 0.236514] - PPR Automatic Response
(XEN) [ 0.237198] - Memory Access Routing and Control: 0x1
(XEN) [ 0.237881] - Block StopMark Message
(XEN) [ 0.238558] - Performance Optimization
(XEN) [ 0.239234] - MSI Capability MMIO Access
(XEN) [ 0.239906] - Guest I/O Protection
(XEN) [ 0.240570] - Enhanced PPR Handling
(XEN) [ 0.241231] - Invalidate IOTLB Type
(XEN) [ 0.241886] - VM Table Size: 0x2
(XEN) [ 0.242537] - Guest Access Bit Update Disable
(XEN) [ 0.252988] AMD-Vi: IOMMU 0 Enabled.
(XEN) [ 0.253820] I/O virtualisation enabled

I have also read about the UEFI problem several times.
Now I am going to try to find a system with "legacy" boot mode.
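The check Teddy suggests (reading Xen's own log instead of the dom0 kernel log) can be scripted so it is easy to repeat after each BIOS or boot-mode change. A minimal sketch, assuming the message wording shown in the `xl dmesg` excerpt above (`AMD-Vi: IOMMU <n> Enabled.` and `I/O virtualisation enabled`); the function name `xen_iommu_enabled` is hypothetical, and other Xen versions may word these lines differently.

```shell
# Read Xen's boot log on stdin; succeed only if at least one AMD-Vi IOMMU
# was enabled and Xen reports I/O virtualisation as enabled.
xen_iommu_enabled() {
    log=$(cat)
    printf '%s\n' "$log" | grep -q 'AMD-Vi: IOMMU [0-9]* Enabled' &&
        printf '%s\n' "$log" | grep -q 'I/O virtualisation enabled'
}

# Example (in dom0):
#   xl dmesg | xen_iommu_enabled && echo "AMD-Vi active in Xen"
```

Against the excerpt above this check succeeds, which is consistent with the IOMMU itself not being the problem.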