Non-server CPU compatibility - Ryzen and Intel
-
You can probably pass through the entire USB controller (but then you won't be able to use any of its USB ports outside the passed-through VM).
-
@olivierlambert - I've got dmesg info captured that I can pass along. My friend also decided to build a new Zen 4 box for XCP-ng. He went with what I think is a motherboard similar to yours (PRIME B650M-A AX):
https://www.asus.com/us/motherboards-components/motherboards/prime/prime-b650m-a-ax/
The BIOSes are the same (version 1222, 2023/02/24), and we loaded the same XCP-ng version (8.3 alpha), the same xo-ce version (5.10.0-21-amd64), and the same Linux Mint version (21.1). The processors are similar: his is a Ryzen 9 7900 and mine is a 7900X. Both have Local APIC Mode set to X2APIC in the BIOS.
His box runs Linux Mint 21.1 without issues (as does yours). We captured dmesg output from all 3 (XCP-ng, xo-ce, Linux Mint 21.1 Cinnamon) on both boxes, and I've compared them side-by-side: each of the 3 shows problems/errors on my machine. Below, I've summarized the differences/errors occurring on my box for 2 of the 3 dmesg captures (XCP-ng and xo-ce).
I'm hoping this might lead to some patches in XCP-ng that will allow the Asus ROG STRIX B650E-F Gaming WIFI motherboard to work fully with XCP-ng (it runs Windows VMs fine, but not Linux VMs).
If a patch/fix from XCP-ng doesn't seem likely, then I'll probably replace the current motherboard with the PRIME version.
Below is a summary of what I've noticed in comparisons of XCP-ng dmesg files and xo-ce dmesg files.
In order of appearance, the differences I'm seeing in XCP-ng dmesg from my box are:
- Hypervisor detected: Xen PV
tsc: Fast TSC calibration failed
Instead of:
tsc: Fast TSC calibration using PIT
- No TSC line listed
Instead of:
tsc: Detected 3693.204 MHz TSC
- ACPI: Added _OSI(Linux-Lenovo-NV-HDMI-Audio)
ACPI BIOS Error (bug): Could not resolve [_SB.PCI0.GPP7.UP00.DP40.UP00.DP68], AE_NOT_FOUND (20180810/dswload2-160)
ACPI Error: AE_NOT_FOUND, During name lookup/catalog (20180810/psobject-221)
ACPI Error: Ignore error and continue table load (20180810/psobject-604)
ACPI Error: Skip parsing opcode OpcodeName unavailable (20180810/psloop-543)
ACPI: 14 ACPI AML tables successfully acquired and loaded
Instead of:
ACPI: Added _OSI(Linux-Lenovo-NV-HDMI-Audio)
ACPI: 13 ACPI AML tables successfully acquired and loaded
- This sequence happened:
[ 0.412184] usbcore: registered new device driver usb
[ 0.412184] WARNING: CPU: 0 PID: 1 at drivers/i2c/busses/i2c-designware-common.c:245 i2c_dw_clk_rate+0x16/0x30
[ 0.412184] Modules linked in:
[ 0.412184] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.19.0+1 #1
[ 0.412184] Hardware name: ASUS System Product Name/ROG STRIX B650E-F GAMING WIFI, BIOS 0821 11/15/2022
[ 0.412184] RIP: e030:i2c_dw_clk_rate+0x16/0x30
[ 0.412184] Code: 00 48 c7 c6 5e ee e7 81 31 c0 5d e9 d4 60 f6 ff 0f 1f 40 00 0f 1f 44 00 00 48 8b 47 48 48 85 c0 74 08 e8 1d 35 43 00 89 c0 c3 <0f> 0b 0f 1f 84 00 00 00 00 00 c3 0f 1f 44 00 00 66 2e 0f 1f 84 00
[ 0.412184] RSP: e02b:ffffc9004006fcf8 EFLAGS: 00010246
[ 0.412184] RAX: 0000000000000000 RBX: ffff8881384d2018 RCX: 00000000aeffff00
INFO DELETED
[ 0.412184] ---[ end trace eae5bc73295d4325 ]---
[ 0.412941] pps_core: LinuxPPS API ver. 1 registered
[ 0.412941] pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti giometti@linux.it
[ 0.412942] PTP clock support registered
Instead of:
[ 0.476402] usbcore: registered new device driver usb
[ 0.476402] pps_core: LinuxPPS API ver. 1 registered
[ 0.476402] pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti giometti@linux.it
[ 0.476402] PTP clock support registered
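Part of the side-by-side comparison above can be automated. This is a minimal Python sketch, assuming each capture has been read into a list of dmesg lines. It strips the leading "[ seconds ]" timestamps, which differ between boxes even for identical messages (e.g. 0.412184 vs 0.476402 for the usbcore line). It then keeps only the lines that were removed (-) or added (+). The function names and the "working box"/"problem box" labels are hypothetical.

```python
import difflib
import re

# Matches the leading "[ 0.412184] " timestamp that dmesg prints on each line.
TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")


def strip_timestamps(lines):
    """Remove dmesg timestamps so boot-timing differences are not reported."""
    return [TIMESTAMP.sub("", line.rstrip("\n")) for line in lines]


def dmesg_diff(good_lines, bad_lines):
    """Return only the removed (-) and added (+) lines between two captures."""
    diff = difflib.unified_diff(
        strip_timestamps(good_lines),
        strip_timestamps(bad_lines),
        fromfile="working box",
        tofile="problem box",
        lineterm="",
    )
    # The first two lines are the ---/+++ file headers; "@@" hunk markers and
    # " " context lines are then dropped by the startswith filter.
    changes = list(diff)[2:]
    return [line for line in changes if line.startswith(("+", "-"))]


if __name__ == "__main__":
    # Sample lines copied from the captures quoted in this thread.
    good = [
        "[ 0.476402] usbcore: registered new device driver usb",
        "[ 0.476402] pps_core: LinuxPPS API ver. 1 registered",
    ]
    bad = [
        "[ 0.412184] usbcore: registered new device driver usb",
        "[ 0.412184] WARNING: CPU: 0 PID: 1 at drivers/i2c/busses/i2c-designware-common.c:245 i2c_dw_clk_rate+0x16/0x30",
        "[ 0.412941] pps_core: LinuxPPS API ver. 1 registered",
    ]
    for line in dmesg_diff(good, bad):
        print(line)
```

With these samples, only the WARNING line is reported, as an addition. The usbcore and pps_core lines match once their timestamps are removed.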
In xo-ce dmesg on my box, I saw:
rcu_sched self-detected stall on CPU (2 occurrences)
Would it be useful for me to upload the 3 dmesg capture files (XCP-ng, xo-ce, Linux Mint 21.1) from both boxes?
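Other people's captures could be checked for the same symptoms with a simple pattern scan. This is a rough sketch that assumes the four messages reported in this thread are the ones worth counting:

- failed fast TSC calibration
- ACPI errors
- a kernel WARNING trace
- the rcu_sched stall

The SYMPTOMS table and the triage() function are hypothetical, and the patterns are illustrative rather than exhaustive.

```python
import re

# Hypothetical symptom table built from the messages quoted in this thread.
SYMPTOMS = {
    "tsc_calibration_failed": re.compile(r"tsc: Fast TSC calibration failed"),
    "acpi_error": re.compile(r"ACPI (BIOS )?Error"),
    "kernel_warning": re.compile(r"WARNING: CPU: \d+ PID: \d+"),
    "rcu_stall": re.compile(r"rcu_sched self-detected stall on CPU"),
}


def triage(lines):
    """Count how many dmesg lines match each known symptom."""
    counts = {name: 0 for name in SYMPTOMS}
    for line in lines:
        for name, pattern in SYMPTOMS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "tsc: Fast TSC calibration failed",
        "ACPI BIOS Error (bug): Could not resolve [_SB.PCI0.GPP7.UP00.DP40.UP00.DP68], AE_NOT_FOUND (20180810/dswload2-160)",
        "ACPI: 14 ACPI AML tables successfully acquired and loaded",
        "rcu_sched self-detected stall on CPU",
    ]
    print(triage(sample))
```

A box where every count is 0 is not necessarily healthy. It only means that none of these particular messages appeared.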
-
Well, a pretty buggy BIOS on your side doesn't help, I suppose.
-
I built a system with a Ryzen 7950X on an ASRock B650M PG Riptide motherboard and was having issues similar to mgales'. I switched to an ASUS Prime B650M-A II-CSM without any improvement.
With the ASUS Prime, there were no BIOS errors reported.
(I was able to get rid of 'ACPI BIOS Error (bug): Could not resolve [_SB.PCI0.GPP7.UP00.DP40.UP00.DP68], AE_NOT_FOUND (20180810/dswload2-160)' by enabling the onboard audio.)
I am able to run imported Windows VMs (Windows 10 and Server 2022) without any apparent issues.
I can run an imported AlmaLinux 8 VM with the nopv kernel option.
I can run the AlmaLinux 8 installer with the nopv option.
I can run Xen Orchestra with the nopv option.
I can also run an imported CentOS 6 VM without any additional options.
The main issue seems to be a stuck CPU on the Linux VMs when using PV drivers.
Could there be issues specific to the Ryzen 7900X and 7950X?
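Here is a minimal sketch of how the nopv option can be made persistent in an already-installed guest. It assumes a distro that keeps its defaults in /etc/default/grub with a GRUB_CMDLINE_LINUX="..." line. It only rewrites that text; the GRUB config still has to be regenerated afterwards (e.g. with update-grub on Debian-based guests). On RHEL-family guests such as AlmaLinux 8, `grubby --update-kernel=ALL --args=nopv` is the usual tool instead. add_kernel_param() is a hypothetical helper, not part of XCP-ng or GRUB.

```python
import re

# Matches e.g. GRUB_CMDLINE_LINUX="crashkernel=auto rhgb quiet" at a line start.
# GRUB_CMDLINE_LINUX_DEFAULT is deliberately not matched.
CMDLINE = re.compile(r'^(GRUB_CMDLINE_LINUX=")([^"]*)(")', re.MULTILINE)


def add_kernel_param(grub_defaults, param="nopv"):
    """Append `param` to GRUB_CMDLINE_LINUX unless it is already present."""

    def _append(match):
        args = match.group(2).split()
        if param not in args:
            args.append(param)
        return match.group(1) + " ".join(args) + match.group(3)

    return CMDLINE.sub(_append, grub_defaults, count=1)


if __name__ == "__main__":
    example = 'GRUB_TIMEOUT=5\nGRUB_CMDLINE_LINUX="crashkernel=auto rhgb quiet"\n'
    print(add_kernel_param(example), end="")
```

The helper is idempotent: running it twice leaves a single nopv, so it is safe to reapply after kernel updates.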
-
So it seems I need to purchase a 7900. I wonder if the non-X will do it.
-
@olivierlambert - my friend with the Ryzen 7900 isn't experiencing the same issues that @BlueBadger and I (with the 7900X) are having. His system is working fine, and both of us are running xcp-ng-8.3.testing-2023.02.15-12.19-install.iso.
-
With the same motherboard and the same BIOS settings/version?
-
His is closer to your board (I believe), and to one that @BlueBadger tried (ASUS Prime B650M-A II-CSM)
My friend's board is: ASUS Prime B650M-A AX:
https://www.asus.com/us/motherboards-components/motherboards/prime/prime-b650m-a-ax/
He's using the Ryzen 9 7900 and isn't experiencing any problems with Linux VMs in XCP-ng.
-
An ideal investigation would be to swap the X and non-X CPUs and see if there's a difference.
I'm under the impression it's more of a motherboard issue (BIOS or BIOS version) than anything else, however.
-
I talked to my friend about doing a processor swap test, but he's happy with the way his system is running and doesn't want to risk messing something up. Sorry about that.
-
Maybe there are other people in the community who could provide that info.
-
I ordered a Ryzen 7900 last night and got it this morning. (Thanks Amazon).
I just replaced the 7950X with the 7900 and things seem to work better.
I can now run an AlmaLinux 8 VM without the nopv flag.
I can now run Xen Orchestra without the nopv flag.
I will do some more testing.
-
@BlueBadger - Thank you! Looking forward to seeing what else you learn.
-
I was having issues (download stalls) with the onboard 2.5Gb NIC (RTL8125) on the ASUS Prime B650M-A II-CSM motherboard even after I switched from the Ryzen 7950x to the 7900.
My setup also includes an X540 10Gb NIC, which seemed to be working well.
I swapped the motherboard back to the ASRock B650M PG Riptide and was still having issues with the onboard 2.5Gb NIC.
I disabled the onboard NIC, installed a second X540, and have not had any network issues so far.
I'm guessing there might be an issue with the r8125 driver.
Excluding the onboard 2.5Gb NIC, XCP-ng seems to run well on both motherboards.
The BIOS errors in dmesg don't seem to be causing any issues.
(The ASRock B650M PG Riptide seems like a nicer motherboard.)
-
That's... interesting. So the "X" series seems to have some issues in the end? It's weird, since it shouldn't be very different from its non-X counterpart.
-
So, we've had reports on xen-devel which look a little like this.
@BlueBadger, are you able to switch back to your 7950X and try booting Xen with x2apic_phys=true?
It appears that the -X processors are missing a feature in their IOMMU, and Xen was getting confused when setting up interrupt handling.
https://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=0d2686f6b66b4b1b3c72c3525083b0ce02830054 is at least part of the fix, but so far feedback on the mailing lists suggests it's not a complete fix.
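After rebooting with the new option, it is worth confirming that Xen actually received it. In dom0, `xl info` prints a xen_commandline field showing the command line Xen booted with. The sketch below parses that text and reports whether x2apic_phys is set. On XCP-ng, the option itself is usually added with `/opt/xensource/libexec/xen-cmdline --set-xen x2apic_phys=true`. Both function names are hypothetical, and the list of accepted true-values is an assumption based on common Xen boolean spellings.

```python
def xen_cmdline_options(xl_info_text):
    """Parse the xen_commandline field of `xl info` output into a dict."""
    for line in xl_info_text.splitlines():
        # Split on the first ":" only; option values may contain colons.
        key, sep, value = line.partition(":")
        if sep and key.strip() == "xen_commandline":
            options = {}
            for token in value.split():
                name, eq, setting = token.partition("=")
                # Bare flags (no "=") are recorded as True.
                options[name] = setting if eq else True
            return options
    return {}


def x2apic_phys_enabled(xl_info_text):
    """True if Xen was booted with x2apic_phys set to a true-ish value."""
    value = xen_cmdline_options(xl_info_text).get("x2apic_phys")
    return value is True or value in ("true", "1", "yes", "on", "enable")


if __name__ == "__main__":
    # Hypothetical sample; in dom0, feed it the real output of `xl info`.
    sample = "xen_commandline        : watchdog x2apic_phys=true\n"
    print(x2apic_phys_enabled(sample))
```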
-
@andyhhp Thanks for the info.
I plan to leave my current machine (Ryzen 7900) as is since it seems to be running well.
I plan to build a new machine with the extra 7950x. The motherboard is on back order.
I will try the new setting once it is built.
-
@andyhhp I built a new machine with my Ryzen 7950x.
Booting Xen with x2apic_phys=true did not seem to fix any issues.
-
Interesting, thanks for the feedback. @andyhhp, should we provide a Xen version with the initial fix and see if it's better? (Maybe combined with the x2apic param.)
-
I'm testing this combo:
- AMD RYZEN 9 7900X
- ASUS PRIME X670-P WIFI bios 1406
- 2x32GB KINGSTON 5600 CL40 (max QVL)
- Boot drive: 250GB NVMe (on the chipset), plus a 1TB SN850X (PCIe 4.0) on the CPU lanes.
Settings: Local APIC Mode = X2APIC, and UEFI set to Other OS. I installed 8.3 alpha and updated it, and got errors. A test install of XOA took far too long, and booting was painfully slow using only the 1 SSD on the chipset NVMe slot.
I tried disabling IOMMU, but got the same issue.