Non-server CPU compatibility - Ryzen and Intel
-
Warm migration is even a great solution in those cases (Intel to AMD or vice versa)
-
Just installed XCP-ng 8.3 on a Zen 4 platform:
- Asus Prime B650M-Wifi
- Ryzen 5 7600 (Zen 4)
- 32GiB DDR5
I just had to enable x2APIC in the BIOS; everything else worked out of the box (including the 2.5G Realtek NIC).
-
With the Ryzen 9 7900X, an Asus B650E series motherboard, and 32 GiB DDR5, the results I've gotten aren't favorable. I was able to install xcp-ng-8.3.testing-2023.02.15-12.19-install.iso without enabling x2APIC in the BIOS, and XCP-ng comes up fine. While xo-ce (running on the same box) comes up right away, it takes about 8-10 minutes before it acquires its IP address (both XCP-ng and xo-ce use DHCP to get their IPs from a DHCP/DNS server).
After xo-ce is up and accessible, Windows 10 works well (install from ISO, boot and run performance are good). VMs running various Linux distros are horrendously slow. It took about 2 hrs to install Linux Mint 21.1 from an ISO. With any of the Linux ISOs I've tried (CentOS Stream 9.1, Ubuntu 22.04.1, Linux Mint 21.1), it takes several minutes for a 2nd screen to appear after the initial install screen (which comes up fine).
The install of Mint took about 2 hrs to complete.
VM configuration: 8 CPUs, 8 GiB RAM, 22 GiB hard drive.
Linux Mint comes up, but with a similarly slow boot: 29 minutes before the LM logo, with CPU usage at 100% on the Stats tab during those 29 minutes but, oddly, everything else at 0. Then I noticed CPU usage dropped and both Network and Disk throughput showed activity, but still 0 B of 8 GiB RAM used. I switched tabs and these 2 messages appeared in the console screen:
1. [ 1.778229] vbd vbd-5696: 19 xenbus_dev_probe on device/vbd/5696
2. [165.437239] piix4_smbus 0000:00:01.3: SMBus Host Controller not enabled!
(LM logo appeared on the console window shortly after the 2 messages above)
At 48 minutes (per the VM_Started time on the Logs tab) the arrowhead cursor first appeared (console window), and the login screen appeared at 52 minutes. At 57 minutes the desktop is displayed (still at 0 GiB RAM usage) and system response is extremely slow.
Another Note - these messages appeared on the xo-ce console window:
[ 3684.425914] rcu: INFO: rcu_sched self-detected stall on CPU
[ 3684.425914] rcu: #1-...!: (1 ticks this GP) idle=d22/0/0x1 softirq=3206/3206 fqs=1
[ 3684.425914] rcu: rcu_sched thread starved for 4463 jiffies! g321 f0x0 RCU_GP_WAIT_FQS(5) -> state=0X402 ->cpu=1
[ 3684.425914] rcu: #unless rcu_sched thread gets sufficient CPU time, OOM is now expected behavior.
[ 3684.425914] rcu: RCU grace-period kthread stack dump:
{Nothing after that on the screen}
Side Note: my original Intel XCP-ng box shows all 4 (CPU, Memory, Network and Disk) peaking at roughly the same time while the Linux Mint OS boots, but on this new Ryzen box the Stats screen shows 0 B Memory at all times for the Linux Mint VM. The xo-ce VM does show about 712 MiB Memory usage.
I did try a separate boot with X2APIC enabled in the motherboard BIOS before booting XCP-ng, but it didn't make any noticeable difference compared to having it set to Default.
Repeating: I didn't experience sluggishness, problems installing, booting or running Windows 10 from an ISO under XCP-ng on this Ryzen box, only Linux distros.
Far Side Notes: I used the same Linux Mint 21.1 ISO from above and installed it directly on a separate hard drive on the same Ryzen box, and it works well. I'm running VMware under Linux Mint 21.1 and Rocky Linux 9, both installed directly on this Ryzen box on separate hard drives, and VMware works well when I boot from those drives (XCP-ng/xo-ce is running on a Samsung 970 Plus M.2).
I hope some of the information will be useful to the XCP-ng development team.
I'd be glad to test a fix on my Ryzen box if that would help.
It's a home lab box, so starting over isn't an issue.
I'm fairly new to XCP-ng. I'm impressed by the support of the XCP-ng team for home users too.
Addendum - inside the Linux Mint VM, I entered the free command in a terminal window.
Results:
              total        used        free      shared  buff/cache   available
Mem:        8119596      727396     6087584       20004     1304616     7103828
Swap:       1043340           0     1043340
-
What kind of storage are you using?
edit: I will try on my setup, but at least on Debian I had no issues and the installation was blazing fast. Can you try Debian and see if you still have issues?
-
So I installed Linux Mint 21.1 on my Zen4 setup, using a Debian 11 template, with 4vCPUs and 4GiB RAM and 40GiB virtual disk:
- the ISO booted to the OS in less than 1 minute (likely 20 seconds)
- the installation itself took 5 to 6 minutes at most
After the install, booting the OS took around 15 seconds (to get to the login)
Note: I'm using a cheap Kingston 120GiB SSD, not even an NVMe.
edit: I had to do
bash /mnt/Linux/install.sh -d debian -m 11
to install the tools, but that's it.
-
So it's likely an issue with your physical machine (buggy BIOS? something not enabled?). I would start with the usual
dmesg
in the dom0, and also an
xl dmesg
.
-
Thanks @olivierlambert for going to those lengths to recreate the same steps that I had done.
It's good to know that the new generation of Ryzen processors and motherboards should work without issues for most.
I appreciate the time you spent - it's beyond what I expected.
I'll try your suggestions with dmesg, experiment with BIOS settings, and do the install to a regular hard drive for starters. I'm not familiar with dmesg, but I'll read up on that - looks like it could reveal boot issues.
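For readers who are also new to dmesg, a minimal sketch of capturing both logs in dom0 might look like this. `dmesg` prints the dom0 kernel log and `xl dmesg` prints the Xen hypervisor log; the function names, the default output directory and the keyword list are assumptions made up for this example, not anything provided by XCP-ng.

```shell
#!/bin/sh
# suspicious_lines: keep only log lines that mention common trouble keywords.
# The keyword list is an assumption chosen for illustration; adjust as needed.
suspicious_lines() {
    grep -iE 'error|warning|fail|stall'
}

# capture_logs DIR: save the dom0 kernel log and the Xen hypervisor log in DIR.
# Meant to be run as root in dom0, where the xl tool is available.
capture_logs() {
    dir=${1:-/tmp/dmesg-capture}   # hypothetical default location
    mkdir -p "$dir" || return 1
    dmesg > "$dir/dom0-dmesg.txt"
    xl dmesg > "$dir/xen-dmesg.txt"
    # Print a quick summary of the lines worth a closer look.
    cat "$dir/dom0-dmesg.txt" "$dir/xen-dmesg.txt" | suspicious_lines
}
```

Running `capture_logs /root/dmesg-capture` in dom0 would leave two text files that can be attached to a forum post, and print the lines that match the keywords.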
It puzzles me why the only issues I'm having are with Linux VMs on my hardware, but hopefully I can figure out what's causing it. Forgot to mention that I had updated to the latest BIOS update from Asus before doing this install, but it didn't change the behavior from what I was experiencing prior to the update.
If I'm able to get it working, I'll post back here what helped.
- Mike ( @mgales)
-
Okay, keep us posted. Since it works here, it's not obvious what to do: a problem we can't reproduce is harder to solve.
-
BTW, here's the full model number of the motherboard (purchased as part of a combo CPU/RAM/MB package):
ROG STRIX B650E-F Gaming WIFI.
Product URL: https://rog.asus.com/motherboards/rog-strix/rog-strix-b650e-f-gaming-wifi-model/
-
@olivierlambert I have a question for you on the AMD platform you have listed in this thread. I am looking at updating my current setup to a newer AMD Zen 4 CPU and motherboard. My question: are you able to pass through some of the USB ports from this motherboard, or would I still need to use a dedicated PCI USB device? I'm asking because I currently have a separate PCI USB device for passthrough, but I am hopeful that in my next build I won't need to give up a PCI slot for a USB card and can just use the USB ports on the motherboard.
Thanks for any insight. I do know this is an edge case and most don't need USB passthrough like this, but I have a few USB devices I need to provide to a VM.
Quick Edit: I know you can do a plain USB passthrough, but I am looking at the PCI level so that I don't have to redo the USB passthrough after each host restart. Right now, with my GPU and USB PCI card, I don't have to re-pass through these devices on host restart.
Thanks,
Scot
-
You can probably pass the entire USB controller (but then you won't be able to get any USB port used outside the passed-through VM)
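To find which controller that would be, a small sketch like the following lists the PCI addresses of the USB controllers on the host. `lspci -D` is the standard pciutils call that always prints the PCI domain; the helper name `usb_controllers` is made up for this example. Hiding the device from dom0 and assigning it to the VM are separate steps covered by the XCP-ng PCI passthrough documentation and are not shown here.

```shell
#!/bin/sh
# usb_controllers: read `lspci -D` output on stdin and print the PCI address
# (domain:bus:device.function) of every line describing a USB controller.
usb_controllers() {
    awk 'tolower($0) ~ /usb controller/ { print $1 }'
}

# On the host (dom0), list the candidate controllers if lspci is installed.
if command -v lspci >/dev/null 2>&1; then
    lspci -D | usb_controllers
fi
```

A board often exposes several USB controllers, so it is worth checking which physical ports hang off which address before passing one through.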
-
@olivierlambert - I've got dmesg info captured that I can pass along. My friend also decided to build a new Zen 4 box for XCP-ng. He went with (what I think is) a similar motherboard to what you have? (PRIME B650M-A AX):
https://www.asus.com/us/motherboards-components/motherboards/prime/prime-b650m-a-ax/
The BIOS versions are the same (ver 1222 - 2023/02/24), we loaded the same XCP-ng version (8.3alpha), the same xo-ce version (5.10.0-21-amd64), and the same Linux Mint version (21.1). The processors are similar: Ryzen 9 7900 and 7900X. Both have Local APIC Mode set to X2APIC in the BIOS.
His box runs Linux Mint 21.1 without issues (as does yours). We captured dmesg output from all 3 (XCP-ng, xo-ce, Linux Mint 21.1 Cinnamon) on both boxes. I've compared them side by side, and each of the 3 has problems/errors on my machine. I summarized the differences/errors occurring on my box for 2 of the 3 dmesg captures below (XCP-ng and xo-ce).
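A side-by-side comparison like this is easier if the kernel timestamps are stripped first, since they differ on every boot. The following is only a sketch of one way to do it; the function names and the file names in the usage note are hypothetical.

```shell
#!/bin/sh
# strip_ts: remove the leading "[   12.345678] " kernel timestamp from each
# line on stdin, so captures from different boots can be compared.
strip_ts() {
    sed -E 's/^\[ *[0-9]+\.[0-9]+\] ?//'
}

# compare_dmesg FILE_A FILE_B: unified diff of two captures without timestamps.
# Returns diff's exit status (0 = identical, 1 = differences found).
compare_dmesg() {
    tmp_a=$(mktemp) || return 2
    tmp_b=$(mktemp) || return 2
    strip_ts < "$1" > "$tmp_a"
    strip_ts < "$2" > "$tmp_b"
    status=0
    diff -u "$tmp_a" "$tmp_b" || status=$?
    rm -f "$tmp_a" "$tmp_b"
    return "$status"
}
```

For example, `compare_dmesg my-box-xcpng.txt friend-box-xcpng.txt` would print only the lines that differ in content. Lines that merely appear in a different order still show up as differences, so the output needs a human read.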
I'm hoping that this might lead to some patches in XCP-ng software that will allow the Asus ROG STRIX B650E-F Gaming WIFI motherboard to fully work with XCP-ng. (XCP-ng works fine running Windows VMs, but not Linux VMs).
If a patch/fix from XCP-ng doesn't seem likely, then I'll probably replace the current motherboard with the PRIME version.
Below is a summary of what I've noticed in comparisons of XCP-ng dmesg files and xo-ce dmesg files.
In order of appearance, the differences I'm seeing in XCP-ng dmesg from my box are:
- Hypervisor detected: Xen PV
  tsc: Fast TSC calibration failed
  Instead of:
  tsc: Fast TSC calibration using PIT
- no TSC line listed
  Instead of:
  tsc: Detected 3693.204 MHz TSC
- ACPI: Added _OSI(Linux-Lenovo-NV-HDMI-Audio)
  ACPI BIOS Error (bug): Could not resolve [_SB.PCI0.GPP7.UP00.DP40.UP00.DP68], AE_NOT_FOUND (20180810/dswload2-160)
  ACPI Error: AE_NOT_FOUND, During name lookup/catalog (20180810/psobject-221)
  ACPI Error: Ignore error and continue table load (20180810/psobject-604)
  ACPI Error: Skip parsing opcode OpcodeName unavailable (20180810/psloop-543)
  ACPI: 14 ACPI AML tables successfully acquired and loaded
  Instead of:
  ACPI: Added _OSI(Linux-Lenovo-NV-HDMI-Audio)
  ACPI: 13 ACPI AML tables successfully acquired and loaded
- This sequence happened:
[ 0.412184] usbcore: registered new device driver usb
[ 0.412184] WARNING: CPU: 0 PID: 1 at drivers/i2c/busses/i2c-designware-common.c:245 i2c_dw_clk_rate+0x16/0x30
[ 0.412184] Modules linked in:
[ 0.412184] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.19.0+1 #1
[ 0.412184] Hardware name: ASUS System Product Name/ROG STRIX B650E-F GAMING WIFI, BIOS 0821 11/15/2022
[ 0.412184] RIP: e030:i2c_dw_clk_rate+0x16/0x30
[ 0.412184] Code: 00 48 c7 c6 5e ee e7 81 31 c0 5d e9 d4 60 f6 ff 0f 1f 40 00 0f 1f 44 00 00 48 8b 47 48 48 85 c0 74 08 e8 1d 35 43 00 89 c0 c3 <0f> 0b 0f 1f 84 00 00 00 00 00 c3 0f 1f 44 00 00 66 2e 0f 1f 84 00
[ 0.412184] RSP: e02b:ffffc9004006fcf8 EFLAGS: 00010246
[ 0.412184] RAX: 0000000000000000 RBX: ffff8881384d2018 RCX: 00000000aeffff00
INFO DELETED
[ 0.412184] ---[ end trace eae5bc73295d4325 ]---
[ 0.412941] pps_core: LinuxPPS API ver. 1 registered
[ 0.412941] pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti giometti@linux.it
[ 0.412942] PTP clock support registered
Instead of:
[ 0.476402] usbcore: registered new device driver usb
[ 0.476402] pps_core: LinuxPPS API ver. 1 registered
[ 0.476402] pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti giometti@linux.it
[ 0.476402] PTP clock support registered
In xo-ce dmesg on my box, I saw:
rcu_sched self-detected stall on CPU (2 occurrences)
Would it be useful for me to upload the 3 dmesg capture files from both boxes (i.e. XCP-ng, xo-ce, Linux Mint 21.1)?
-
Well, a pretty buggy BIOS on your side doesn't help, I suppose.
-
I built a system with a Ryzen 7950X on an ASRock B650M PG Riptide motherboard and was having issues similar to mgales's. I switched to an ASUS Prime B650M-A II-CSM without any improvement.
With the ASUS Prime, there were no BIOS errors reported.
(I was able to get rid of 'ACPI BIOS Error (bug): Could not resolve [_SB.PCI0.GPP7.UP00.DP40.UP00.DP68], AE_NOT_FOUND (20180810/dswload2-160)' by enabling the onboard audio.)
I am able to run imported Windows VMs (Windows 10 and Server 2022) without any apparent issues.
I can run an imported AlmaLinux 8 VM with the nopv kernel option.
I can run the AlmaLinux 8 installer with the nopv option.
I can run Xen Orchestra with the nopv option.
I can also run an imported CentOS 6 VM without any additional options.
The main issue seems to be a stuck CPU on the Linux VMs when using PV drivers.
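For reference, a rough sketch of checking whether a guest was booted with nopv, and of making the option persistent, could look like this. `nopv` is the Linux kernel parameter mentioned above; the helper name `has_nopv` is invented for this example, and the `grubby` line assumes a grubby-based distro such as AlmaLinux 8.

```shell
#!/bin/sh
# has_nopv CMDLINE: succeed if the kernel command line contains "nopv"
# as a standalone, space-separated option.
has_nopv() {
    case " $1 " in
        *" nopv "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Inside the guest: report whether the running kernel was booted with nopv.
if [ -r /proc/cmdline ] && has_nopv "$(cat /proc/cmdline)"; then
    echo "nopv is active"
else
    echo "nopv is not active"
fi

# To add the option to all installed kernels on a grubby-based distro
# (assumption: AlmaLinux 8 or similar), run as root and then reboot:
#   grubby --update-kernel=ALL --args="nopv"
```

For a one-off test, such as the installer ISO, the option can instead be typed onto the kernel line from the boot menu editor.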
Could there be issues specific to the Ryzen 7900X and 7950X?
-
so it seems I need to purchase a 7900. I wonder if a non-X will do it
-
@olivierlambert - my friend with the Ryzen 7900 isn't experiencing the same issues that @BlueBadger and I (with the 7900X) are having - his system is working fine (both of us are running xcp-ng-8.3.testing-2023.02.15-12.19-install.iso).
-
With the same motherboard and the same BIOS settings/version?
-
His is closer to your board (I believe), and to one that @BlueBadger tried (ASUS Prime B650M-A II-CSM)
My friend's board is: ASUS Prime B650M-A AX:
https://www.asus.com/us/motherboards-components/motherboards/prime/prime-b650m-a-ax/
He's using the Ryzen 9 7900 and isn't experiencing any problems with Linux VMs in XCP-ng.
-
One idea to investigate would be to swap the X and non-X CPUs and see if there's a difference.
However, I'm under the impression it's more a motherboard issue (BIOS, or version) than anything else.
-
I talked to my friend about doing a processor swap test, but he's happy with the way his system is running and doesn't want to take a chance of messing something up. Sorry about that