Hiya.
Just came out of a couple weekends' worth of troubleshooting an issue booting VMs on a clean install of XCP-ng.
The good news is that my original issue is resolved for now -- but the fix involves downgrading my system's firmware, which, depending on where the bug lies, might be of interest here. At the very least, hopefully this helps others with similar issues and maybe spurs on a solution that doesn't lock me into older firmware.
TLDR: A Dell firmware update that claims to resolve (among other things) vulnerabilities related to TianoCore and EDK2 (DSA-2023-344) breaks VMs' ability to boot and freezes the console. For future reference, version 1.27.0 currently works as of 3/10.
System:
A Dell Wyse 5070 Extended; this is a re-purposed thin client that has gained popularity in home-lab circles as a low-power, x86-64 alternative to comparable SBCs like the Pi. Personally, I'm hoping to utilize this as a failover VM host in my network.
It is not on the Xen HCL, but its Intel J5005 is perfectly capable of HVM -- and, as I found during troubleshooting, it works fine with the other hypervisors I've tested. Other tests I've run, such as memtest, drive health, etc., all came back passing.
The only pain point that stood out to me initially was that this system does away with legacy boot support entirely in favor of EFI-only booting -- this may be related to the issue I'm experiencing, but without the ability to try a legacy boot, I have no way to test.
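On the HVM point above: for anyone wanting to confirm the same thing from dom0 on their own box, here is a rough sketch that looks for `hvm` in the `virt_caps` line of `xl info`. The helper names (`virt_caps`, `supports_hvm`) are just mine for illustration, and I'm assuming `xl` is on the PATH and that a Python 3 interpreter is available in dom0, which may not hold on every release.

```python
import subprocess


def virt_caps(xl_info_text):
    """Return the capability tokens from the 'virt_caps' line of `xl info` output."""
    for line in xl_info_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "virt_caps":
            return value.split()
    return []  # no virt_caps line found


def supports_hvm(xl_info_text):
    """True if Xen reports the 'hvm' capability."""
    return "hvm" in virt_caps(xl_info_text)


if __name__ == "__main__":
    try:
        # Assumption: `xl` is installed and we are running in dom0 as root.
        result = subprocess.run(
            ["xl", "info"],
            stdout=subprocess.PIPE,
            universal_newlines=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError) as exc:
        print("Could not run `xl info`:", exc)
    else:
        print("HVM supported:", supports_hvm(result.stdout))
```

This only tells you whether Xen sees hardware virtualization at all; it says nothing about the freezing behavior described below.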
Behavior:
After a successful install of either 8.2.1 or 8.3 beta, any attempt to spin up a new VM from XO pegs CPU usage at 100% and leaves the VM console 'frozen' -- it stops accepting input after a second or two of activity, usually on the installer menu of whatever ISO I've loaded up.
I initially thought this might be an issue with the console itself, but the behavior is consistent across XO/XOA, XCP-ng Center for Windows, as well as the XO-lite preview on 8.3.
Ended up messing around with a ton of VM settings, system UEFI options (C-states, SpeedStep, etc.), GRUB options (attempting both Safe Mode and the alt kernel), as well as swapping out system components for hours.
My breakthrough came with trying a FW downgrade. When I bought the system, it came pre-loaded with Dell FW version 1.28.0 -- this was only one revision behind the latest 1.29.0 (only released days ago it seems). Both of these versions exhibited the same freezing issue. However, downgrading to 1.27.0 seems to totally fix everything.
Version 1.27.0: (WORKING)
https://www.dell.com/support/home/en-us/drivers/driversdetails?driverid=hwdwd&oscode=biosa
Version 1.28.0: (NOT WORKING)
https://www.dell.com/support/home/en-us/drivers/driversdetails?driverid=dpfjj&oscode=biosa
Security advisory that 1.28.0 supposedly addresses (DSA-2023-344):
https://www.dell.com/support/kbdoc/en-bb/000217986/dsa-2023-344
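If you want a quick way to tell which camp a given 5070 falls into, something like the sketch below could work. It reads the BIOS version from `/sys/class/dmi/id/bios_version` (the usual Linux DMI location; I'm assuming dom0 exposes it) and compares it against the versions I tested. The function names are made up for this example, and anything outside 1.27.0, 1.28.0 and 1.29.0 is reported as untested because I simply haven't tried it.

```python
from pathlib import Path

# Results from my own testing on a Wyse 5070 Extended; nothing else was tried.
KNOWN_WORKING = {(1, 27, 0)}
KNOWN_BROKEN = {(1, 28, 0), (1, 29, 0)}

# Assumption: standard Linux DMI sysfs path, readable from dom0.
DMI_BIOS_VERSION = Path("/sys/class/dmi/id/bios_version")


def parse_version(text):
    """Turn '1.27.0' into (1, 27, 0); return None if it is not dotted integers."""
    try:
        return tuple(int(part) for part in text.strip().split("."))
    except ValueError:
        return None


def firmware_status(version_text):
    """Classify a BIOS version string against the versions tested in this post."""
    version = parse_version(version_text)
    if version in KNOWN_WORKING:
        return "working"
    if version in KNOWN_BROKEN:
        return "freezes VMs"
    return "untested"


if __name__ == "__main__":
    if DMI_BIOS_VERSION.exists():
        raw = DMI_BIOS_VERSION.read_text()
        print(raw.strip(), "->", firmware_status(raw))
    else:
        print("No BIOS version found at", DMI_BIOS_VERSION)
```

For example, `firmware_status("1.28.0")` returns `"freezes VMs"`, while `firmware_status("1.26.0")` returns `"untested"`.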
I'll admit that I'm not versed enough in the details to know what exactly changed in this FW update -- but it seems related to the recently discovered "PixieFail" vulnerabilities in TianoCore/EDK2.
You may be aware of all this already, but I wanted to share my experience, findings and temporary solution for my particular system. If anyone has any insight or suggestions that they're willing to share, I'm all ears.
Thanks!