XCP-ng 8.3 updates announcements and testing
-
This happened again today when rebooting a Windows Server on my test host running these updates:
A "Force Reboot" does not solve it; I have to do a "Force Shutdown" and power on again. Otherwise it hangs at that screen indefinitely. I can reproduce this on two hosts now. If I reboot again immediately afterwards it is fine, but once the VM has been running for a while it always occurs. Not sure if it may be related to GPU passthrough. I can clone another Windows VM that has no passthrough device enabled if that would help.
-
Greg_E We announce updates here. There's no frequency set in stone. We're working on the next wave of updates, which is ready except that we found a regression, so we're fixing that first.
If you're adventurous, there's a way to install this next wave earlier, but until it has passed internal checks, I can't guarantee that your pool won't explode with these.
-
flakpyro Could you export the output of
xen-bugtool -y
and make it available somewhere privately? This will contain all logs and information about the hardware. Let us know also the times when you rebooted a VM and hit this issue.
-
Which version of Windows Server are you running? UEFI, vTPM, Secure Boot?
I don't have a Windows Server VM in my lab right now, but I can give it a quick check and see. Not sure I can pass through the iGPU in my lab though, and production doesn't have a GPU (real servers with ASPEED BMC).
-
The VM is UEFI with no vTPM or Secure Boot enabled. The CPU is a "Xeon E-2336 CPU @ 2.90GHz" running in a Supermicro server. We use these servers at remote sites to run a number of VMs, including a "Blue Iris" server with an Nvidia T1000 GPU passed through to it. I have one such server as a test server as well. The second machine doing it is a Minisforum MS-01 in my home lab, also with a T1000 GPU passed through. The OS in both cases is Windows Server 2025. Over the weekend I cloned a fresh copy without GPU passthrough to see if it occurs with no GPU. We have about 15 of these VMs running at remote locations on the stable 8.3 patch branch that are not doing this.
It should be noted though that these VMs did need to be customized to allow Blue Iris to run without BSODing the VM on Intel CPUs. The thread can be found here: https://xcp-ng.org/forum/topic/8873/windows-blue-iris-xcp-ng-8-3/35?_=1746455850378 but in the end the following needed to be applied to a VM to keep it from BSODing when running Blue Iris on new Intel CPUs:
xe vm-param-add uuid=... param-name=platform msr-relaxed=true
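For anyone applying the same workaround, here is a minimal sketch of setting the flag and reading it back. The function names are made up for illustration and the VM UUID is whatever applies to your VM. The `platform:key=value` form of `xe vm-param-set` and the `param-key` argument of `xe vm-param-get` are regular xe syntax; `vm-param-set` is used here so the call also works when the key already exists.

```shell
#!/bin/sh
# Hypothetical helpers around the xe CLI; run them on a pool host (dom0).

# Set platform:msr-relaxed=true on the VM whose UUID is passed as $1.
set_msr_relaxed() {
    xe vm-param-set uuid="$1" platform:msr-relaxed=true
}

# Print the current value of the key for the VM whose UUID is passed as $1.
get_msr_relaxed() {
    xe vm-param-get uuid="$1" param-name=platform param-key=msr-relaxed
}

# Usage (VM_UUID must hold the real VM UUID):
#   set_msr_relaxed "$VM_UUID" && get_msr_relaxed "$VM_UUID"
```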
stormi It does not seem to do it every time; it seems the VM must run for some time and then be rebooted to cause it to happen. I have created a bug report file from the host as requested and will DM you a link to it! The last time I experienced this would have been at "May 2, 2025 at 4:18 PM (3 days ago)" according to our XOA appliance.
-
I'm guessing it is something to do with the T1000. I've had some flakiness with them in our workstations. You don't have an older K620 or something, do you? I'm also wondering if this might be a case for an Intel ARC GPU; they weren't out when I did my last workstation refresh or I'd probably be using them with Creative Cloud applications.
Also, can you try changing the Management Agent service from auto-start to delayed auto-start? Server 2025 has been a bit odd for me for a while, and that was back when running on Intel hosts (though they are old CPUs).
I'll see if I can get a VM up this afternoon, but I'm on AMD with the iGPU, so I'm not sure it even supports the acceleration that you need out of the T1000 for the camera recorders. I need to go through and configure each host to pass through the GPU, audio, and maybe some USB and then reboot. But if it works, it might make a nice Handbrake transcoding VM for me: dump files in, get back to work while it does its thing.
-
Greg_E I do not have anything older like a K620. I do have a GeForce GTX 1650 in another machine that is also doing this, but I believe that's the same generation as the T1000 (Turing). I agree, I think ARC could be a great replacement for this application in the future.
Since this is occurring at boot, before the OS loads, I'm not sure this would be management agent related?
-
Not going to be able to test this. My little mini-lab is having a very hard time with passthrough, to the point where the host won't boot. I tried this on my backup host (for backup DR testing) and finally got it back after spending the last couple of hours turning passthrough on and then back off. Sorry.
-
New update candidates for you to test!
As we move closer to making XCP-ng 8.3 the new LTS release, taking over from XCP-ng 8.2.1, a new batch of update candidates is now available for user testing ahead of a future collective release. Details are provided below.
- biosdevname: Update as a dependency for another component.
- blktap: Fix: enable NBD client only after completing handshake.
- cyrus-sasl: Fix for CVE-2022-24407 (not directly affecting XCP-ng in normal use).
- gpumon: Rebuild for XAPI update.
- intel-ice: Update ice driver to v1.15.5.
- kernel: Improve timer handling for better compatibility with hardware.
- plymouth: Packaging update. No visible changes.
- psmisc: Update to version 23.6.
- python-urllib3: Update to version 1.26.20.
- rsync:
  - Update to version 3.4.1
  - Fixes for CVE-2024-12084, CVE-2024-12085, CVE-2024-12086, CVE-2024-12087, CVE-2024-12088, CVE-2024-12747
  - The rsyncd configuration and systemd unit files now come in a separate package named rsyncd-daemon, not installed by default
- smartmontools: Update to version 7.4.
- xapi:
  - Drop FCoE support when fcoe_driver does not exist
  - FCoE support will be removed from next versions
  - No more CPU checks for halted VMs in cross-pool migration
  - Move CPU check to the target host during cross-pool migration
  - Serialize all PCI and VUSB plugs to keep them ordered
  - Fixes multiple issues in periodic scheduler
  - Fixes multiple issues in the way XAPI handles RRD metrics
  - Improve SR.scan by reducing a racing window when updating the XAPI db
  - A lot of maintenance-related changes, XAPI being a very active project.
- xcp-featured: Rebuilt for XAPI.
- xcp-ng-release: Update copyright years and EULA.
- xen:
  - Improve support for Zen 5 and Diamond Rapids CPUs
  - IOMMU logic improvements and fixes
  - Add PCI quirks for problematic hardware (e.g. Cisco VIC UCSX-ML-V5D200GV2)
  - Fix emulation of MOVBE
- xenserver-status-report: Maintenance update.
- xha:
  - Support configurable syslog printing
  - Fixes issue where sub-threads can't be scheduled enough resources.
Test on XCP-ng 8.3
From an up-to-date host:
yum clean metadata --enablerepo=xcp-ng-testing
yum update --enablerepo=xcp-ng-testing
reboot
The usual update rules apply: pool coordinator first, etc.
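As a rough illustration of the "pool coordinator first" rule, the sketch below prints the hosts of a pool in the order they would be updated. `order_hosts` is a hypothetical helper, not something shipped with XCP-ng; the two xe calls shown in the comments are an assumption about how you would feed it on your own pool.

```shell
#!/bin/sh
# Hypothetical helper: print host UUIDs one per line, pool coordinator first.
# $1 = coordinator UUID, $2 = comma-separated list of all host UUIDs
order_hosts() {
    echo "$1"
    # Print every other host; "|| true" keeps a single-host pool from
    # returning a non-zero status when grep has nothing left to print.
    echo "$2" | tr ',' '\n' | grep -v -x -F -- "$1" || true
}

# On any pool member, the inputs could come from:
#   coordinator=$(xe pool-list params=master --minimal)
#   hosts=$(xe host-list --minimal)
#   order_hosts "$coordinator" "$hosts"
# Then, on each host in that order:
#   yum clean metadata --enablerepo=xcp-ng-testing
#   yum update --enablerepo=xcp-ng-testing
#   reboot
```

The helper only computes the order; running the update on each host is left as a manual step on purpose, so that you can check each host comes back before moving on.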
Versions
- biosdevname: 0.3.10-5.xcpng8.3
- blktap: 3.55.5-2.1.xcpng8.3
- cyrus-sasl: 2.1.26-24.el7_9
- gpumon: 24.1.0-40.1.xcpng8.3
- intel-ice: 1.15.5-2.xcpng8.3
- kernel: 4.19.19-8.0.38.1.xcpng8.3
- plymouth: 0.8.9-0.31.20140113.3.xcpng8.3
- psmisc: 23.6-2.xcpng8.3
- python-urllib3: 1.26.20-3.1.xcpng8.3
- rsync: 3.4.1-1.1.xcpng8.3
- smartmontools: 7.4-2.xcpng8.3
- xapi: 25.6.0-1.4.xcpng8.3
- xcp-featured: 1.1.8-1.xcpng8.3
- xcp-ng-release: 8.3.0-31
- xen: 4.17.5-9.1.xcpng8.3
- xenserver-status-report: 2.0.11-1.xcpng8.3
- xha: 25.0.0-1.1.xcpng8.3
What to test
Normal use and anything else you want to test. The closer to your actual use of XCP-ng, the better.
Test window before official release of the updates
None defined, but early feedback is always better than late feedback, which is in turn better than no feedback.
We will not be very available on this forum until Monday to help fix issues if there are any, so don't update too fast if that's a possible problem for you.
-
stormi Updated both of my test hosts. Everything rebooted and came up fine.
I see there are still no VM stats in XO / XOA. I will be curious to see whether this round of updates fixes my EFI / Windows Server reboot hangs.
-
flakpyro No VM stats despite a reboot of the hosts?
-
stormi No stats until the toolstack restarts, as with previous candidates
-
Ok, let's involve Team-XAPI-Network
-
My master host hung on reboot; after about 10 minutes I forced the power off and then back on. The host is an HP T740 thin client with an AMD V1756B processor, 64GB of DDR4 SODIMM, an Intel dual-port X520 PCIe card, an Intel i226-V in the A+E slot, plus the onboard Realtek NIC. The BIOS is at 1.20, which I think is still current for this model. All three hosts are identical with the possible exception of X520 card revisions; host 3 might have an older revision (donations accepted for X710-based cards).
Otherwise everything went as planned, all three are updated and all three needed to have the toolstack restarted to see the stats. I only have 2 small Linux VMs on this system right now, and both of them started fine.
There were no stats in XO-Lite either; I was doing the second and third hosts from XO-Lite to see if XO was getting mad. The second and third rebooted without issue. Something I'll look into with the next update: the delay on the master could have been the VMs trying to auto-start, whereas I remembered to turn that off for the other two hosts.
-
Installed on my test pool with 2 HP EliteDesk 800 G3 minis. Apart from the missing stats in Xen Orchestra, no issues so far: I did some VM migration between hosts and reboot tests (only Linux VMs though).
-
Could you please attach
/var/log/{xensource.log,daemon.log,xcp-rrdd-plugins.log}
from the time after the reboot and before the toolstack restart?
Also: does this reproduce if you reboot again? Would you still have no metrics until a toolstack restart? Because if not, this is entering paranormal territory.
UPD: I've just updated xapi from 24.39 to 25.6 myself - stats are working after the reboot and I can see them in XenOrchestra, no additional toolstack restart required.
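To make gathering those three files a bit easier, here is a minimal sketch that bundles them into one archive. `collect_rrd_logs` and the output path are hypothetical names chosen for the example; the only thing taken from the request above is the list of log files under /var/log.

```shell
#!/bin/sh
# Hypothetical helper: bundle the three requested logs into one archive.
# $1 = output archive path, $2 = log directory (defaults to /var/log)
collect_rrd_logs() {
    out="$1"
    logdir="${2:-/var/log}"
    # -C changes into the log directory so the archive holds bare file names.
    tar -czf "$out" -C "$logdir" xensource.log daemon.log xcp-rrdd-plugins.log
}

# Usage on the affected host, after the reboot and before any toolstack restart:
#   collect_rrd_logs "/tmp/rrd-logs-$(hostname).tar.gz"
```

These logs can contain host names and addresses, so share the archive privately rather than posting it in the thread.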
-
andriy.sultanov said in XCP-ng 8.3 updates announcements and testing:
does this reproduce if you reboot again
Yes, this does. Logs have been provided via pm.
-
andriy.sultanov
Same issue after another reboot. The stats on host just flatline. No CPU usage or anything.
See screenshot
I attached log files
daemon.log.txt xcp-rrdd-plugins.log.txt xensource.log.txt
-
bufanda Thanks!
Hmm, can't see anything suspicious either in your logs or abudef's.
Since you said it reproduces if you reboot the host again, I'd really appreciate it if you could send the result of
rrd2csv
running for 30 seconds or so on a system that doesn't have the stats working, so before any toolstack restarts.
-
andriy.sultanov
I ran it for a minute.
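For anyone else who wants to provide the same kind of capture, here is a minimal sketch of a time-limited run. `capture_for` and the output file name are hypothetical; the sketch assumes `rrd2csv` keeps printing until it is stopped, and relies on `timeout` from coreutils to stop it after the given number of seconds.

```shell
#!/bin/sh
# Hypothetical helper: run a command for a fixed number of seconds and save
# its output. $1 = seconds, $2 = output file, remaining arguments = command.
capture_for() {
    secs="$1"
    out="$2"
    shift 2
    timeout "$secs" "$@" > "$out" 2>&1
    rc=$?
    # timeout exits with 124 when it had to stop the command, which is the
    # expected outcome for a tool that streams until interrupted.
    [ "$rc" -eq 0 ] || [ "$rc" -eq 124 ]
}

# Usage on the affected host, before any toolstack restart:
#   capture_for 30 /tmp/rrd2csv.txt rrd2csv
```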