XCP-ng 8.3 updates announcements and testing
-
@gduperrey Installed @home and @business. RPU had no issues this time.
-
Updated on my three servers, no issues.
-
Nothing really to add; my 3-host Intel production pool updated just fine. The load balancer is always a little weird, but I'm sure its calculations are based on the CPU and RAM assigned to each VM, whereas I split things up based on workload.
It's a small system, and the real workload is handled by 3 Windows VMs, so I tend to spread them out, one per host.
I may get to my lab in the next couple of days, but it isn't doing real work, so testing it is kind of pointless right now. The only thing "doing work" there is a VM with XO from sources.
-
A new batch of non-urgent updates is ready for user tests before a future collective release. Below are the details about these.
Main changes:
- edk2: In the virtual firmware for UEFI VMs (OVMF):
  - Update the embedded OpenSSL to version 3.0.9 + additional fixes. This mainly addresses network boot of UEFI VMs from an HTTPS server.
  - Fix VLAN tag handling, that is, fix network boot of UEFI VMs on a tagged VLAN (without handling the VLAN at the pool network level, as that makes it transparent to the VM).
- ethtool: Allow ethtool to enable 50G/100G/200G link modes. The dom0 kernel and the Mellanox network adapters driver have been updated accordingly.
- expat: Update from 2.1.0 to 2.5.0, bringing security fixes related to XML handling.
- guest-templates-json: Add templates for AlmaLinux 10, Rocky Linux 10, Debian 13, Oracle Linux 10 and Red Hat 10.
- intel-microcode:
  - Update to publicly released microcode-20250812.
  - Security updates for: INTEL-SA-01249, INTEL-SA-01308, INTEL-SA-01310, INTEL-SA-01311, INTEL-SA-01367.
  - Updates for multiple functional issues.
  - Note: this update is provided with XCP-ng as a convenience, but it doesn't constitute a fix of a vulnerability in XCP-ng, nor does it entirely replace upgrading your firmware.
- kernel:
  - Enable 50G/100G/200G ethtool link modes in the XCP-ng 8.3 kernel and Mellanox network adapters driver. The userspace tool ethtool has been updated accordingly.
  - Fix a race condition regarding namespace identifier attributes in sysfs.
  - Fix a deadlock on PCI passthrough. This is related to the following known issue in XenServer: "When NVIDIA T4 added in pass-through mode to a VM on some specific server hardware, that VM might not power on".
- libtpms: Fix CVE-2025-49133 in libtpms - "Potential out-of-bounds access and abort in libtpms due to inconsistent HMAC signing parameters". A guest process with access to the TPM can cause the guest's TPM server to crash by sending malicious commands. This crash does not affect other VMs or the host.
- lvm2: Performance improvements for LVM-based SRs on systems with a large number of VDIs.
- mellanox-mlnxen: Enable 50G/100G/200G ethtool link modes (goes with the kernel and ethtool patches). Note: this set of changes was initially made by XenServer, and we haven't had the occasion to test it.
- qlogic-qla2xxx: Update to version 10.02.13.00_k. Bug fixes only.
- sm:
  - Adapt the LargeBlock SR driver following the configuration change in the lvm2 rebase.
  - Make LINSTOR volume size retrieval more robust, to avoid throwing exceptions when not needed.
  - Improve LINSTOR DB robustness: detect failures with a small delay, and use specific DRBD options suited to drbd-reactor.
  - Limit LINSTOR logs in SMlog.
  - Rewrite the handling of DRBD/LINSTOR command calls, in particular to speed up scan commands.
  - Make the LINSTOR DB umount call more robust in case of a network outage.
  - Make the garbage collector more robust, to avoid falsely marking VDIs as hidden and subsequently preventing journal rollbacks.
  - Fix the error message reported when a snapshot fails.
  - Make LinstorSR creation with thick mode more robust.
- varstored: Provide the latest Secure Boot certificates from Microsoft by default. This is a big change compared to the past situation, where you had to prepare pools for Guest Secure Boot by manually running a command or clicking a button in Xen Orchestra. Now, if you haven't installed any UEFI certificates on the pool, it will automatically use the latest ones we provide on the system to set up Secure Boot on new VMs. Existing VMs are untouched. Additionally, this will allow us to support Secure Boot with future Windows media that no longer use the expired 2011 certificates. Documentation updates are on their way: https://github.com/xcp-ng/xcp-ng-org/pull/328.
- xapi:
  - Notable fixes:
    - Consoles are now started for PVH guests => a step towards supporting the PVH virtualization mode.
    - Stop ballooning down memory on localhost migration => VDI migration to another SR no longer fails because of unrelated memory configuration.
    - Allow SHA-512 in host certificates.
    - Better error reporting for other_operation_in_progress => it now describes which operation was blocking another one.
    - Avoid trying to suspend a VM which doesn't support it, thus preventing a VM crash.
    - Fix an issue that unnecessarily disabled CBT on VDIs on shared SRs during VM live migration. This will also allow live migrating such VMs during a rolling pool update.
    - Message.get_all_records_where now properly evaluates the query. This will be leveraged by Xen Orchestra to get some information from XAPI faster (by fetching smaller amounts of items).
    - Fix issues with emergency network reset on IPv6 hosts.
  - Notable features:
    - Best effort mode for NUMA: not enabled by default at the moment, but when enabled, xapi will try to use a single NUMA node when creating VMs. It is best effort, meaning that this strategy sometimes fails and all nodes are used instead, especially when many VMs are started or migrated at the same time. Test instructions are provided in the "What to test" section below.
    - Host evacuation was parallelized further, so that the flow of migrating VMs is maintained and bottlenecks are avoided.
    - Storage migration reworked: allows migration from and to SMAPIv3 storage backends => a step towards SMAPIv3, but not immediately usable.
    - CLI interface (xe): improved autocompletion.
  - A lot of other fixes and internal improvements.
- xen:
  - Enhance support for Intel Granite Rapids systems.
  - Fix PCI passthrough on some systems.
  - Add additional CPU RRD metrics.
- xenserver-status-report: Bug fixes + collection of additional debug data.
- xo-lite: Update to version 0.15.0. See Xen Orchestra's release announcements for the changelog.
Other changes
- blktap: Add a log line when an operation is not supported.
- gpumon: Rebuilt for the updated XAPI.
- libarchive: Fix the libarchive package so that it refreshes the ldconfig cache; the missing refresh impacted driver disk generation.
- samba: Remove unneeded dependencies.
- swtpm: Rebuilt for the updated libtpms.
- xcp-featured: Rebuilt for the updated XAPI.
- xcp-python-libs: Various fixes.
- xha: Various fixes.
Added dependency:
- qcow-stream-tool: Will be used by XAPI when support for the QCOW2 disk format is added.
XOSTOR
In addition to the changes in common packages, the following XOSTOR-specific packages received updates:
- kmod-drbd:
  - Update the DRBD kernel module (improvements, fixes).
  - Important update to fix memory leaks during DRBD resource synchronization. This bug could be triggered during the addition of a node, or after a LINSTOR evacuation following a hardware problem and a recreation of the resources.
- linstor:
  - Updated the LINSTOR controller and satellite packages.
  - In particular, a resizing issue that could block a resource has been fixed.
Test on XCP-ng 8.3
yum clean metadata --enablerepo=xcp-ng-testing
yum update --enablerepo=xcp-ng-testing
reboot
The usual update rules apply: pool coordinator first, etc.
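If it helps, here is a small optional check, not part of the official instructions, just a generic RPM query, to confirm after the reboot which candidate package versions actually landed, for comparison against the "Versions" list below:

```
# List installed XCP-ng 8.3 packages with their versions (generic RPM query)
rpm -qa | grep 'xcpng8.3' | sort
```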
Versions:
- blktap: 3.55.5-6.1.xcpng8.3
- edk2: 20220801-1.7.10.1.xcpng8.3
- ethtool: 4.19-3.xcpng8.3
- expat: 2.5.0-3.xcpng8.3
- gpumon: 24.1.0-65.1.xcpng8.3
- guest-templates-json: 2.0.14-1.1.xcpng8.3
- intel-microcode: 20250715-1.xcpng8.3
- kernel: 4.19.19-8.0.43.1.xcpng8.3
- libarchive: 3.3.3-1.1.xcpng8.3
- libtpms: 0.9.6-3.xcpng8.3
- lvm2: 2.02.180-18.2.1.xcpng8.3
- mellanox-mlnxen: 5.9_0.5.5.0-3.1.xcpng8.3
- qcow-stream-tool: 25.27.0-2.1.xcpng8.3
- qlogic-qla2xxx: 10.02.13.00_k-1.xcpng8.3
- samba: 4.10.16-25.2.xcpng8.3
- sm: 3.2.12-10.2.xcpng8.3
- swtpm: 0.7.3-9.xcpng8.3
- varstored: 1.2.0-3.1.xcpng8.3
- xapi: 25.27.0-2.1.xcpng8.3
- xcp-featured: 1.1.8-2.xcpng8.3
- xcp-python-libs: 3.0.8-1.1.xcpng8.3
- xen: 4.17.5-20.1.xcpng8.3
- xenserver-status-report: 2.0.15-1.xcpng8.3
- xha: 25.1.0-1.1.xcpng8.3
- xo-lite: 0.15.0-1.xcpng8.3
XOSTOR
kmod-drbd-9.2.14-1.0.xcpng8.3
linstor-common-1.29.2-1.el7_9
linstor-controller-1.29.2-1.el7_9
linstor-satellite-1.29.2-1.el7_9
What to test
Normal use and anything else you want to test.
Additional focus can be given to:
- Network boot of a UEFI VM from an HTTPS server
- Network boot of a UEFI VM on a tagged VLAN (without handling the VLAN at the pool network level, as this makes it transparent to the VM)
- UEFI Secure Boot: new VMs, existing VMs, ... And/or just verify that you understand the updated documentation (work in progress)
- Testing on Intel Granite Rapids systems
- 50G/100G/200G link modes with Mellanox network adapters, using ethtool
- vTPM
- Hardware depending on the qla2xxx driver, that is QLogic 2500/2600/2700/277x/2800 Series Fibre Channel Adapters
- NUMA best effort mode, for hosts with more than one NUMA node:
  Turn it on, per host, by running xe host-param-set numa-affinity-policy=best_effort uuid=$HOST_UUID.
  New VMs should now be allocated to a single NUMA node when possible. To verify that it works, (re)start a VM that can fit within a single NUMA memory node, run xl debug-keys u in dom0, then check the output with xl dmesg. The last lines of the output show, for each domain (VM), how much memory is allocated on each NUMA node: for the domain that was restarted (usually the one with the highest domain number), all nodes except one should show 0. A command sketch is given after this list.
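Putting the steps above together, a minimal command sketch (the host UUID is a placeholder; run the xl commands in dom0 on the host where the VM was restarted):

```
# Enable best-effort NUMA placement on one host
xe host-param-set numa-affinity-policy=best_effort uuid=$HOST_UUID

# (Re)start a VM small enough to fit in a single NUMA memory node, then:
xl debug-keys u        # ask Xen to dump per-domain NUMA allocation info
xl dmesg | tail -n 40  # the last lines show memory per NUMA node for each domain
```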
Known issues
XAPI's handling of remote logging changed. XAPI now expects a configuration file in a specific location, and we haven't applied this system change yet. We'll publish the related update candidate to complement the current batch in the coming days.
So: don't attempt to set up remote logging yet. If you set it up previously, then it should continue to work.
Test window before official release of the updates
~10 days. But please test as early as possible.
-
@stormi Installed on my usual test hosts (an Intel Minisforum MS-01, and a Supermicro running a Xeon E-2336 CPU). Also installed onto a 2-host AMD EPYC pool. Updates went smoothly, and backups continue to function as before.
3 Windows 11 VMs had Secure Boot enabled. In XOA I clicked "Copy pool's default UEFI certificates to the VM" after the update was complete. The VMs continued to boot without issue afterwards.
-
@stormi Running.... but...
# secureboot-certs install
Traceback (most recent call last):
  File "/usr/sbin/secureboot-certs", line 770, in <module>
    install(session, args)
  File "/usr/sbin/secureboot-certs", line 313, in install
    validate_args(args)
  File "/usr/sbin/secureboot-certs", line 288, in validate_args
    if os.path.exists(args.PK) and not is_auth(args.PK) and not getattr(args, "pk_priv", False):
  File "/usr/sbin/secureboot-certs", line 381, in is_auth
    with open(path, "rb") as f:
IOError: [Errno 21] Is a directory: 'default'
Also, could this update include the VWC QEMU Windows crash fix?
-
@Andrew Looks like there's something named "default" in your current directory that's confusing the script. Try changing to another directory before running the command.
The VWC fix won't be part of this update wave, but will hopefully arrive soon.
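For example, just to illustrate the workaround (any directory without an entry named "default" will do):

```
# Run the tool from a directory that doesn't contain an entry named "default"
cd /tmp && secureboot-certs install
```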
-
@dinhngtu Correct! I was in /etc/ which has a default directory.
-
@stormi Installed on a two-node pool. Both nodes are HP EliteDesk 800 G3 Mini. 10 Linux VMs with Secure Boot enabled and 1 Windows Server 2019 VM with Secure Boot and vTPM enabled. No issues. Migration between nodes worked flawlessly too.
-
@flakpyro said in XCP-ng 8.3 updates announcements and testing:
@stormi Installed on my usual test hosts (an Intel Minisforum MS-01, and a Supermicro running a Xeon E-2336 CPU). Also installed onto a 2-host AMD EPYC pool. Updates went smoothly, and backups continue to function as before.
3 Windows 11 VMs had Secure Boot enabled. In XOA I clicked "Copy pool's default UEFI certificates to the VM" after the update was complete. The VMs continued to boot without issue afterwards.
If you want to go further with the test, you need to clear your pool's secure boot certificates (the ones you probably had installed in the past from XO to "set up the pool for Guest SB"), so that the new pool defaults become the ones we provided with the update.
Then you can try again propagating the certs to the VMs.
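For reference, a rough CLI sketch of that sequence. I'm assuming here that the secureboot-certs helper provides a clear subcommand for removing pool-level certificates, as described in the XCP-ng Guest Secure Boot documentation, so double-check the updated docs before running this on a production pool:

```
# Assumption: "clear" removes the UEFI certificates previously installed at the pool level,
# so that the new defaults shipped with this varstored update apply to new VMs
secureboot-certs clear

# Then re-propagate the (new) pool defaults to each VM, e.g. via Xen Orchestra's
# "Copy pool's default UEFI certificates to the VM" button, and reboot the VM
```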