Updates announcements and testing

Andrew

@stormi Microcode updated on affected Gen11 i7. Running normally.

stormi

Thanks for the feedback! Update published: https://xcp-ng.org/blog/2023/11/15/november-2023-security-update/

The blog post also contains information about two vulnerabilities in Xen that don't affect XCP-ng in a supported and/or default configuration.

Users of PV guests who still haven't converted them to HVM should consider it, though.

stormi

New security update candidates

As promised in the announcement of the previous security update, here's a new one which includes changes for previously missing XSA updates as well as an updated AMD microcode.

Security updates

xen-*:
- Fix XSA-445 - x86/AMD: mismatch in IOMMU quarantine page table levels. On x86 AMD systems with IOMMU hardware, a device in quarantine mode, using dom_io, could access leaked data from previously quarantined pages. This is not enabled by default in XCP-ng, but can still be enabled at Xen boot time.
- Fix XSA-446 - x86: BTC/SRSO fixes not fully effective. A PV guest could infer memory content from other guests. We do not recommend using PV guests and have been suggesting switching to HVM for a while, so we do hope most users were not impacted by this.
linux-firmware: Update AMD microcode to the 2023-10-19 drop, which updates family 19h (Zen 3, Zen 3+ and Zen 4). AMD Advisory here.

Other updates

We also plan to push other, non-security updates at the same time, to pave the way for the upcoming refreshed installation ISOs.

gpumon: suppression of logs which were needlessly written every 5s into /var/log/daemon.log.
tzdata: updated timezones.
vendor-drivers: pull new drivers into XCP-ng:
- igc-module: Intel device drivers for I225/I226
- r8125-module: Realtek r8125 device drivers
- mpi3mr-module: Broadcom mpi3mr RAID device driver

Test on XCP-ng 8.2

yum clean metadata --enablerepo=xcp-ng-testing
yum update "xen-*" linux-firmware gpumon vendor-drivers tzdata --enablerepo=xcp-ng-testing
reboot

The usual update rules apply: pool coordinator first, etc.
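
For anyone scripting this, here is a minimal sketch of the "pool coordinator first" ordering. The function name order_hosts is made up for illustration, and the xe calls in the comments are an assumption about how you might feed it (xe reports the coordinator in the pool's `master` field); it only prints an order and does not update anything.

```shell
#!/usr/bin/env bash
# order_hosts COORDINATOR "h1,h2,h3"
# Hypothetical helper: prints one host per line, coordinator first,
# followed by the remaining hosts in their original order.
order_hosts() {
    local coordinator="$1" all="$2" host
    echo "$coordinator"
    # Split the comma-separated list and skip the coordinator itself.
    local IFS=','
    for host in $all; do
        [ "$host" = "$coordinator" ] || echo "$host"
    done
}

# Possible use on a pool member (assumes the xe CLI is available):
#   order_hosts "$(xe pool-list params=master --minimal)" \
#               "$(xe host-list --minimal)"
```

The output is just the list of host UUIDs in the order to update and reboot them.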

Versions:

xen: 4.13.5-9.38.1.xcpng8.2
linux-firmware: 20190314-10.1.xcpng8.2 (Update: now 20190314-10.2.xcpng8.2, which adds firmware for rtl8125)
gpumon: 0.18.0-11.2.xcpng8.2
tzdata: 2023c-1.el7
vendor-drivers: 1.0.2-1.6.xcpng8.2
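
If you want to confirm that a host really picked up the versions listed above, something along these lines may help. It is only a sketch: check_version and installed_version are hypothetical helper names, and the package names are the ones from the yum command above.

```shell
#!/usr/bin/env bash
# check_version NAME EXPECTED ACTUAL
# Hypothetical helper: prints "OK ..." when the two version-release strings
# match, otherwise prints "MISMATCH ..." and returns a non-zero status.
check_version() {
    local name="$1" expected="$2" actual="$3"
    if [ "$expected" = "$actual" ]; then
        echo "OK $name $actual"
    else
        echo "MISMATCH $name: expected $expected, got $actual"
        return 1
    fi
}

# installed_version NAME
# Prints VERSION-RELEASE of an installed package as reported by rpm.
installed_version() {
    rpm -q --queryformat '%{VERSION}-%{RELEASE}\n' "$1"
}

# Possible use on a host, after the update:
#   check_version gpumon 0.18.0-11.2.xcpng8.2 "$(installed_version gpumon)"
#   check_version tzdata 2023c-1.el7 "$(installed_version tzdata)"
# The xen packages can be listed with: rpm -qa 'xen-*'
```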

What to test

Normal use and anything else you want to test. The closer to your actual use of XCP-ng, the better.

Test window before official release of the updates
~4 days

Samuel, along with David and Gaël

olivierlambert

Update done and reboot successful

JeffBerntsen

The update is installed and seems to be working without problems on my two test systems.

Andrew

@stormi Updated on several Intel Xeon servers. Updated on new Intel and AMD (zen3) systems with IGC and r8125 chips. One issue... the base install does not include the standard firmware for the 8125.

gskger

@stormi My two host cluster (HP ProDesk 600 G6) updated without an issue. Let's see how the cluster is performing during the coming days.

stormi

@Andrew I just pushed an updated linux-firmware to testing, with firmware for rtl8125. Should be available within 10 minutes.

Andrew

@stormi r8125 firmware loads.

stormi

So, we found out the AMD vulnerability actually doesn't affect XCP-ng directly, because Xen doesn't use AMD's SEV features currently.

The other two vulnerabilities still need fixing, but they both can only be exploited if XCP-ng is used in an either unlikely or unsupported way. We'll fix them in due course, but won't push the update to everyone today as initially planned. We will delay them slightly to give them a chance to be grouped with future updates and thus cause less maintenance for users.

Thanks for the tests anyway: we will be able to publish these packages whenever we need now.

bleader

New security update candidates (kernel)

A new XSA was published on the 23rd of January, so we have a new security update to include it.

Security updates

kernel:
* Fix XSA-448 - Linux: netback processing of zero-length transmit fragment. An unprivileged guest can cause Denial of Service (DoS) of the host by sending network packets to the backend, causing the backend to crash. This was discovered through issues when using pfSense with WireGuard causing random crashes of the host.

Test on XCP-ng 8.2

yum clean metadata --enablerepo=xcp-ng-testing
yum update kernel --enablerepo=xcp-ng-testing
reboot

The usual update rules apply: pool coordinator first, etc.

Versions:

kernel: 4.19.19-7.0.23.1.xcpng8.2

What to test

Normal use and anything else you want to test. The closer to your actual use of XCP-ng, the better.

Test window before official release of the updates
~2 days due to security updates.

stormi

Did anyone install it? The 2-day delay is over and we'll publish today.

Andrew

@stormi Yes, I installed it on a few running hosts. I did not have any kernel crashes before, and none after...

olivierlambert

Installed here, works

CJ

@NielsH Kind of off topic but figured I'd mention it as I only recently discovered this.

Not sure what VMs you're running, but if they can survive being off for a short time (redundancy of services or planned outage) you can reboot the host using Smart Reboot under the Advanced tab. While it incurs some downtime, it allows for a much faster reboot time than migrating the VMs to another server and back.

I use local storage as well and it's been a game changer for dealing with pool patches.

bleader

The update has been published, thanks for the feedback and tests.

https://xcp-ng.org/blog/2024/01/26/january-2024-security-update/

NielsH

@CJ said in Updates announcements and testing:

@NielsH Kind of off topic but figured I'd mention it as I only recently discovered this.

Not sure what VMs you're running, but if they can survive being off for a short time (redundancy of services or planned outage) you can reboot the host using Smart Reboot under the Advanced tab. While it incurs some downtime, it allows for a much faster reboot time than migrating the VMs to another server and back.

I use local storage as well and it's been a game changer for dealing with pool patches.

Cheers, thanks for the suggestion. In our case we are actually phasing out XCP-ng and are in the process of migrating to Proxmox, since we can migrate at 30-35 Gbit/s there. The disk performance is so much faster there that we can perform all the updates in a single day instead of 2 weeks.

Another issue we had was that migrations of very large VMs (usually 8+ cores) are quite impactful. Because we also want to use VMs with 24-48 cores and 128GB RAM, it simply was not usable enough for us. There are several seconds, or sometimes even minutes, of downtime during the last phase of the migration with the large VMs.

With Proxmox we have seen very little downtime (<1s) which we are very happy about.

bleader

New security update candidates (xen)

Two new XSAs were published on the 30th of January.

XSA-449 impacts PCI passthrough users.
XSA-450 only affects the case where Xen is compiled without HVM support, which is not the case in XCP-ng. We therefore chose not to include this fix yet (it will likely be included in future versions, though maybe not as part of a critical security update).

Security updates

xen-*:
* Fix XSA-449 - pci: phantom functions assigned to incorrect contexts. A malicious VM assigned with a PCI device could in some cases access data of a guest previously using the same PCI device. This requires PCI passthrough on a device using phantom functions and reassigning the same device to a new VM to be exploitable.

Test on XCP-ng 8.2

yum clean metadata --enablerepo=xcp-ng-testing
yum update "xen-*" --enablerepo=xcp-ng-testing
reboot

The usual update rules apply: pool coordinator first, etc.

Versions:

xen: 4.13.5-9.38.2.xcpng8.2

What to test

Normal use and anything else you want to test. If you are using PCI passthrough devices, that's even better, but we would also be glad to have confirmation from others that their normal use case still works as intended.
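
If you are not sure whether any of your VMs use PCI passthrough, here is a rough sketch for checking. It assumes passthrough was set up the usual way, through a `pci` key in the VM's other-config, and that xe prints that map as "key: value; key: value". The function name pci_passthrough_status is made up for illustration.

```shell
#!/usr/bin/env bash
# pci_passthrough_status OTHER_CONFIG
# Hypothetical helper: prints "passthrough" if the other-config text contains
# a "pci" key, otherwise "none". A leading "; " is prepended so that the key
# is also found when it is the first entry of the map.
pci_passthrough_status() {
    case "; $1" in
        *"; pci: "*) echo "passthrough" ;;
        *) echo "none" ;;
    esac
}

# Possible use on a host (assumes the xe CLI is available):
#   for uuid in $(xe vm-list is-control-domain=false --minimal | tr ',' ' '); do
#       cfg=$(xe vm-param-get uuid="$uuid" param-name=other-config)
#       echo "$uuid $(pci_passthrough_status "$cfg")"
#   done
```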

Test window before official release of the updates
2 days because of security updates.

JeffBerntsen

This seems to be working fine on my two test systems but I don't do PCI passthrough.

Andrew

@bleader I installed it on a bunch of busy hosts. All are fine, but none used PCI passthrough. The Rolling Pool Reboot in XO was very helpful.