-
Maybe it was still rebooting, stuck on the shutdown phase, waiting for some kind of I/O or something. This would explain why it didn't respond.
-
I concur. If the shutdown process is stuck somewhere (e.g. on an NFS share), you can't connect at all (connection refused in SSH, no XAPI connection) and it can stay like this for a while.
-
@stormi It was not responding after the update from XO. I could log in via SSH and restarted the toolstack, but this did not help: XO still could not log in. I issued a reboot command via SSH, which dropped the SSH session, but the host did not reboot, most likely due to running VMs. I then yanked the power and the host rebooted, and everything is working as it should. The update definitely caused the issue.
-
I did update my home lab without any issue, before, during and after the update (I did test just after the update without any reboot).
-
@olivierlambert This is my home lab as well, running on a small form factor PC with an Intel(R) Core(TM) i7-10700 CPU @ 2.90GHz.
-
XO restarts the toolstack after installing updates. So what first went wrong is this: it couldn't restart. You can't say for sure it's caused by the update, because there are many other reasons that can make this fail. The logs would tell. For example, there is a known bug, being fixed, in xenopsd-xc, which makes it unable to restart when something specific happened to VM metadata.
So we'll keep our eyes and ears open for any other occurrence of this issue in relation to the update, but I still think there's little chance an update of sudo would itself cause this. We'll do a few additional tests to see if we can reproduce.
-
@stormi If you direct me to where the logs you need are, I can provide them.
-
/var/log/xensource.log and /var/log/daemon.log would be the first ones to check.
-
I see this in daemon.log, a message from systemd attempting to shut the system down:
Feb 1 22:11:35 xcp-ng-01 systemd[1]: Unmounted /run/sr-mount/5f5a9343-b95a-9bfa-bd3a-bc30d7368058.
Feb 1 22:11:35 xcp-ng-01 systemd[1]: Failed to propagate agent release message: Transport endpoint is not connected
Feb 1 22:11:35 xcp-ng-01 systemd[1]: Failed to propagate agent release message: Transport endpoint is not connected
Feb 1 22:11:35 xcp-ng-01 systemd[1]: Failed to propagate agent release message: Transport endpoint is not connected
Feb 1 22:11:35 xcp-ng-01 systemd[1]: Failed to propagate agent release message: Transport endpoint is not connected
Feb 1 22:11:35 xcp-ng-01 systemd[1]: Failed to propagate agent release message: Transport endpoint is not connected
There definitely was a network mountpoint (an NFS SR) which was not connected anymore. This explains the long reboot time.
Going up the logs, I see this:
Feb 1 22:01:43 xcp-ng-01 systemd[1]: xenopsd-xc.service: main process exited, code=exited, status=2/INVALIDARGUMENT
Feb 1 22:01:43 xcp-ng-01 systemd[1]: Unit xenopsd-xc.service entered failed state.
Feb 1 22:01:43 xcp-ng-01 systemd[1]: xenopsd-xc.service failed.
This explains the failed XAPI restart and is likely the known issue with xenopsd I mentioned above.
So, if I'm not wrong, it's good news:
- The xenopsd issue is known, usually disappears after a reboot, and a fix is on its way.
- The update itself probably didn't cause your issues.
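For anyone who wants to run the same check on their own host, here is a minimal sketch of how the two findings above could be pulled out of daemon.log. The function names are hypothetical, and the patterns are simply taken from the log lines quoted in this post, so other systemd versions may word these messages differently.

```shell
#!/bin/sh
# Hypothetical helpers; the patterns are assumptions based on the log
# excerpts quoted above, not on any documented log format.

# Print the name of every service that systemd reported as failed.
# $1: path to a log file, e.g. /var/log/daemon.log
failed_units() {
    grep -oE '[A-Za-z0-9@_.-]+\.service failed\.' "$1" \
        | sed 's/ failed\.$//' \
        | sort -u
}

# Print the SR mount points that systemd unmounted during shutdown.
# $1: path to a log file, e.g. /var/log/daemon.log
unmounted_srs() {
    grep -oE 'Unmounted /run/sr-mount/[0-9a-f-]+' "$1" \
        | sed 's/^Unmounted //'
}
```

Calling `failed_units /var/log/daemon.log` and `unmounted_srs /var/log/daemon.log` only reads the log, so it is safe to run on a live host.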
-
haha my "gut feeling" approved
-
@gduperrey said in Updates announcements and testing:
New Update Candidates (xen, xapi, templates)
- Xen: Enable AVX-512 by default for EPYC Zen4 (Genoa)
- Xapi: Redirect http requests on the host webpage to https by default.
- Guest templates:
- Add the following templates: RHEL 9, AlmaLinux 9, Rocky Linux 9, CentOS Stream 8 & 9, Oracle Linux 9
Test on XCP-ng 8.2
From an up to date host:
For Xen, Xapi and Guest templates:
yum clean metadata --enablerepo=xcp-ng-testing
yum update xen-dom0-libs xen-dom0-tools xen-hypervisor xen-libs xen-tools xapi-core xapi-tests xapi-xe guest-templates-json guest-templates-json-data-linux guest-templates-json-data-other guest-templates-json-data-windows --enablerepo=xcp-ng-testing
reboot
Versions:
- xen-*: 4.13.4-9.29.1.xcpng8.2
- xapi-*: 1.249.26-2.2.xcpng8.2
- guest-templates-json-*: 1.9.6-1.2.xcpng8.2
What to test
Normal use and anything else you want to test. The closer to your actual use of XCP-ng, the better.
Test window before official release of the updates
No precise ETA, but the sooner the feedback the better.
Hello,
Is there any update on the ETA for this? It has been almost a month. We'll do the XCP-ng updates again soon, and if these patches are close to release we will wait for them to avoid doing the work twice.
Cheers,
Niels
-
@NielsH We'll wait for the next security update, to ship them together. When exactly security updates are released can't always be predicted or disclosed.
-
New Security Update Candidates (Xen, microcode, ...)
Components are updated to fix vulnerabilities:
- Xen is updated to fix XSA-426. It also includes the previous change which had not been released yet: Enable AVX-512 by default for EPYC Zen4 (Genoa)
- Intel and AMD microcode is updated for various devices:
- Intel update (which in turn links to the advisories)
- AMD advisory
We will also release at the same time:
- xcp-ng-release-*: fixes benign but annoying fcoe-related error messages at boot
And an update candidate which has been tested previously:
- Guest templates: added RHEL 9, AlmaLinux 9, Rocky Linux 9, CentOS Stream 8 & 9, Oracle Linux 9.
Test on XCP-ng 8.2
From an up to date host:
yum clean metadata --enablerepo=xcp-ng-testing
yum update "guest-templates-*" "xen-*" microcode_ctl linux-firmware "xcp-ng-release-*" --enablerepo=xcp-ng-testing
reboot
Versions:
- xen-*: 4.13.4-9.29.2.xcpng8.2
- microcode_ctl: 2.1-26.xs23.1.xcpng8.2
- linux-firmware: 20190314-5.1.xcpng8.2
- guest-templates-json-*: 1.9.6-1.2.xcpng8.2
- xcp-ng-release-*: 8.2.1-6
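After updating, one way to confirm that a host really ended up on the announced versions is to compare the output of `rpm -q` against a list of expected values. The sketch below is only an illustration: the `version_mismatches` helper and the expected-versions file are hypothetical, and the package names in the comment are examples taken from this thread.

```shell
#!/bin/sh
# Hypothetical helper: report packages whose installed version-release
# differs from the announced one.
#
# stdin: "name version-release" pairs, e.g. as produced by
#   rpm -q --queryformat '%{NAME} %{VERSION}-%{RELEASE}\n' \
#       xen-hypervisor microcode_ctl linux-firmware
# $1:    file listing the expected "name version-release" pairs
version_mismatches() {
    expected_file=$1
    while read -r name installed; do
        # Look up the expected version for this package, if one is listed.
        expected=$(awk -v n="$name" '$1 == n { print $2 }' "$expected_file")
        if [ -n "$expected" ] && [ "$expected" != "$installed" ]; then
            echo "$name: installed $installed, expected $expected"
        fi
    done
}
```

No output means every listed package matches. Packages that are missing from the expected file are silently skipped, which keeps the helper usable with wildcard queries.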
What to test
Normal use and anything else you want to test. The closer to your actual use of XCP-ng, the better.
Test window before official release of the updates
48h
-
@stormi I'm running the update on all 8.2.1 hosts. No problems so far.
-
No problem here either on my home lab
-
The update was published earlier today: https://xcp-ng.org/blog/2023/02/20/february-2023-security-update/
-
I noticed that there are updates to the Windows templates. Clicking the "EYE" icon in XOA, the description for "guest-templates-json-data-windows" seemed a tad "buggy". Is that due to the git revision description, and were there no actual changes to the Windows templates?
Changelog
Patch: guest-templates-json-data-windows
Date: January 6, 2023 at 6:00 AM
Author: Gael Duperrey <gduperrey@vates.fr> - 1.9.6-1.2
Description: - Add templates for rhel 9, CentOS Stream 8 and 9, Almalinux 9, Rockylinux 9, Oracle linux 9

guest-templates-json | Creates the default guest templates | 1.9.6 | 1.2.xcpng8.2 | 29.21 KiB
guest-templates-json-data-linux | Contains the default Linux guest templates | 1.9.6 | 1.2.xcpng8.2 | 18.68 KiB
guest-templates-json-data-other | Contains the default other guest templates | 1.9.6 | 1.2.xcpng8.2 | 11.86 KiB
guest-templates-json-data-windows | Contains the default Windows guest templates | 1.9.6 | 1.2.xcpng8.2 | 14.38 KiB
-
@rjt
These rpms come from the same source rpm and, therefore, from the same SPEC file. So when we build it for changes, the Windows one is built too, even if there is no change on the Windows side.
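To see this on a host, you can ask rpm which source RPM each binary package was built from. The sketch below is a hypothetical helper, not an official tool: it only counts how many of the queried packages share each source RPM.

```shell
#!/bin/sh
# Hypothetical helper: count how many binary packages share each source RPM.
#
# stdin: "name source-rpm" pairs, e.g. as produced by
#   rpm -q --queryformat '%{NAME} %{SOURCERPM}\n' \
#       guest-templates-json guest-templates-json-data-linux \
#       guest-templates-json-data-other guest-templates-json-data-windows
packages_per_source() {
    awk '{ count[$2]++ } END { for (src in count) print src, count[src] }' \
        | sort
}
```

If the four template packages are indeed built from a single SPEC file, the output should be one line ending in 4.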
In this revision, we only added new templates for RHEL 9, AlmaLinux 9, Rocky Linux 9, CentOS Stream 8 and 9, and Oracle Linux 9.
There weren't any changes to the Windows templates.
-
New Security Update Candidates (Xen)
Xen is being updated to mitigate some vulnerabilities:
-
XSA-427: "Guests running in shadow mode and being subject to migration or snapshotting may be able to cause Denial of Service and other problems, including escalation of privilege". This vulnerability concerns old platforms (Nehalem/Bulldozer families and older) which do not have Hardware Assisted Paging facilities (EPT/NPT), or modern platforms where this extension is disabled by the firmware or the system software. This also concerns PV guests, which are not officially supported anymore in XCP-ng.
-
XSA-428: "Entities controlling HVM guests can run the host out of resources or stall execution of a physical CPU for effectively unbounded periods of time, resulting in a Denial of Service (DoS) affecting the entire host. Crashes, information leaks, or elevation of privilege cannot be ruled out".
On the platforms managed by XCP-ng software, with regard to this vulnerability, we would rather talk of a "reduction in defence in depth", as the only entity controlling HVM guests is trusted software (QEMU) running in a trusted domain (dom0).
-
XSA-429: The patch completes the original Spectre/Meltdown mitigation work (XSA-254). A malicious PV guest might be able to infer the contents of arbitrary host memory, including memory assigned to other guests. Only AMD and Hygon CPUs which offer SMEP/SMAP facilities are affected. Although PV guests are not officially supported in XCP-ng, we also included a fix for this vulnerability.
Components are also updated to add bugfixes and enhancements:
- Xen
- Update to Xen 4.13.5
- Initial Sapphire Rapids support
- Fix memory corruption issues in the OCaml bindings
- On xenstored live update, validate the config file before launching into the new xenstored
Test on XCP-ng 8.2
From an up to date host:
yum clean metadata --enablerepo=xcp-ng-testing
yum update "xen-*" --enablerepo=xcp-ng-testing
reboot
Versions:
- xen-*: 4.13.5-9.30.3.xcpng8.2
What to test
Normal use and anything else you want to test. The closer to your actual use of XCP-ng, the better.
Test window before official release of the updates
~2 days.
-