-
So after a look at the logs, we are not sure where the issue comes from. It could be caused by a faulty RAM DIMM, because the logs contain error messages related to RAM errors (although it's ECC RAM and it claims to have corrected them), but we can't confirm it.
-
I have withdrawn the update to xenopsd. It does fix live migration from older releases when the VM has no platform:device_id defined (see upstream bug), but it causes a transient live migration error during the update when the hosts have different patch levels. If you already installed the update and rebooted your hosts, no problem: you're already past the issue. If you installed it but haven't moved your VMs around to reboot your hosts, make sure all your hosts have the same patch level and restart their toolstacks before migrating VMs. If you haven't installed the update but want to install it, it is still available in the xcp-ng-updates_testing repository. Thanks to lavamind for helping me debug this over IRC.
-
@stormi could you provide some more details about the "transient live migration error" with the xenopsd update?
I have been experiencing issues live migrating larger VMs (20-40 GB RAM), especially when the memory is in active use (e.g. a MariaDB database). As far as I can tell, my issue is related to the VM's dynamic memory being reduced below the actual memory use of the VM, at which point random processes crash, but I'm not sure why this is happening since both the source and target XCP-ng hosts are under-allocated on memory. We upgraded all hosts in the pool before xenopsd was withdrawn and are currently scheduling downtime for each VM so we can migrate them while they are shut down, which seems to bypass the issue.
-
@Digitalllama The issue you had is probably not related to that update. If that was caused by the update, then you'd have seen the VM reboot at the end of the migration process and it would have been duplicated on both hosts.
A dynamic memory minimum value set too low can cause the kind of process crash you experienced: the available memory for the VM being low, you may reach a point where the VM's kernel needs to fire the Out Of Memory Killer which will select a running process and kill it. If you want to be sure about that, you can try to set the same value for minimum and maximum dynamic memory and see if the migration issue still occurs.
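To try that from dom0, the dynamic range can be pinned to a single value. This is only a minimal sketch, assuming the xe CLI's vm-memory-dynamic-range-set command; the function name, the DRY_RUN switch and the placeholder UUID are made up for illustration, and the value has to fit within the VM's static memory limits.

```shell
# Hypothetical helper: set the dynamic minimum and maximum memory of a VM
# to the same value (in bytes). With DRY_RUN=1 the xe command is only
# printed, so it can be reviewed before anything is changed.
pin_dynamic_memory() {
    vm_uuid=$1
    bytes=$2
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "xe vm-memory-dynamic-range-set uuid=$vm_uuid min=$bytes max=$bytes"
    else
        xe vm-memory-dynamic-range-set uuid="$vm_uuid" min="$bytes" max="$bytes"
    fi
}
```

For example, with DRY_RUN=1 set, pin_dynamic_memory <vm-uuid> 8589934592 prints the command that would pin the VM to 8 GiB. If the crashes stop after that, the low dynamic minimum was the likely cause.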
-
To people not having updated their hosts yet with the latest update: wait a few more days! There's a kernel security update on its way, so you'll probably want to reboot only then.
Note that the security update will be mostly useful for people who put their hosts on a network that is reachable from a potential attacker.
-
The new kernel update candidate is available. As usual, I need some feedback before I can push it to everyone.
Citrix advisory: https://support.citrix.com/article/CTX256725
- XCP-ng 7.5: install it with
yum update kernel --enablerepo='xcp-ng-updates_testing'
- XCP-ng 7.6: install it with
yum update kernel --enablerepo='xcp-ng-updates_testing'
- XCP-ng 8.0 beta/RC1: simply
yum update
It is a security update. A remote attacker could manage to crash your host or raise its memory usage significantly with specially crafted network requests. Hosts isolated from public networks are safe, unless an attacker has managed to get into your private network.
Reboot required (we do not support live patching at the moment, due to a closed source component in XenServer / Citrix Hypervisor).
Edit: update pushed: https://xcp-ng.org/blog/2019/07/12/xcp-ng-security-bulletin-kernel-update-sack-vulnerability/
-
Anyone available to test the security update on 7.5 and/or 7.6? It is a security update, so quite urgent.
-
Installed it a few minutes ago. Will report back.
-
Updated 3 hosts and so far no problems. Transferred some machines etc., no bad effects.
- 2 months later
-
Hello everyone. I'm back from holidays with update candidates that need testing!
XCP-ng 7.6
xcp-ng-xapi-plugins
yum update xcp-ng-xapi-plugins --enablerepo=xcp-ng-updates_testing
This is the most important update. It fixes host memory consumption issues that could go as far as crashing several hosts at the same time (especially if EPEL repositories were active, which shouldn't be the case but often was prior to XCP-ng 8.0 where they are already present but disabled by default). Already fixed in XCP-ng 8.0.
Post-install:
xe-toolstack-restart
microcode_ctl
yum update microcode_ctl --enablerepo=xcp-ng-updates_testing
Microcode update for the SandyBridge family of CPUs regarding the MDS attacks.
Post-install: reboot if you want it to be taken into account.
xcp-ng-pv-tools
yum update xcp-ng-pv-tools --enablerepo=xcp-ng-updates_testing
Linux guest tools: support for SLES 15 SP1, updated README, support for recent CoreOS.
Post-install: nothing to do.
xen
yum update xen-dom0-libs xen-dom0-tools xen-hypervisor xen-libs xen-tools --enablerepo=xcp-ng-updates_testing
Avoids possible memory corruption when forcibly shutting down a VM with an AMD MxGPU attached, or when the guest crashes.
Post-install: reboot to apply the changes.
XCP-ng 8.0
xen + guest templates
yum update xen-dom0-libs xen-dom0-tools xen-hypervisor xen-libs xen-tools guest-templates-json guest-templates-json-data-windows guest-templates-json-data-xenapp guest-templates-json-data-linux guest-templates-json-data-other --enablerepo=xcp-ng-testing
Avoid doing that on several hosts of the same pool at the same time, because the guest-templates-json* updates will need to update the XAPI database at the same time. The /usr/bin/create-guest-templates tool that is called post-update is not designed to run concurrently (thanks to Silmaril on IRC for finding out at the cost of a broken XAPI database).
Changes:
- same fix as in XCP-ng 7.6 regarding VMs with AMD MxGPU attached
- fix a host crash that can occur when you force-shutdown a Windows VM that is in an unclean state
- Windows VMs could hang for more than a minute after live migration
- Windows VMs with the viridian_reference_tsc flag enabled could crash during live migration. This fix opens the door to possible performance improvements for your Windows VMs, because following that fix Citrix now advises setting the viridian_reference_tsc and viridian_stimer flags to true for better performance.
- Updated Windows VM templates with new default settings that set viridian_* to true.
Post-install:
- reboot the host to apply the xen changes
- consider modifying your existing Windows VM settings for possible better performance. See "After installing this hotfix" in https://support.citrix.com/article/CTX258320
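As a rough illustration of that second step, the helper below only prints the commands that would set the two flags on one existing Windows VM. It is a sketch under assumptions: that the flags are keys of the VM's platform map set through xe vm-param-set, and the function name is hypothetical. The Citrix article linked above remains the reference for the actual procedure.

```shell
# Hypothetical helper: print (do not run) the xe commands that would set
# viridian_reference_tsc and viridian_stimer to true on one VM.
# Assumption: both flags live in the VM's "platform" map.
viridian_commands() {
    vm_uuid=$1
    for flag in viridian_reference_tsc viridian_stimer; do
        echo "xe vm-param-set uuid=$vm_uuid platform:$flag=true"
    done
}
```

Calling viridian_commands <vm-uuid> lists one command per flag, which can then be checked against the article before being run on a test VM first.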
microcode_ctl
yum update microcode_ctl --enablerepo=xcp-ng-testing
Microcode update for the SandyBridge family of CPUs regarding the MDS attacks. XCP-ng 8.0 already contained updated microcodes from Intel when released, before Citrix released a hotfix, but their update contains one additional file so we synced with their package.
Post-install: reboot if you want it to be taken into account.
What we need
As usual, Vates tests the updates internally, but we also rely on the community to widen the test cases and hardware tested, so we need you to install the updates and give us feedback, either positive or negative, before we can consider pushing those updates to everyone!
-
@stormi said in Updates announcements and testing:
XCP-ng 7.6
[...]
microcode_ctl
yum update microcode_ctl --enablerepo=xcp-ng-updates_testing
Microcode update for the SandyBridge family of CPUs regarding the MDS attacks.
Post-install: reboot if you want it to be taken into account.
xcp-ng-pv-tools
yum update microcode_ctl --enablerepo=xcp-ng-updates_testing
I guess that the above command for xcp-ng-pv-tools should not be the same as the one for microcode_ctl.
XCP-ng 8.0
[...]
microcode_ctl
yum update microcode_ctl --enablerepo=xcp-ng-updates_testing
This command returns error:
# yum update microcode_ctl --enablerepo=xcp-ng-updates_testing
Loaded plugins: fastestmirror
Error getting repository data for xcp-ng-updates_testing, repository not found
I guess it was meant to be:
# yum update microcode_ctl --enablerepo=xcp-ng-testing
as this worked for me.
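Since the testing repository name differs between releases in this thread (xcp-ng-updates_testing for 7.5 and 7.6, xcp-ng-testing for 8.0), a small lookup can avoid mixing them up. This is just a sketch; the function name is made up and only the releases mentioned here are covered.

```shell
# Hypothetical helper: map an XCP-ng release to the testing repository
# name used for it in this thread. Unknown releases are an error.
testing_repo() {
    case "$1" in
        7.5|7.6) echo "xcp-ng-updates_testing" ;;
        8.0)     echo "xcp-ng-testing" ;;
        *)       echo "unknown XCP-ng release: $1" >&2; return 1 ;;
    esac
}
```

It could then be used as yum update microcode_ctl --enablerepo="$(testing_repo 8.0)".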
HTH
-
I installed all these updates for 8.0 and rebooted the host.
The reboot was extraordinarily long. The time from the SSH session closing to ping replies coming back was about 8 minutes. I don't know what it was doing during all this time, as I rebooted remotely.
So I rebooted once more to see whether the boot time would be that long again.
This time it was only 1 minute 46 seconds.
The VMs seem to run OK so far, but it's just a test host with two VMs doing almost nothing, so I don't know for sure :-).
-
@MajorTom thanks, I've fixed the post
-
@stormi Installed the updates in our test pool. So far everything is working: VM migration, etc. I also set the viridian flags as advised (the XOA VM has them set as well, is this OK?)
Regards
Christian
-
I think Viridian will only have an effect on Windows VMs.
-
Thanks to those who tested. Still interested in feedback, including on XCP-ng 7.6.
-
I'll need at least one tester for the latest update candidates on XCP-ng 7.6, and one for 8.0.
-
OK. Installed it on our last 7.6 server. Reboot was OK. VMs run fine. As this is a test host I cannot test more. It runs on AMD so the microcode_ctl should do nothing on our server.
-
@cnaumer Thanks, this is good enough for me at this stage of the testing, so I can push the 7.6 updates now thanks to you!
-
@cnaumer said in Updates announcements and testing:
@stormi Installed the updates in our test pool. So far everything is working: VM migration, etc. I also set the viridian flags as advised (the XOA VM has them set as well, is this OK?)
Regards
Christian
This was an 8.0 pool here. Just to clarify this.