Notes on upgrading from XCP-ng 7.4 to XCP-ng 7.5

revtel

We started with the pool master and used the ISO install method, because the yum upgrade method had failed on a standalone test system while the ISO method succeeded there.

The upgrade process went smoothly, with no downtime for any VMs. Our guests are a mix of Windows Server 2012 and Linux variants.

After the final machine was upgraded (and the latest patches applied via yum update, followed by a host restart), we started running into problems. Hopefully these notes will help someone else avoid them:

We noted that the new microcode triggered a "new cpu features enabled" notice in XCP-ng Center. We thought that was odd, since the side-channel mitigations cripple CPUs rather than enhance them. In retrospect, I think this changed the CPU "mask".
After the upgrade and patching, migration of some but not all Windows VMs ran into problems: the migration would hang and never complete. It took a shutdown from the console and a toolstack restart to regain hypervisor-level control of the hung VM. We managed to move the hung VMs while they were shut down.
Moving Linux VMs after the upgrade caused some of them to hang in the same manner. One of them even crashed and would not restart. After two toolstack restarts we managed to restart this critical VM.
I noted a post from Tobias over on the Citrix XenServer forum advising that, since the microcode for the new Foreshadow vulnerability changes the CPU mask, a shutdown and restart of each VM is necessary to regain VM agility. I'm hopeful this will solve the migration issues (and the kernel updates on the Linux VMs).

Bottom line: in our environment, we believe each VM has to be shut down and restarted before it can again be moved reliably between hosts while running. We can't really test this fully until we have a maintenance window scheduled.
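
For anyone planning the same maintenance window, here is a rough sketch of the shutdown-and-restart pass. It only builds the xe command lines for a clean shutdown followed by a start of each guest and prints them for review. It does not run anything. The helper names (parse_minimal_uuids, power_cycle_commands) and the sample UUIDs are made up for illustration, and it assumes that `xe vm-list ... --minimal` prints a comma-separated list of UUIDs. We have not run this against our pool.

```python
import shlex

# Command that lists running guests (excluding dom0) as a comma-separated UUID list.
LIST_RUNNING = [
    "xe", "vm-list",
    "power-state=running",
    "is-control-domain=false",
    "params=uuid",
    "--minimal",
]


def parse_minimal_uuids(output):
    """Split the comma-separated output of an `xe ... --minimal` call into UUIDs."""
    return [u.strip() for u in output.strip().split(",") if u.strip()]


def power_cycle_commands(uuids):
    """Return, per VM, a shutdown command followed by a start command.

    A full shutdown and start is used (as described in the post) so the VM
    is started fresh on the patched host.
    """
    commands = []
    for uuid in uuids:
        commands.append(["xe", "vm-shutdown", "uuid=" + uuid])
        commands.append(["xe", "vm-start", "uuid=" + uuid])
    return commands


if __name__ == "__main__":
    # Hypothetical sample standing in for the real output of LIST_RUNNING.
    sample_output = "hypothetical-uuid-1,hypothetical-uuid-2\n"
    print("# list running guests with:", shlex.join(LIST_RUNNING))
    for cmd in power_cycle_commands(parse_minimal_uuids(sample_output)):
        print(shlex.join(cmd))
```

Printing the commands rather than executing them lets you review the order and skip VMs that need special handling before anything is touched.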

olivierlambert

Hi,

Yes, it's likely you would have run into the same issue with XenServer. Those latest security patches have strong side effects, and I think the priority was to fix the issue rather than to keep the VMs "agile". We discovered that later too.

Anyway, due to those patches, it's likely you'll have to reboot your VMs to get their own kernel updates working.