@cg said in Question on CPU masking with qemu and xen:
In the early days (~XenServer 6) it had to be done manually
Yes, and I rewrote it entirely in XenServer 7 because doing it manually was absurd.
tl;dr, for your case:
1. Add the Gen12's to the pool
2. Migrate remaining VMs off the Gen9's
2a. Any VMs which can't migrate for feature reasons, reboot first then migrate
3. Remove the Gen9's from the pool
4. Reboot all VMs
The longer answer:
When Xen boots, it calculates what it can offer to guests, feature-wise. This takes into account the CPU, firmware settings, errata, command line parameters, etc. This feature information is made available to the toolstack/xapi to work with. On a per-VM basis, Xen knows the features that the guest was given. Different VMs can have different configurations, even if they're running on the same host.
An individual VM's features are fixed during its uptime (including across migrations). The only point at which the features can safely change is when the VM reboots. All the migration safety checks are performed as "is the featureset this VM saw at boot compatible with the destination host it's trying to run on".
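As a rough illustration of that check, here is a minimal Python sketch. It is not how Xen or Xapi implement it: the `VM` class, `can_migrate`, and the feature labels are all made up for the example, and compatibility is simplified to a plain subset test, which the real logic is not.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VM:
    name: str
    # Featureset the VM saw at boot. Frozen: migration never changes it.
    boot_features: frozenset


def can_migrate(vm, host_features):
    """Simplified safety check: everything the VM saw at boot must exist
    on the destination host. The real check is more nuanced."""
    return vm.boot_features <= host_features


# Hypothetical feature labels, not real CPUID flags.
old_host = frozenset({"base", "feat_a"})
new_host = frozenset({"base", "feat_a", "feat_b"})

vm_old = VM("booted-with-fewer-features", old_host)
vm_new = VM("booted-with-more-features", new_host)

print(can_migrate(vm_old, new_host))  # True
print(can_migrate(vm_new, old_host))  # False
```

The second VM is stuck until it reboots with a featureset the old host can also provide.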
At the pool level, Xapi dynamically calculates the "pool level", i.e. the common subset[*] of features that will allow a VM to migrate to anywhere in the pool. Importantly, this is recalculated as pool members join and leave the pool. That includes a pool member rebooting: it leaves temporarily, then rejoins, and its feature information may have changed after the reboot, e.g. because a firmware or command line setting was changed.
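To make the recalculation concrete, here is a small Python sketch that models the pool level as a set intersection over the hosts currently in the pool. This is only an approximation (see the footnote: the real operation is not a strict subset), and `pool_level`, the host names, and the feature labels are invented for illustration.

```python
def pool_level(hosts):
    """Features common to every host currently in the pool.

    Simplified to a plain intersection; the real calculation treats
    some features specially.
    """
    if not hosts:
        return frozenset()
    return frozenset.intersection(*hosts.values())


# Hypothetical hosts and feature labels.
hosts = {
    "host-a": frozenset({"base", "feat_x", "feat_y"}),
    "host-b": frozenset({"base", "feat_x", "feat_y"}),
}
level_initial = pool_level(hosts)

# A host lacking feat_y joins: the pool level drops.
hosts["host-c"] = frozenset({"base", "feat_x"})
level_after_join = pool_level(hosts)

# host-c leaves again: the pool level rises.
del hosts["host-c"]
level_after_leave = pool_level(hosts)

# host-a reboots with a firmware setting that hides feat_x.
hosts["host-a"] = frozenset({"base", "feat_y"})
level_after_reboot = pool_level(hosts)

print(sorted(level_initial))       # ['base', 'feat_x', 'feat_y']
print(sorted(level_after_join))    # ['base', 'feat_x']
print(sorted(level_after_leave))   # ['base', 'feat_x', 'feat_y']
print(sorted(level_after_reboot))  # ['base', 'feat_y']
```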
When a VM boots, it is given the "pool level" by default, meaning that it should be able to migrate anywhere in the pool as the pool existed at the point of booting the VM. If you subsequently add a new host to the pool, the pool level may drop, in which case already-running VMs will be unable to migrate to this new host, but will still be able to migrate to other pool members.
As you remove members from the pool, the pool level may rise, e.g. if you remove the only host that was lacking a certain feature. The final reboot in your case is to allow the VMs to start using the newer hosts' feature baseline, now that it's not "levelled down" for compatibility with the Gen9's.
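Putting it together, the tl;dr steps can be walked through with a toy model in Python. Everything here is an assumption made for the example: the feature labels are invented (I'm not claiming anything about what either hardware generation actually supports), and both the pool level and the migration check are simplified to plain set operations.

```python
def pool_level(hosts):
    """Features common to every host (simplified to a set intersection)."""
    return frozenset.intersection(*hosts.values())


def can_migrate(vm_features, host_features):
    """Simplified check: everything the VM saw at boot exists on the target."""
    return vm_features <= host_features


# Hypothetical feature labels, not real CPUID flags.
GEN9 = frozenset({"base", "legacy_only"})
GEN12 = frozenset({"base", "new_only"})

hosts = {"gen9": GEN9}
vm = pool_level(hosts)                # VM boots while only the Gen9 is present

hosts["gen12"] = GEN12                # step 1: add the Gen12, pool level drops
blocked = not can_migrate(vm, GEN12)  # VM saw "legacy_only", so it can't move

vm = pool_level(hosts)                # step 2a: reboot picks up the new level
movable = can_migrate(vm, GEN12)      # step 2: migration now passes the check

del hosts["gen9"]                     # step 3: remove the Gen9, level rises
before_reboot = vm                    # running VM keeps its boot-time features
vm = pool_level(hosts)                # step 4: reboot to get the new baseline

print(blocked, movable)               # True True
print(sorted(before_reboot))          # ['base']
print(sorted(vm))                     # ['base', 'new_only']
```

Note that between steps 3 and 4 the VM is still running with the levelled-down featureset; only the reboot lets it see the higher baseline.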
~Andrew
[*] While subset is the intuitive way to think of this operation, it's not actually a subset in the mathematical sense. Some features behave differently to maintain safety for the VM.