Rolling pool update failure: not enough PCPUs even though all should fit (dom0 culprit?)
-
Hi,
I have a small test VM cluster that I'm trying to apply a rolling pool update to. There are three physical hosts, with 32, 32, and 12 CPUs, respectively. When I try to initiate the update, it insta-fails with the error:
"CANNOT_EVACUATE_HOST(HOST_NOT_ENOUGH_PCPUS,16,12)"
My understanding is that this means that the updater needs to move a VM requiring 16 vCPUs onto the machine with 12 pCPUs.
The mystery is that none of my VMs need nearly that many CPUs! I've dialed them all down to 2 vCPUs, and the error message is the same.
Looking at the xe vm-list output, I do see that two of the "Control domain on host: ..." VMs do want 16 vCPUs. Are those potentially the culprit here? What would be the recommended way to dial down their CPU allocations? I've seen some messages about using the host-cpu-tune command, and I could try playing around with xe, but I'm a little hesitant to fiddle around with these parts of the infrastructure without really knowing what I'm doing.
-
Of course, just after posting, I think I figured out what's happening.
It looks like the relevant parameter isn't the current number of allowed vCPUs set via the UI (VCPUs-number), but the maximum number of vCPUs (VCPUs-max). One of the VMs in my cluster had VCPUs-max = 16. After powering it off, I could reduce this number, and now the RPU appears to be proceeding.
-
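For anyone landing here with the same error, the check and fix described above can be sketched roughly as follows. This is only a sketch under assumptions, not a verified procedure: the function names vms_over_limit and lower_vcpus_max are made up for illustration, the parser assumes the usual "key ( RW): value" record layout that xe vm-list prints, and lowering VCPUs-at-startup together with VCPUs-max is an assumption made so that the startup count never exceeds the new maximum.

```shell
#!/bin/sh
# Hypothetical helper: read "xe vm-list" style records on stdin and print
# every VM whose VCPUs-max exceeds the pCPU count given as $1.
# Assumes records are separated by blank lines and look like:
#   name-label ( RW)    : myvm
#       VCPUs-max ( RW): 16
vms_over_limit() {
    awk -v limit="$1" '
        function report() {
            if (max + 0 > limit + 0) print name " VCPUs-max=" max
            name = ""; max = 0
        }
        /^[[:space:]]*$/ { report(); next }   # blank line ends a record
        {
            key = $1
            val = $0
            sub(/^[^:]*:[[:space:]]*/, "", val)  # strip "key ( RW): "
            if (key == "name-label") name = val
            else if (key == "VCPUs-max") max = val
        }
        END { report() }
    '
}

# Hypothetical helper: power off one VM and lower its vCPU ceiling.
# $1 = VM uuid, $2 = new ceiling. The VM is left halted afterwards.
lower_vcpus_max() {
    xe vm-shutdown uuid="$1" &&
    xe vm-param-set uuid="$1" VCPUs-at-startup="$2" &&
    xe vm-param-set uuid="$1" VCPUs-max="$2"
}

# Example usage on a pool whose smallest host has 12 pCPUs:
#   xe vm-list is-control-domain=false params=name-label,VCPUs-max | vms_over_limit 12
#   lower_vcpus_max <vm-uuid> 2
```

The control domains are filtered out in the usage example because, as it turned out above, they were not what blocked the evacuation.
-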
I have seen this problem before in my test lab. Unfortunately, I didn't document it enough to report here. For me, the solution was also to simply power off the culprit VM to prevent the attempted migration.
In my mind, since the move is only temporary, the RPU logic should use the current running state of the VMs to determine which resources are actually in use and which hosts can support them. Then again, I'm not in the know about all the factors that went into the decision to have it work the way it does. I'm sure there's a valid reason.
-
That's a security problem: due to Spectre/Meltdown, it's very dangerous to run a VM that could have more vCPUs than pCPUs on a host.