Rolling Pool Update - not possible to resume a failed RPU
-
Hi,
In theory, the pool should be at the exact same level before an RPU, so you are in an unplanned/abnormal state.
Future updates will be easier to spot, and then we could probably make the RPU check more resilient (adding @stormi to the loop).
-
I had a similar experience with this latest round. Due to a variety of issues, two of my hosts lost their gateway and fell back to the DHCP server, one VM wasn't on shared storage and couldn't be migrated, and then of course there was the Windows driver 9.4.1 issue: all of my hosts were reporting "this VM doesn't contain the feature" (to live migrate) even though they were updated.
What you need to do is the following:
- Migrate VMs off of your Pool Master
- Patch your Pool Master
- Reboot the Pool Master
- Migrate VMs back onto the Pool Master, emptying a slave
- Patch said slave
- Reboot said slave
- Repeat from the migration step for each remaining host.
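The sequence above could be sketched with the `xe` CLI roughly as follows. This is only an illustration, not an official procedure: the UUIDs and addresses are placeholders, `ssh` access as root is an assumption, and with `DRY_RUN=1` (the default here) every command is only printed instead of executed.

```shell
#!/bin/sh
# Sketch of the manual rolling update described above (placeholders,
# not an official procedure). With DRY_RUN=1 commands are only printed.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

update_host() {
    # Live-migrate all VMs off the host, then patch and reboot it.
    run xe host-evacuate uuid="$1"
    run ssh "root@$2" "yum update -y && reboot"
}

update_host "<master-uuid>" "<master-address>"  # the Pool Master first
update_host "<slave-uuid>" "<slave-address>"    # then each slave in turn
```

Run it once per slave (waiting for the previous host to come back up), or wrap the slave calls in a loop over the output of `xe host-list`.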
-
@DustinB Actually, you don't have to migrate VMs if you're fine with them shutting down.
Just update the out-of-date slaves with "yum update" and reboot them. VMs will be shut down and restarted (following their start settings in Advanced)...
-
@ecoutinho Performing manual updates is an option. The master probably checked the hosts for hotfix uniformity before the rolling pool upgrade started, but since it failed to complete, you now have a discrepancy in the patch level for those two hosts. I had that happen once because of a root disk space error, which was a pain to deal with. I cannot recall the specific fix, but I think I had to migrate the VMs to the updated hosts and do a whole new install after redoing the partition table (it was that dreaded extra "Dell" partition at the time that caused the issue).
-
@manilx If the VMs can be shut down, yes, otherwise migrate the VMs. Luckily, you can migrate from a host with a lower hotfix level to one that has a higher level, but I do not believe the reverse is possible.
-
@tjkreidl True. VMs that can't be shut down automatically you'll have to shut down manually.
I find that this takes a lot less time than migrating the VMs around. But if they need to stay running, then there's no other way ¯\_(ツ)_/¯
-
@olivierlambert Any backups that run during the RPU cause it to fail and not continue/restart. My example of this: hourly continuous replication breaks the RPU.
-
@Andrew Right, backups should be shut off during the RPU process.
-
I thought we already disabled that, but I will ask internally.
-
In the case of a manual RPU, maintenance mode is disabled after the update/toolstack restart, so you also need to disable the load balancer plugin.
-
I think the load balancer is already disabled in the RPU
-
@olivierlambert During an RPU, yes. I mean a manual update in the case of a failure.