XCP-ng

    Rolling Pool Update - not possible to resume a failed RPU

    • ecoutinho

      Rolling Pool Update succeeded on the master but then failed on the first slave due to yum caching errors, which left the RPU task finished in a failed state. I've corrected the yum issue and manually updated and rebooted that slave.

      Now, when I go to Pools > Patches, it shows there are no missing patches, so the RPU button is disabled. This is true for the master and one slave, but the remaining slaves are still missing their patches. It seems XO only checks the pool's missing patches on the master and ignores the other slaves.

      I guess I have two options:

      • wait for the release of a new XCP-ng patch, so that the RPU is enabled once again
      • manually update each host

      Shouldn't XO check for pool missing patches on all hosts, and allow an RPU to be performed if any host has missing patches?

      Orchestra, commit 6ecab
      Master, commit 6ecab
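
      In the meantime, one way to see what each remaining slave is still missing is to ask the hosts directly. A minimal sketch, assuming root SSH access to the pool members; the host names are placeholders:

          # Hypothetical host names; substitute your own slaves.
          for h in xcp-host2 xcp-host3 xcp-host4; do
              echo "== $h =="
              ssh root@"$h" yum check-update
          done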

      • olivierlambert (Vates 🪐 Co-Founder & CEO)

        Hi,

        In theory, the pool should be at the exact same patch level before an RPU, so you are in an unplanned/abnormal case.

        Future updates will be easier to spot, and then we could probably have a more resilient RPU check (looping in @stormi).

        • DustinB

          I had a similar experience with this latest round due to a variety of issues: two of my hosts lost their gateway and started using the DHCP server, one VM wasn't on shared storage and couldn't be migrated, and then of course the Windows driver 9.4.1 issue, where all of my hosts were reporting "this VM doesn't contain the feature" (to live migrate) even though they were updated.

          What you need to do is below (see the command sketch after the list):

          • Migrate VMs off of your Pool Master
          • Patch your Pool Master
          • Reboot the Pool Master
          • Migrate VMs back onto the pool master, emptying a Slave
          • Patch said slave server
          • Reboot said slave
          • Repeat from the Migration step for each remaining host.
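
          Roughly the same per-host sequence with the CLI; a minimal sketch using standard xe and yum commands, with the host UUID as a placeholder:

              # On the pool master (or any host with xe access): empty the host first
              xe host-disable uuid=<host-uuid>
              xe host-evacuate uuid=<host-uuid>   # live-migrates its VMs to other hosts

              # On the host itself: patch and reboot
              yum update -y
              reboot

              # Once it is back up, allow VMs on it again
              xe host-enable uuid=<host-uuid>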
          • manilx @DustinB

            @DustinB Actually, you don't have to migrate VMs if you're fine with them shutting down.
            Just update the missing slaves with "yum update" and reboot them. The VMs will be shut down and restarted (following their start settings under Advanced)...
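
            A minimal sketch of that in-place approach, assuming console or SSH access to each remaining slave:

                # On each not-yet-patched slave; its running VMs will be shut down by the reboot
                yum update -y
                reboot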

            • tjkreidl (Ambassador) @ecoutinho

              @ecoutinho Performing manual updates is an option. The master probably checked the hosts for hotfix uniformity before the rolling pool upgrade started, but since it failed to complete, you now have a discrepancy in the patch level for those two hosts. I had that happen once because of a root-space error, which was a pain to deal with; though I cannot recall the specific fix, I think I had to migrate the VMs to the updated hosts and do a whole new install after redoing the partition table (it was that dreaded extra "Dell" partition at the time that caused the issue).

              • tjkreidl (Ambassador) @manilx

                @manilx If the VMs can be shut down, yes; otherwise, migrate the VMs. Luckily, you can migrate from a host with a lower hotfix level to one with a higher level, but I do not believe the reverse is possible.
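
                For a single VM that has to keep running, a minimal live-migration sketch with xe, moving it towards an already-updated host (the name/UUID values are placeholders):

                    xe vm-migrate vm=<vm-name-or-uuid> host=<updated-host-name> live=true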

                • manilx @tjkreidl

                  @tjkreidl True. VMs that can't be shut down you'll have to shut down manually.
                  I find that this takes a lot less time than migrating the VMs around. But if they need to be running, then there's no other way ¯\_(ツ)_/¯

                  • Andrew (Top contributor) @olivierlambert

                    @olivierlambert Any backups that run cause the RPU to fail and not continue/restart. My example of this is hourly continuous replication breaking the RPU.

                    • tjkreidl (Ambassador) @Andrew

                      @Andrew Right, backups should be shut off during the RPU process.
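
                      One hedged way to do that from the command line, assuming xo-cli is registered against your XO and that your xo-server version exposes schedule.getAll / schedule.set (check the exact method names with xo-cli --list-commands):

                          # List backup schedules and note the ids of the ones that could fire during the RPU
                          xo-cli schedule.getAll

                          # Disable each such schedule for the duration of the RPU (the id is a placeholder),
                          # then re-enable it once the RPU has finished
                          xo-cli schedule.set id=<schedule-id> enabled=false
                          xo-cli schedule.set id=<schedule-id> enabled=true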

                      • olivierlambert (Vates 🪐 Co-Founder & CEO)

                        I thought we already disabled that but I will ask internally

                        • Tristis Oris (Top contributor)

                          In the case of a manual RPU, maintenance mode is disabled after an update/toolstack restart, so you also need to disable the load balancer plugin.
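
                          A minimal way to check and restore that state after a toolstack restart, with the host UUID as a placeholder:

                              # "enabled" should report false while the host is meant to stay in maintenance mode
                              xe host-param-get uuid=<host-uuid> param-name=enabled

                              # If it reports true, disable (and re-evacuate) the host before continuing
                              xe host-disable uuid=<host-uuid>
                              xe host-evacuate uuid=<host-uuid>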

                          • olivierlambert (Vates 🪐 Co-Founder & CEO)

                            I think the load balancer is already disabled in the RPU

                            • Tristis Oris (Top contributor) @olivierlambert

                              @olivierlambert During an RPU, yes. I mean a manual update in case of failure.
