XCP-ng

    Host failure after patches

    Management
    21 Posts 2 Posters 325 Views 3 Watching
    • M Offline
      McHenry
      last edited by

      I have attempted to install outstanding patches on a pool master, however the host will now no longer boot.

      At startup the xcp-ng logo shows, and when the progress bar gets to the end the error shown is:
      d8b621c7-ef23-4c87-9a4a-cac885272e99-image.png

      Being the pool master I am unable to access any other hosts.

      I believe I will need to elevate an existing host to be the new pool master until this is resolved. Is this correct?

      • M Offline
        McHenry @McHenry
        last edited by

        When I log into a member host I see:
        578af9b8-01c6-4e6f-969b-570979d83ede-image.png

        I know this host is a pool member but this message says otherwise:
        2f08a63e-f88b-4b3e-bf79-6a1bc0ca38ec-image.png

        Not sure what to do here...

        • F Online
          flakpyro @McHenry
          last edited by flakpyro

          @McHenry I think the command you need to run on a current slave is

          xe pool-emergency-transition-to-master
          followed by
          xe-toolstack-restart
          

          This will make that slave the new pool master. You should only do this, though, if the current pool master is dead for sure.

          This may also be useful:
          https://docs.xenserver.com/en-us/xenserver/8/dr/machine-failures.html

          • M Offline
            McHenry @flakpyro
            last edited by

            @flakpyro

            Thank you.

            So there is no temporary workaround to get the VMs up again whilst I try to fix the pool master?

            I guess I could:

            • Make the slave the new master
            • Rebuild the failed master
            • Elevate the rebuilt master to be the new pool master
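             Those steps can be sketched with the xe CLI (a rough sequence only; the master address and password are placeholders, and the pool-join step assumes the failed host has been reinstalled fresh):

```shell
# 1. On the surviving slave: promote it to pool master
xe pool-emergency-transition-to-master
xe-toolstack-restart

# 2. After rebuilding the failed host, join it back to the pool as a member
#    (run on the rebuilt host; address and credentials are placeholders)
xe pool-join master-address=<new-master-ip> \
    master-username=root master-password=<password>
```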
            • F Online
              flakpyro @McHenry
              last edited by

              @McHenry I think that is your best bet. You can also leave the new master as the master and rejoin the rebuilt host as a slave. Did the other hosts all update OK? It's kind of concerning that running updates resulted in an unbootable host.

              • M Offline
                McHenry @flakpyro
                last edited by

                @flakpyro

                I have updates for all hosts, but I understand I need to patch the pool master first, so that is what I was doing.

                Will I lose all my pool settings like VLANs etc?

                • F Online
                  flakpyro @McHenry
                  last edited by

                  @McHenry No, moving the pool master shouldn't result in any config loss.

                  • M Offline
                    McHenry @flakpyro
                    last edited by

                    @flakpyro

                    OK, pool is now up and accessible 🙂

                    Now I am restoring a VM that resided on the failed master. Fingers crossed.

                    Interestingly, the master failed to boot after installing patches. I had also installed the patches on the slave but not yet rebooted it. When I elevated the slave to master it said it would reboot in 10 seconds and I thought SHIT, what if this fails too.

                    I guess the learning here is to only install the patches when the host is ready to be restarted, not before.

                    Thanks for the quick response.

                    • M Offline
                      McHenry @McHenry
                      last edited by

                      I may have spoken too soon.

                      The host with my DR VM does not show as being part of the pool:
                      47e72c3b-f3ca-477c-98ec-12ffc3212bdc-image.png

                      Do I need to add it to the new pool master?

                      • M Offline
                        McHenry @McHenry
                        last edited by

                        With a new pool master I am still unable to access the old pool slave as it reports the pool master is unavailable:
                        b2daad97-a7e1-4701-9852-788599fae4ea-image.png

                        I expect this is because the pool slave is still looking for the old pool master and has not identified that a new pool master exists.

                        What needs to be done to the pool slave to have it appear as a member of the pool again?

                        • F Online
                          flakpyro @McHenry
                          last edited by

                          @McHenry did you run

                          xe pool-recover-slaves
                          

                          after selecting a new master?

                          • M Offline
                            McHenry @flakpyro
                            last edited by McHenry

                            @flakpyro

                            No, I didn't. Does this need to be run on the slave itself or on the new master?

                            When you say "after selecting a new master" do you mean after I did this on the new master?

                            xe pool-emergency-transition-to-master
                            

                            Edit:
                            Found this, which shows the command is run on the new pool master:
                            https://www.ervik.as/how-to-change-the-pool-master-in-a-xenserver-farm/
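
                            The recovery step on the new master can be sketched as follows (host names are from this thread; the second command is just one way to verify the result):

```shell
# On the new master (hst103): tell all former slaves who the master is now
xe pool-recover-slaves

# Check that every host is back in the pool and enabled
xe host-list params=name-label,enabled
```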

                            • F Online
                              flakpyro @McHenry
                              last edited by

                              @McHenry

                              Yes you'd run that command on the pool master.

                              I linked this earlier in the thread, it outlines the process you need to follow:
                              https://docs.xenserver.com/en-us/xenserver/8/dr/machine-failures.html

                              Let us know if this works!

                              • M Offline
                                McHenry @flakpyro
                                last edited by

                                @flakpyro

                                Apologies, I missed that. I have run the command and now see:
                                cda41870-c92a-4ff4-a0c5-3359db209d05-image.png

                                hst103 is the new pool master. hst100 is the old pool master that failed.

                                In XO I can only see hst103 under Hosts, however all three hosts are listed under the pool:
                                c7ec1829-fd07-486e-9280-723544d3a1b3-image.png

                                • F Online
                                  flakpyro @McHenry
                                  last edited by

                                  @McHenry Have you tried restarting the toolstack on the hosts since running pool-recover-slaves?

                                  I have only had to do this once before but remember it going fairly smoothly at the time. (As smooth as you can expect a host failure to be anyways)

                                  • M Offline
                                    McHenry @flakpyro
                                    last edited by McHenry

                                    @flakpyro

                                    I have restarted the toolstack but it's no different.

                                    When I view hst110 I see:
                                    30f512bf-b0e0-4aea-ac57-6912519c401d-image.png

                                    I was tempted to restart it, however it has patches pending installation after a restart, and as the master is not fully patched I thought it best not to restart it. I understand the master needs to be patched first.

                                    I just checked xsconsole on hst110 and it still shows the pool master as unavailable:
                                    8bdfff57-f781-4bfe-b970-685c6e7da261-image.png

                                    Do I need to change the pool master used by the slave?
                                    d7798915-9ee9-480c-889f-b700f48a7f97-image.png
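
                                    A slave that is still pointing at the dead master can also be re-pointed from the CLI; a minimal sketch, assuming SSH access to the stuck slave and with the master address as a placeholder:

```shell
# On the stuck slave: point it at the new pool master,
# then restart the toolstack so it reconnects
xe pool-emergency-reset-master master-address=<new-master-ip>
xe-toolstack-restart
```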

                                    • F Online
                                      flakpyro @McHenry
                                      last edited by

                                      @McHenry I am wondering if you are in a situation where you need to reboot. You have patches installed but have not rebooted, yet you have restarted the toolstacks on the slaves, which means some components have restarted and are running on their new versions. If you have support it may be best to reach out to Vates for guidance.

                                      • M Offline
                                        McHenry @flakpyro
                                        last edited by

                                        @flakpyro

                                        Can I restart the slave and install patches if the master has not been patched yet?

                                        • F Online
                                          flakpyro @McHenry
                                          last edited by

                                          @McHenry No, the pool master must always be patched and rebooted first. Do you have a pool metadata backup? Are your VMs on shared storage of some sort, in case you need to rebuild the pool?
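
                                          A pool metadata backup can be taken on the master with the xe CLI; a minimal sketch, with the file path as a placeholder:

```shell
# On the pool master: dump the pool database (metadata) to a file
xe pool-dump-database file-name=/root/pool-meta.db

# Later, on a freshly installed master, it can be restored with:
# xe pool-restore-database file-name=/root/pool-meta.db
```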

                                          • M Offline
                                            McHenry @flakpyro
                                            last edited by

                                            @flakpyro

                                            My setup is pretty basic.
                                            I have two hosts in the pool, one for running VMs on local storage and one for DR backups on local storage.
                                            I'd like to set up shared storage so I could run the VMs on multiple hosts and seamlessly move them between hosts without migrating storage too.

                                            To set up shared storage, would this be on an xcp-ng host or totally independent of xcp-ng?
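
                                            Shared storage usually lives outside the pool (e.g. an NFS server, which can be entirely independent of xcp-ng) and is then attached as a shared SR; a hedged sketch, with the server address and export path as placeholders:

```shell
# On the pool master: attach an NFS export as a shared SR
xe sr-create name-label="Shared NFS storage" shared=true type=nfs \
    content-type=user \
    device-config:server=<nfs-server-ip> \
    device-config:serverpath=/export/xcp-sr
```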
