XCP-ng

    Transfer Pool Master / only partial success

    Category: Xen Orchestra (Solved)
    Tags: xoa, master, designate
      xoahtw

      Hi,
      we are currently testing XCP-ng 8.3 as a complement to our VMware clusters, to reduce our VMware footprint.

      Environment:
      Our test environment consists of four servers distributed over four sites. VMs are stored on NetApp NFS volumes. Each of the four servers has two 10G or 25G interfaces bonded into a LAG connected to a pair of server switches. Each LAG carries the XCP-ng management VLAN, the storage VLAN and the production VLANs of the VMs. There are already some test VMs on the cluster. All four servers have the same patch level. Host 1 is the cluster master, Hosts 2 to 4 are cluster members.

      Problem:
      I am currently testing the scenario where the network connection to Host 1 (Master) is lost. XOA lost contact with the cluster, so I tried to promote Server 2 to be the new master by running "pool-emergency-transition-to-master" on the Server 2 CLI. The command seemed to execute successfully, and I was able to log in to its XO Lite GUI (but did not change anything there) and saw the test VMs running. Next I went to the XOA GUI and added Server 2 under Settings / Servers. The four cluster hosts reappeared and Server 2 has the Master tag; Hosts 1, 3 and 4 are shown as disabled.
      But the Master (Host 2) has a grey dot (Status: halted), and the available buttons in its Advanced menu are "Disable Maintenance Mode" and "Enable", as if the server itself were in maintenance mode and disabled. The VMs are still running on this server. When I click "Disable Maintenance Mode" or "Enable", a red popup appears: "server still booting".

      So I assume that the transfer of the pool master role is not fully completed yet. In some forum posts I also found commands like:

      • xe pool-designate-new-master host-uuid=<Slave UUID>
      • xe pool-recover-slaves

      But I did not find "official" guides on how to use them. For example, for "pool-designate-new-master": do I have to use it in my case, and do I run it on the new master server to onboard the other slaves, or on the slaves to point them to the new master?

      The other "surviving" host, Host 4, is currently shown as disabled, but there is still a test VM running on it. I assume I have to link Host 4 to the new Master (Host 2), and that something is still needed to complete the Master transfer to Host 2?
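
One way to read the commands mentioned above, sketched under the assumption that the descriptions in the xe CLI reference apply unchanged to this pool: "pool-designate-new-master" is an orderly handover issued while the old master is still reachable, whereas "pool-emergency-transition-to-master" is run on the member that should take over, followed by "pool-recover-slaves" on that same host to repoint the remaining members. The function name plan_master_change and its arguments are hypothetical, and the script only prints the commands rather than running them.

```shell
#!/bin/sh
# Hypothetical helper: prints (does not execute) the xe commands for a master change.
# Usage: plan_master_change <healthy|emergency> [new-master-host-uuid]
plan_master_change() {
    mode="${1:-}"
    new_master_uuid="${2:-}"
    case "$mode" in
        healthy)
            # Old master still reachable: orderly handover inside the working pool.
            echo "xe pool-designate-new-master host-uuid=${new_master_uuid}"
            ;;
        emergency)
            # Old master unreachable: both commands are meant for the member taking over.
            echo "xe pool-emergency-transition-to-master"
            echo "xe pool-recover-slaves"
            ;;
        *)
            echo "usage: plan_master_change <healthy|emergency> [host-uuid]" >&2
            return 1
            ;;
    esac
}
```

Calling plan_master_change emergency prints the two commands for the lost-master case described in this post. Printing instead of executing keeps the sketch safe to try on any machine.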

        DustinB @xoahtw

        @xoahtw The CLI input you're looking to use in a scenario like this would be

        pool-emergency-transition-to-master

        The command you used is for a healthy pool where you haven't lost the existing master.

        Additional details can be found here: https://docs.xcp-ng.org/appendix/cli_reference/

        Edited to reference the correct command and to post the link.

          xoahtw @DustinB

          @DustinB : Hi Dustin,

          pool-emergency-transition-to-master
          

          is what I used, as described in my first post, because the Master was no longer available. I wonder if something is still missing.

            DustinB @xoahtw

            @xoahtw If the host you ran this command on was online and running normally while the pool master was offline, it should take over and manage the remainder of the pool.

            Of course you'd have to configure your XO/XOA to use the new master so that you can see and manage your VMs.

            Also sorry, I misread your post and the last bulleted lines stuck with me.

              xoahtw @DustinB

              Yes, in XOA I changed the server entry to the access credentials of Host 2, and the cluster reappeared with Host 2 having the Master tag.
              But the Master (Host 2) has a grey dot (Status: halted), and the available buttons in its Advanced menu are "Disable Maintenance Mode" and "Enable", as if the server itself were in maintenance mode and disabled. When I click "Disable Maintenance Mode" or "Enable", a red popup appears: "HOST_STILL_BOOTING()". VMs are running on this host.

              Restarting the toolstack did not change the behaviour.

              edit: and in the Task log I see a flood of failed "API call: host.isPubKeyTooShort" tasks
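
Since the host reports that it is still booting, one option is to poll its state from the CLI instead of retrying "Enable" in the GUI. The following is a minimal sketch, assuming the host's "enabled" field read via xe host-param-get reflects the state XOA displays; the function name wait_for_host_enabled, the default attempt count and the pause length are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: polls the "enabled" field of a host until it reads "true".
# Usage: wait_for_host_enabled <host-uuid> [max-attempts] [sleep-seconds]
wait_for_host_enabled() {
    host_uuid="${1:-}"
    max_attempts="${2:-30}"
    pause="${3:-10}"
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        # host-param-get prints the bare value of a single host field.
        state="$(xe host-param-get uuid="$host_uuid" param-name=enabled 2>/dev/null)"
        if [ "$state" = "true" ]; then
            echo "host $host_uuid is enabled (attempt $attempt)"
            return 0
        fi
        sleep "$pause"
        attempt=$((attempt + 1))
    done
    echo "host $host_uuid still not enabled after $max_attempts attempts" >&2
    return 1
}
```

The host UUID can be looked up with xe host-list. The function only reads state; it does not try to enable the host itself.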

                xoahtw @xoahtw

                Positive update: today I wanted to continue with this problem, but when I logged into XOA, the Master status was green / enabled. 👍

                Since no one except me has credentials to the test cluster, I am sure that no one else did anything. So the master host seems to have recovered by itself from grey to green within the last 8 hours. Yesterday there were at least 1-2 hours between running "pool-emergency-transition-to-master" and the host still showing a grey dot / halted.

                Running xe pool-emergency-reset-master master-address=... then added the remaining hosts to the new master.
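
The last step can be sketched as follows, assuming the command is run on each remaining member rather than on the new master. The address is elided in the post, so the function takes it as an argument; the name reset_member_master and the example address 192.0.2.10 are hypothetical, and the function only prints the command.

```shell
#!/bin/sh
# Hypothetical helper: prints the command that repoints one member at the new master.
# Usage: reset_member_master <new-master-address>
reset_member_master() {
    master_address="${1:-}"
    if [ -z "$master_address" ]; then
        echo "usage: reset_member_master <new-master-address>" >&2
        return 1
    fi
    # Intended for a member host; the CLI reference advises against running it on a master.
    echo "xe pool-emergency-reset-master master-address=${master_address}"
}
```

For example, reset_member_master 192.0.2.10 prints the command to run on a member such as Host 4, with the new master's management address substituted in.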
