XCP-ng

    Unable to enable HA on a XCP-ng 8.2.1 Compute Pool

      Denson @Denson

      @Denson I also have a shared NFS SR already configured on the pool.
      heartbeat_SR.png

        olivierlambert Vates 🪐 Co-Founder CEO

        Do you have a halted or removed host in your pool?

          Denson @olivierlambert

          @olivierlambert I added a compute host to the pool. It was added successfully, but HA won't come up.

            olivierlambert Vates 🪐 Co-Founder CEO

            Check the pool/host view to double-check that every host is visible and correctly connected. Do the same for all shared SRs in that pool.

              Denson @olivierlambert

              @olivierlambert The pool is showing all hosts as connected, and the same goes for all the shared SRs. There are even VMs running on the new compute host. The only thing not working is enabling HA.
              One of the SRs is showing that HA is configured. Should that be the case when HA is not enabled yet?
              SR.png

                olivierlambert Vates 🪐 Co-Founder CEO

                This only shows which SR will be used with HA, nothing more.

                The issue seems to come from the host "compute-04" when it's trying to enable HA. I would check this host's logs in more detail.

                  Denson @olivierlambert

                  @olivierlambert The notable error on that host is that it can't find the UUID of the pool master, for some reason.

                  Mar 13 14:12:00 compute-04 xapi: [error||16119 HTTPS 172.21.17.96->:::80|host.ha_join_liveset R:c44ae729771f|xapi_ha] Failed to find the UUID address of host with address 172.21.18.96
                  Mar 13 14:12:00 compute-04 xapi: [error||16119 :::80||backtrace] host.ha_join_liveset R:c44ae729771f failed with exception Not_found
                  Mar 13 14:12:00 compute-04 xapi: [error||16119 :::80||backtrace] 1/12 xapi Raised at file hashtbl.ml, line 194
                  

                  172.21.17.96 is the management IP of the pool master (compute-01), while 172.21.18.96 is the IP of the same host on its storage interface.
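
To make the address mismatch described above easier to spot in a longer log, here is a minimal Python sketch. It pulls the address out of the `xapi_ha` error line quoted above and looks it up in a small inventory. The function names, the `HOSTS` layout, and the regular expression are assumptions made for illustration and are not part of xapi or any XCP-ng tooling; only the log lines and the two IP addresses come from this post.

```python
import re

# Matches the xapi_ha error quoted above and captures the trailing IPv4 address.
HA_LOOKUP_ERROR = re.compile(
    r"Failed to find the UUID address of host with address (\d{1,3}(?:\.\d{1,3}){3})"
)


def failed_lookup_addresses(log_text):
    """Return the addresses xapi could not map to a host, in order of appearance."""
    return HA_LOOKUP_ERROR.findall(log_text)


def classify_address(address, hosts):
    """Return (host_name, interface_role) for an address, or None if unknown.

    `hosts` is a hypothetical inventory shaped as {host_name: {role: ip}}.
    """
    for name, interfaces in hosts.items():
        for role, ip in interfaces.items():
            if ip == address:
                return name, role
    return None


# Addresses taken from the post above; the dict layout is an assumption.
HOSTS = {"compute-01": {"management": "172.21.17.96", "storage": "172.21.18.96"}}

LOG = """Mar 13 14:12:00 compute-04 xapi: [error||16119 HTTPS 172.21.17.96->:::80|host.ha_join_liveset R:c44ae729771f|xapi_ha] Failed to find the UUID address of host with address 172.21.18.96
Mar 13 14:12:00 compute-04 xapi: [error||16119 :::80||backtrace] host.ha_join_liveset R:c44ae729771f failed with exception Not_found"""

for address in failed_lookup_addresses(LOG):
    print(address, classify_address(address, HOSTS))
```

With the data from this thread, the unresolved address maps to the storage interface of compute-01 rather than to its management interface.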

                    olivierlambert Vates 🪐 Co-Founder CEO

                    On this problematic host, can you run a simple command like "xe vm-list"? Does it work?

                      tjkreidl Ambassador @Denson

                      @Denson Are all hosts properly time synchronized to NTP? Make sure they are all within reasonable limits of each other.
                      Might be a network thing -- are all interfaces configured alike on all hosts? Can the hosts all ping each other?

                        Denson @olivierlambert

                        @olivierlambert Yes. "xe vm-list" shows all the running VMs on the pool.

                          Denson @tjkreidl

                          @tjkreidl NTP is synchronized across all the hosts. All the hosts can reach each other and are already in a pool. There are even VMs running properly on that compute host. The only thing not working is enabling HA.

                            tjkreidl Ambassador @Denson

                            @Denson When you enable HA, note that the host has to be rebooted for HA to take effect. Was that the case?

                              Denson @tjkreidl

                              @tjkreidl which host? Enabling HA on other pools doesn't trigger any host reboot.
                              Enabling HA on this pool results in "Internal_Error(Not_found)"
                              I am aware that a host reboots when you remove it from a compute pool but that is something separate.

                                tjkreidl Ambassador @Denson

                                @Denson After enabling HA, the host has to be manually rebooted, which I think you're already aware of.
                                OK, well, that pretty much leaves a network issue. Hmm. Was the statefile created OK on the shared SR within the pool?
                                The "not found" error message doesn't give you much to go on, unfortunately.
