XCP-ng

    The HA doesn't work

    16 Posts 3 Posters 323 Views 2 Watching
    • Danp (Pro Support Team)

      Have you confirmed that HA is enabled on the pool's Advanced tab? Also, did one of the other hosts automatically take the role of pool master?

      • sixela @Danp

        @Danp Yes, HA is enabled. The master has not changed.

        • Danp (Pro Support Team)

          Can you describe this dom0 crash in more detail? You should check the logs to find out why HA didn't kick in and promote a new pool master.

          • sixela @Danp

            @Danp Hello,

            I think it crashed due to a hardware problem.

            HA started, but I got the following error, so I had to start the server by hand afterwards. By then, the dom0 that had crashed was up again.

            Feb 27 00:03:43DOM0 xapi: [error||14746299 ||backtrace] Raised Storage_error ([S(Backend_error);[S(SR_BACKEND_FAILURE_46);[S();S(The VDI is not available [opterr=['HOST_OFFLINE', 'OpaqueRef:e67d5aed-ae13-497e-ac16-29882c317ef3']]]);S()]])
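The log line above packs three useful facts into xapi's nested S(...) notation: the backend error code (SR_BACKEND_FAILURE_46), the reason passed in opterr (HOST_OFFLINE) and an OpaqueRef, which presumably identifies the object (likely the offline host) the failure refers to. A minimal Python sketch for pulling those out, assuming the line format shown in this thread; the regexes are written for this one line, not from a documented xapi log grammar, and the function names are hypothetical:

```python
import re
from collections import Counter

# Log line quoted in this thread. The format is assumed, not a documented grammar.
LINE = (
    "Feb 27 00:03:43DOM0 xapi: [error||14746299 ||backtrace] Raised Storage_error "
    "([S(Backend_error);[S(SR_BACKEND_FAILURE_46);[S();S(The VDI is not available "
    "[opterr=['HOST_OFFLINE', 'OpaqueRef:e67d5aed-ae13-497e-ac16-29882c317ef3']]]);S()]])"
)

CODE_RE = re.compile(r"SR_BACKEND_FAILURE_\d+")
OPTERR_RE = re.compile(r"opterr=\['([A-Z_]+)',\s*'(OpaqueRef:[0-9a-f-]+)'\]")


def parse_storage_error(line):
    """Return the backend error code, opterr reason and OpaqueRef, or None."""
    if "Storage_error" not in line:
        return None
    code = CODE_RE.search(line)
    opterr = OPTERR_RE.search(line)
    if code is None or opterr is None:
        return None
    return {"code": code.group(0), "reason": opterr.group(1), "ref": opterr.group(2)}


def count_failures(lines):
    """Tally (code, reason) pairs across many log lines, e.g. a whole log file."""
    counts = Counter()
    for line in lines:
        parsed = parse_storage_error(line)
        if parsed is not None:
            counts[(parsed["code"], parsed["reason"])] += 1
    return counts


if __name__ == "__main__":
    print(parse_storage_error(LINE))
    print(count_failures([LINE, "an unrelated log line"]))
```

To scan a real log, pass an open file to count_failures; on XCP-ng hosts the XAPI log is usually /var/log/xensource.log. Counting the HOST_OFFLINE entries that way would show whether all 13 failed VMs hit the same error.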

            • sixela @Danp

              @Danp Hello,

              In addition: 28 machines were impacted. 15 came back fine and 13 failed with the error message. HA did try, but something went wrong.

              • Danp (Pro Support Team) @sixela

                @sixela There's likely more information in the logs that would explain why a new pool master wasn't designated.

                • sixela @Danp

                  @Danp My problem isn't about a new master. HA restarts all my VMs on another host in the same cluster (they use the "restart if possible" setting), and it does try, but the VMs stay locked with the following error:

                  Feb 27 00:03:43DOM0 xapi: [error||14746299 ||backtrace] Raised Storage_error ([S(Backend_error);[S(SR_BACKEND_FAILURE_46);[S();S(The VDI is not available [opterr=['HOST_OFFLINE', 'OpaqueRef:e67d5aed-ae13-497e-ac16-29882c317ef3']]]);S()]])

                  Translated with DeepL.com (free version)

                  • Danp (Pro Support Team) @sixela

                    @sixela I understand. I'm not an expert on HA functionality, but I suspect that the new pool master would need to be designated as a first step. That is why I am suggesting that you investigate why this didn't automatically occur.

                    • tjkreidl (Ambassador)

                      How many hosts are in your pool? For HA to work out of the box, you need at least three hosts in a pool. Also, are all your hosts properly time-synchronized to the same time source?
                      They need to be very close in time to each other for HA to work properly. Note that when HA is first enabled on a given host, it has to be rebooted for HA to function.
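The time-synchronization point can be checked roughly by sampling each host's clock at about the same moment and comparing the results. A minimal Python sketch, assuming you have already collected one timestamp per host; the host names, sample times and the one-second tolerance are all hypothetical, since the thread only says the hosts must be "very close in time":

```python
from datetime import datetime

# Hypothetical tolerance: the thread does not give a number.
MAX_SKEW_SECONDS = 1.0


def clock_spread(host_times):
    """Seconds between the fastest and the slowest clock among the sampled hosts."""
    stamps = [t.timestamp() for t in host_times.values()]
    return max(stamps) - min(stamps)


def out_of_sync(host_times, tolerance=MAX_SKEW_SECONDS):
    """Hosts whose clock is more than `tolerance` seconds from the middle sample."""
    stamps = sorted(t.timestamp() for t in host_times.values())
    reference = stamps[len(stamps) // 2]  # middle value, so one bad clock can't skew it
    return sorted(
        host for host, t in host_times.items()
        if abs(t.timestamp() - reference) > tolerance
    )


if __name__ == "__main__":
    # Hypothetical samples taken from three hosts at (roughly) the same instant.
    samples = {
        "host-a": datetime(2024, 1, 1, 12, 0, 0),
        "host-b": datetime(2024, 1, 1, 12, 0, 0, 200000),
        "host-c": datetime(2024, 1, 1, 12, 0, 4),
    }
    print(clock_spread(samples))
    print(out_of_sync(samples))
```

Comparing against the middle sample rather than the first host means a single drifting host is flagged on its own instead of making every other host look wrong. The network delay while collecting the samples adds its own error, so treat small differences with caution.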

                      • sixela @tjkreidl

                        @tjkreidl Hello,

                        We have 17 hosts in the pool, and they are properly synchronized to a time server 😕

                        • tjkreidl (Ambassador) @sixela

                          @sixela Hmmm... that SR backend error makes me wonder whether the place where you designate HA info to be stored (the so-called "heartbeat SR") might be corrupted or otherwise damaged?

                          • sixela @tjkreidl

                            @tjkreidl Hello,

                            But didn't about half of them come back fine?

                            28 machines were impacted: 15 came back fine and 13 failed with the error message.
