XCP-ng

    The HA doesn't work

    • Danp (Pro Support Team)

      Can you describe this dom0 crash in more detail? You should check the logs to find out why HA didn't kick in and promote a new pool master.

      • sixela

        @Danp Hello,

        I think it crashed due to a hardware problem.

        HA started, but I got the error below, so I had to start the server by hand afterwards. The dom0 that had crashed was up again by then.

        Feb 27 00:03:43 DOM0 xapi: [error||14746299 ||backtrace] Raised Storage_error ([S(Backend_error);[S(SR_BACKEND_FAILURE_46);[S();S(The VDI is not available [opterr=['HOST_OFFLINE', 'OpaqueRef:e67d5aed-ae13-497e-ac16-29882c317ef3']]]);S()]])
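For anyone reading along: a minimal sketch of pulling the failure code and the `opterr` fields out of a xapi line like the one above, so that similar errors can be counted or grouped across a log file. The regular expressions are an assumption inferred from this single line, not a documented xapi log format, and the function name `parse_storage_error` is made up for illustration.

```python
import re

# Log line copied from the post above.
LINE = (
    "Feb 27 00:03:43DOM0 xapi: [error||14746299 ||backtrace] Raised Storage_error "
    "([S(Backend_error);[S(SR_BACKEND_FAILURE_46);[S();S(The VDI is not available "
    "[opterr=['HOST_OFFLINE', 'OpaqueRef:e67d5aed-ae13-497e-ac16-29882c317ef3']]]);S()]])"
)

# Patterns inferred from this one line (an assumption, not an official format).
FAILURE_RE = re.compile(r"SR_BACKEND_FAILURE_(\d+)")
OPTERR_RE = re.compile(r"opterr=\['([A-Z_]+)', '(OpaqueRef:[0-9a-f-]+)'\]")


def parse_storage_error(line):
    """Return (failure_code, reason, opaque_ref), or None if the line does not match."""
    failure = FAILURE_RE.search(line)
    opterr = OPTERR_RE.search(line)
    if not failure or not opterr:
        return None
    return int(failure.group(1)), opterr.group(1), opterr.group(2)


print(parse_storage_error(LINE))
```

Running the same function over every line of the log and grouping by the returned reason and reference would show whether all failed restarts point at the same offline host.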

        • sixela

          @Danp Hello,

          In addition:

          28 machines were impacted: 15 restarted OK and 13 failed with the error message. HA did try, but there was a problem.

          • Danp (Pro Support Team)

            @sixela There's likely more information in the logs that would explain why a new pool master wasn't designated.

            • sixela

              @Danp My problem isn't about a new master. All my VMs are set to restart under HA on another host in the same cluster, with the "restart if possible" setting. HA tries, but they stay locked with the following error:

              Feb 27 00:03:43 DOM0 xapi: [error||14746299 ||backtrace] Raised Storage_error ([S(Backend_error);[S(SR_BACKEND_FAILURE_46);[S();S(The VDI is not available [opterr=['HOST_OFFLINE', 'OpaqueRef:e67d5aed-ae13-497e-ac16-29882c317ef3']]]);S()]])

              • Danp (Pro Support Team)

                @sixela I understand. I'm not an expert on HA functionality, but I suspect that the new pool master would need to be designated as a first step. That is why I am suggesting that you investigate why this didn't automatically occur.

                • tjkreidl (Ambassador)

                  How many hosts are in your pool? For HA to work out of the box, you need at least three hosts in a pool. Also, are all your hosts properly synchronized to the same time source?
                  Their clocks need to be very close to each other for HA to work properly. Note that when HA is first enabled on a given host, it has to be rebooted for HA to function.
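As a rough illustration of the time-sync point, here is a small sketch that takes clock offsets already collected from each host (for example, the offset reported by each host's NTP client) and flags hosts that drift from the rest. The host names, offset values, and the 0.5 second tolerance are all made-up assumptions for the example; nothing here states what tolerance HA actually requires.

```python
from statistics import median


def max_clock_skew(offsets_s):
    """Largest difference between any two host clock offsets, in seconds."""
    values = list(offsets_s.values())
    return max(values) - min(values)


def hosts_outside(offsets_s, tolerance_s):
    """Hosts whose offset differs from the pool median by more than tolerance_s."""
    mid = median(offsets_s.values())
    return sorted(
        host for host, offset in offsets_s.items()
        if abs(offset - mid) > tolerance_s
    )


# Hypothetical offsets in seconds, relative to the shared time source.
offsets = {"host-a": 0.002, "host-b": -0.004, "host-c": 1.250}

print(max_clock_skew(offsets))
print(hosts_outside(offsets, tolerance_s=0.5))
```

Comparing against the median rather than a single reference host keeps one badly drifting host from making every other host look wrong.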

                  • sixela

                    @tjkreidl Hello,

                    We have 17 hosts in the pool and they are well synchronized with a time server 😕

                    • tjkreidl (Ambassador)

                      @sixela Hmmm ... that SR backend error makes me wonder if the place where you designate HA info to be stored (the so-called "heartbeat SR") might be corrupted or something similar?

                      • sixela

                        @tjkreidl Hello,

                        But about half of them came up OK, didn't they?

                        28 machines were impacted: 15 restarted OK and 13 failed with the error message.
