XCP-ng

    The HA doesn't work

    • Danp Pro Support Team @sixela

      Hi @sixela

      Is the VDI sitting on shared or local storage?

      Dan

      • sixela @Danp

        @Danp Shared storage: an iSCSI LUN shared across multiple hosts.

            • Danp Pro Support Team

            Have you confirmed that HA is enabled on the pool's Advanced tab? Also, did one of the other hosts automatically take the role of pool master?

              • sixela @Danp

              @Danp Yes, HA is enabled. The pool master has not changed.

                • Danp Pro Support Team

                Can you describe this dom0 crash in more detail? You should check the logs to find out why HA didn't kick in and promote a new pool master.

                  • sixela @Danp

                  @Danp Hello,

                  I think it crashed due to a hardware problem.

                  HA started, but I got the error below, so I had to start the server by hand afterwards. By then the dom0 that had crashed was up again.

                  Feb 27 00:03:43DOM0 xapi: [error||14746299 ||backtrace] Raised Storage_error ([S(Backend_error);[S(SR_BACKEND_FAILURE_46);[S();S(The VDI is not available [opterr=['HOST_OFFLINE', 'OpaqueRef:e67d5aed-ae13-497e-ac16-29882c317ef3']]]);S()]])
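
For anyone sifting through xapi logs for the same failure, here is a minimal Python sketch that pulls the backend failure code, the reason, and the OpaqueRef out of lines shaped like the one above. The regular expression is inferred from this single log line only, so treat it as an assumption: other xapi versions or other errors may be formatted differently. The names `parse_storage_error` and `summarize` are made up for illustration.

```python
import re
from collections import Counter
from typing import Iterable, Optional

# Pattern inferred from the single log line quoted in this thread (assumption:
# other Storage_error lines follow the same shape).
STORAGE_ERROR = re.compile(
    r"Raised Storage_error.*?S\((?P<code>SR_BACKEND_FAILURE_\d+)\)"
    r".*?opterr=\['(?P<reason>[A-Z_]+)',\s*'(?P<ref>OpaqueRef:[0-9a-f-]+)'\]"
)


def parse_storage_error(line: str) -> Optional[dict]:
    """Return code/reason/ref for a matching log line, or None if it does not match."""
    match = STORAGE_ERROR.search(line)
    return match.groupdict() if match else None


def summarize(lines: Iterable[str]) -> Counter:
    """Count matching errors per (code, reason, ref) triple."""
    counts: Counter = Counter()
    for line in lines:
        parsed = parse_storage_error(line)
        if parsed:
            counts[(parsed["code"], parsed["reason"], parsed["ref"])] += 1
    return counts


# The log line from this post, copied verbatim.
sample = (
    "Feb 27 00:03:43DOM0 xapi: [error||14746299 ||backtrace] Raised Storage_error "
    "([S(Backend_error);[S(SR_BACKEND_FAILURE_46);[S();S(The VDI is not available "
    "[opterr=['HOST_OFFLINE', 'OpaqueRef:e67d5aed-ae13-497e-ac16-29882c317ef3']]]);S()]])"
)

if __name__ == "__main__":
    print(parse_storage_error(sample))
```

Feeding every line of a log file through `summarize` gives a quick view of whether all the failed restarts point at the same offline host reference or at several.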

                    • sixela @Danp

                    @Danp Hello,

                    In addition:

                    28 machines were impacted: 15 came back OK and 13 hit the error message. HA did try, but something went wrong.

                      • Danp Pro Support Team @sixela

                      @sixela There's likely more information in the logs that would explain why a new pool master wasn't designated.

                        • sixela @Danp

                        @Danp My problem isn't about a new master. My VMs are all set to "restart if possible", and HA does try to restart them on another host in the same cluster, but they stay locked with the following error:

                        Feb 27 00:03:43DOM0 xapi: [error||14746299 ||backtrace] Raised Storage_error ([S(Backend_error);[S(SR_BACKEND_FAILURE_46);[S();S(The VDI is not available [opterr=['HOST_OFFLINE', 'OpaqueRef:e67d5aed-ae13-497e-ac16-29882c317ef3']]]);S()]])

                          • Danp Pro Support Team @sixela

                          @sixela I understand. I'm not an expert on HA functionality, but I suspect that the new pool master would need to be designated as a first step. That is why I am suggesting that you investigate why this didn't automatically occur.

                            • tjkreidl Ambassador

                            How many hosts in your pool? For HA to work out of the box, you need at least three hosts in a pool. Also, are all your hosts properly time synchronized to the same time source?
                            They need to be very close in time to each other for HA to work properly. Note that when HA is first enabled on a given host, it has to be rebooted for HA to function.
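
As a rough illustration of the time check, here is a small Python sketch that flags hosts whose clock is far from the rest. Everything in it is an assumption for illustration: the host names and timestamps are made-up sample values, how you collect one timestamp per host is left out, and `tolerance_s` is an arbitrary placeholder rather than a documented XCP-ng limit.

```python
from statistics import median
from typing import Dict, List


def hosts_out_of_sync(host_times: Dict[str, float], tolerance_s: float) -> List[str]:
    """Return hosts whose timestamp differs from the pool median by more than tolerance_s.

    host_times maps a host name to a Unix timestamp sampled at (nearly) the same moment.
    """
    if not host_times:
        return []
    reference = median(host_times.values())
    return sorted(
        host for host, ts in host_times.items() if abs(ts - reference) > tolerance_s
    )


# Hypothetical sample values, not taken from the pool discussed in this thread.
sample_times = {"host-a": 1000.0, "host-b": 1000.2, "host-c": 1007.5}

if __name__ == "__main__":
    print(hosts_out_of_sync(sample_times, tolerance_s=1.0))
```

Comparing against the median rather than a single reference host keeps one badly drifted machine from making every other host look wrong.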

                              • sixela @tjkreidl

                              @tjkreidl Hello,

                              We have 17 hosts in the pool, and they are all properly synchronized with a time server 😕

                                • tjkreidl Ambassador @sixela

                                @sixela Hmmm ... that SR backend error makes me wonder if the place where you designate HA info to be stored (the so-called "heartbeat SR") might be corrupted or such?

                                  • sixela @tjkreidl

                                  @tjkreidl Hello,

                                  But roughly half of them came back OK, didn't they?

                                  28 machines were impacted: 15 came back OK and 13 hit the error message.
