    The HA doesn't work

    16 Posts 3 Posters 323 Views 2 Watching
    • sixela

      Hello,

      We had a dom0 crash last night. In the HA settings, the VMs are set to "restart if possible", but they did not restart. Here are the errors:

      Feb 27 00:03:43DOM0 xapi: [error||14746299 ||backtrace] sm_exec D:7818195f2572 failed with exception Storage_error ([S(Backend_error);[S(SR_BACKEND_FAILURE_46);[S();S(The VDI is not available [opterr=['HOST_OFFLINE', 'OpaqueRef:e67d5aed-ae13-497e-ac16-29882c317ef3']]);S()]]])
      Feb 27 00:03:43DOM0 xapi: [error||14746299 ||backtrace] Raised Storage_error ([S(Backend_error);[S(SR_BACKEND_FAILURE_46);[S();S(The VDI is not available [opterr=['HOST_OFFLINE', 'OpaqueRef:e67d5aed-ae13-497e-ac16-29882c317ef3']]);S()]]])
      Feb 27 00:03:43DOM0 xapi: [error||14746299 ||backtrace] 1/8 xapi Raised at file ocaml/xapi/sm_exec.ml, line 377
      Feb 27 00:03:43DOM0 xapi: [error||14746299 ||backtrace] 2/8 xapi Called from file lib/xapi-stdext-pervasives/pervasiveext.ml, line 24
      Feb 27 00:03:43DOM0 xapi: [error||14746299 ||backtrace] 3/8 xapi Called from file lib/xapi-stdext-pervasives/pervasiveext.ml, line 35
      Feb 27 00:03:43DOM0 xapi: [error||14746299 ||backtrace] 4/8 xapi Called from file ocaml/xapi/server_helpers.ml, line 95
      Feb 27 00:03:43DOM0 xapi: [error||14746299 ||backtrace] 5/8 xapi Called from file ocaml/xapi/server_helpers.ml, line 121
      Feb 27 00:03:43DOM0 xapi: [error||14746299 ||backtrace] 6/8 xapi Called from file lib/xapi-stdext-pervasives/pervasiveext.ml, line 24
      Feb 27 00:03:43DOM0 xapi: [error||14746299 ||backtrace] 7/8 xapi Called from file lib/xapi-stdext-pervasives/pervasiveext.ml, line 35
      Feb 27 00:03:43DOM0 xapi: [error||14746299 ||backtrace] 8/8 xapi Called from file lib/backtrace.ml, line 177
      Feb 27 00:03:43DOM0 xapi: [error||14746299 ||backtrace]
      Feb 27 00:03:43DOM0 xapi: [error||14746299 ||backtrace] VDI.activate D:7781cbf53aa1 failed with exception Storage_error ([S(Backend_error);[S(SR_BACKEND_FAILURE_46);[S();S(The VDI is not available [opterr=['HOST_OFFLINE', 'OpaqueRef:e67d5aed-ae13-497e-ac16-29882c317ef3']]);S()]]])
      Feb 27 00:03:43DOM0 xapi: [error||14746299 ||backtrace] Raised Storage_error ([S(Backend_error);[S(SR_BACKEND_FAILURE_46);[S();S(The VDI is not available [opterr=['HOST_OFFLINE', 'OpaqueRef:e67d5aed-ae13-497e-ac16-29882c317ef3']]);S()]]])
      Feb 27 00:03:43DOM0 xapi: [error||14746299 ||backtrace] 1/1 xapi Raised at file (Thread 14746299 has no backtrace table. Was with_backtraces called?, line 0
      Feb 27 00:03:43DOM0 xapi: [error||14746299 ||backtrace]
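
      As a rough aid for reading the lines above, the following Python sketch pulls the backend error code, the opterr reason, and the OpaqueRef out of each xapi line, then counts how often each combination appears. The function names and the regular expression are hypothetical and were written only against the lines pasted in this post; other xapi versions or storage drivers may format the payload differently.

```python
import re
from collections import Counter

# Assumption: the payload always contains S(SR_BACKEND_FAILURE_<n>) followed by
# opterr=['REASON', 'OpaqueRef:...'], as in the lines pasted in this thread.
PATTERN = re.compile(
    r"S\((?P<code>SR_BACKEND_FAILURE_\d+)\)"
    r".*?opterr=\['(?P<reason>[A-Z_]+)',\s*'(?P<ref>OpaqueRef:[0-9a-f-]+)'\]"
)

# One of the log lines from this post, split only for readability.
SAMPLE = (
    "Feb 27 00:03:43DOM0 xapi: [error||14746299 ||backtrace] Raised Storage_error "
    "([S(Backend_error);[S(SR_BACKEND_FAILURE_46);[S();S(The VDI is not available "
    "[opterr=['HOST_OFFLINE', 'OpaqueRef:e67d5aed-ae13-497e-ac16-29882c317ef3']]);S()]]])"
)


def parse_storage_error(line):
    """Return (code, reason, ref) if the line carries a storage error, else None."""
    match = PATTERN.search(line)
    if match is None:
        return None
    return match.group("code"), match.group("reason"), match.group("ref")


def summarize(lines):
    """Count how many times each (code, reason, ref) triple occurs."""
    counts = Counter()
    for line in lines:
        parsed = parse_storage_error(line)
        if parsed is not None:
            counts[parsed] += 1
    return counts


if __name__ == "__main__":
    print(parse_storage_error(SAMPLE))
```

      Running `summarize` over the full log (for example, the lines of a saved copy of the xapi log) would show whether every failed start reports the same reason and the same OpaqueRef, which may help narrow down what was holding the VDI.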
      

      I have the impression that the VDI remains locked on the crashed dom0 until it restarts, because around 00:25 I rebooted by hand and it worked.

      I would have preferred HA to do what was necessary.

      Thanks for your help

      Have a nice day.

      • Danp (Pro Support Team) @sixela

        Hi @sixela

        Is the VDI sitting on shared or local storage?

        Dan

        • sixela @Danp

          @Danp Shared storage: an iSCSI LUN shared across multiple hosts.

              • Danp (Pro Support Team)

              Have you confirmed that HA is enabled on the pool's Advanced tab? Also, did one of the other hosts automatically take the role of pool master?

                • sixela @Danp

                @Danp Yes, HA is enabled. The master has not changed.

                  • Danp (Pro Support Team)

                  Can you describe this dom0 crash in more detail? You should check the logs to find out why HA didn't kick in and promote a new pool master.

                    • sixela @Danp

                    @Danp Hello,

                    I think it crashed due to a hardware problem.

                    HA did kick in, but I got the error above, so I had to start the server by hand afterwards. By then, the dom0 that had crashed was up again.

                    Feb 27 00:03:43DOM0 xapi: [error||14746299 ||backtrace] Raised Storage_error ([S(Backend_error);[S(SR_BACKEND_FAILURE_46);[S();S(The VDI is not available [opterr=['HOST_OFFLINE', 'OpaqueRef:e67d5aed-ae13-497e-ac16-29882c317ef3']]]);S()]])

                      • sixela @Danp

                      @Danp Hello,

                      In addition:

                      28 machines were impacted: 15 restarted OK and 13 failed with the error message. HA did try, but there was a problem.

                        • Danp (Pro Support Team) @sixela

                        @sixela There's likely more information in the logs that would explain why a new pool master wasn't designated.

                          • sixela @Danp

                          @Danp My problem is not about a new master. I expect HA to restart all my VMs on another host in the same pool, since they are set to "restart if possible". HA does try, but the VDI is still locked, with the following error:

                          Feb 27 00:03:43DOM0 xapi: [error||14746299 ||backtrace] Raised Storage_error ([S(Backend_error);[S(SR_BACKEND_FAILURE_46);[S();S(The VDI is not available [opterr=['HOST_OFFLINE', 'OpaqueRef:e67d5aed-ae13-497e-ac16-29882c317ef3']]]);S()]])


                            • Danp (Pro Support Team) @sixela

                            @sixela I understand. I'm not an expert on HA functionality, but I suspect that the new pool master would need to be designated as a first step. That is why I am suggesting that you investigate why this didn't automatically occur.

                              • tjkreidl (Ambassador)

                              How many hosts in your pool? For HA to work out of the box, you need at least three hosts in a pool. Also, are all your hosts properly time synchronized to the same time source?
                              They need to be very close in time to each other for HA to work properly. Note that when HA is first enabled on a given host, it has to be rebooted for HA to function.

                                • sixela @tjkreidl

                                @tjkreidl Hello,

                                We have 17 hosts in the pool and they are well synchronized with a time server 😕

                                  • tjkreidl (Ambassador) @sixela

                                  @sixela Hmmm ... that SR backend error makes me wonder whether the place where you designate HA info to be stored (the so-called "heartbeat SR") might be corrupted or something similar.

                                    • sixela @tjkreidl

                                    @tjkreidl Hello,

                                    But about half of them did restart OK, didn't they?

                                    28 machines were impacted: 15 restarted OK and 13 failed with the error message.
