    Unable to enable High Availability - INTERNAL_ERROR(Not_found)

    • tjkreidl Ambassador

      Note also that if HA is turned on or off, the host must be restarted for that change to take effect, if I recall correctly.

      • jmannik @tjkreidl

        @tjkreidl This hasn't been my experience so far: enabling HA has just enabled HA, no reboot needed.

        @psafont I am patching all my hosts now and will try the above test packages on Sunday night (it is Friday afternoon at the time of this post).

        • nikade Top contributor @jmannik

          Correct, no reboot needed to enable/disable HA.

          • tjkreidl Ambassador @nikade

            @nikade Interesting, as that at some point used to be the case, at least with XenServer!
            I stand corrected and learned something new.

            • jmannik @psafont

              @psafont
              That is done now. I tried to enable HA again and it was unsuccessful. What would you like me to do now?

              • olivierlambert Vates 🪐 Co-Founder CEO

                You should check /var/log/xensource.log; it should provide a more explicit error message.

                • tjkreidl Ambassador @olivierlambert

                  @olivierlambert Good idea. Also, they should make sure all hosts are at the same update/patch levels, the network is set up properly among the three or more hosts, there is a compatible HA shared storage properly set up, etc.
                  You folks have a good guide at: https://docs.xcp-ng.org/management/ha/

                  • jmannik

                    Well, this is what I'm getting now:

                    {
                      "id": "0mhbgkupy",
                      "properties": {
                        "method": "pool.enableHa",
                        "params": {
                          "pool": "213186d2-e3ba-154f-d371-4122388deb83",
                          "heartbeatSrs": [
                            "381caeb2-5ad9-8924-365d-4b130c67c064"
                          ],
                          "configuration": {}
                        },
                        "name": "API call: pool.enableHa",
                        "userId": "71d48027-d471-4b01-83f9-830df4279f7e",
                        "type": "api.call"
                      },
                      "start": 1761709884550,
                      "status": "failure",
                      "updatedAt": 1761709923544,
                      "end": 1761709923544,
                      "result": {
                        "code": "INTERNAL_ERROR",
                        "params": [
                          "unable to gather the coordinator's UUID: Not_found"
                        ],
                        "call": {
                          "duration": 38993,
                          "method": "pool.enable_ha",
                          "params": [
                            "* session id *",
                            [
                              "OpaqueRef:a83a416f-c97d-1ed8-c7fc-213af89b8f86"
                            ],
                            {}
                          ]
                        },
                        "message": "INTERNAL_ERROR(unable to gather the coordinator's UUID: Not_found)",
                        "name": "XapiError",
                        "stack": "XapiError: INTERNAL_ERROR(unable to gather the coordinator's UUID: Not_found)\n    at Function.wrap (file:///opt/xen-orchestra/packages/xen-api/_XapiError.mjs:16:12)\n    at file:///opt/xen-orchestra/packages/xen-api/transports/json-rpc.mjs:38:21\n    at runNextTicks (node:internal/process/task_queues:65:5)\n    at processImmediate (node:internal/timers:453:9)\n    at process.callbackTrampoline (node:internal/async_hooks:130:17)"
                      }
                    }
                    
                    • olivierlambert Vates 🪐 Co-Founder CEO

                      That's better 🙂 @psafont so now we know we are missing a UUID somewhere?

                      • psafont Vates 🪐 XAPI & Network Team @jmannik

                        @jmannik
                        So the problem goes like this:

                        • HA uses a local-only database so that it does not depend on the normal database.
                        • This local database contains a mapping from UUID to IP address (host_address) for all hosts in an HA cluster / pool. This information should be gathered from the normal database right before HA is enabled.
                        • When trying to enable HA, the host fetches the coordinator's address from the filesystem. It then uses the previous mapping and the coordinator's address to find the coordinator's UUID. This step fails.

                        I'm not sure what is actually happening, but some scenarios come to mind:

                        • XO isn't calling the API function Host.preconfigure_ha, which means the local database is not created (unlikely)
                        • The coordinator's address has somehow changed between the local database being written and the HA being enabled

                        Things to check:

                        • Inspect the values that the failing host has for the host_address of the coordinator / master host, in both:
                          1. The normal database. You can SSH into the failing host and run the following command, replacing POOL_UUID with the actual UUID. To do this, delete POOL_UUID, place the cursor after the = and press Tab twice.
                        xe pool-param-get uuid=POOL_UUID param-name=master | xargs -I _ xe host-param-get uuid=_ param-name=address

                          2. The pool role file. As with the previous command, SSH into the failing host and run:
                        cat /etc/xensource/pool.conf
                        

                        Let us know how it goes. If the IPs don't match, there's a problem with the configuration of the member, and otherwise it's because the local database is outdated and should be refreshed before enabling HA. I don't know how XO handles it.
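
                        The comparison described above can be sketched as a small shell helper. This is only an illustration, not something xapi or XO provides: the function names coordinator_from_pool_conf and check_coordinator_address are made up for this example, and it assumes pool.conf holds either "master" or "slave:<address>".

```shell
#!/bin/sh
# Hypothetical helpers for illustration only (not part of xapi or XO).
# Assumption: pool.conf contains either "master" or "slave:<address>".

# Print the coordinator address recorded in a pool.conf string.
# Prints nothing when the content is "master" (this host is the coordinator).
# Returns 1 when the content is not recognised.
coordinator_from_pool_conf() {
    case "$1" in
        slave:*) printf '%s\n' "${1#slave:}" ;;
        master)  : ;;
        *)       return 1 ;;
    esac
}

# Compare the address from the normal database with the one in pool.conf.
#   $1: address printed by the xe pool-param-get / host-param-get command
#   $2: raw content of /etc/xensource/pool.conf
check_coordinator_address() {
    conf_addr=$(coordinator_from_pool_conf "$2") || {
        echo "unrecognised pool.conf"
        return 2
    }
    if [ -z "$conf_addr" ]; then
        echo "this host is the coordinator"
    elif [ "$conf_addr" = "$1" ]; then
        echo "match: $1"
    else
        echo "MISMATCH: database=$1 pool.conf=$conf_addr"
        return 1
    fi
}

# Example use on a pool member (commented out because it needs a real host):
# check_coordinator_address \
#     "$(xe pool-param-get uuid=POOL_UUID param-name=master | xargs -I _ xe host-param-get uuid=_ param-name=address)" \
#     "$(cat /etc/xensource/pool.conf)"
```

                        The helper returns a non-zero status on a mismatch, so it can be run on each member in turn and the results compared.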

                        • olivierlambert Vates 🪐 Co-Founder CEO

                          @psafont I'm not sure I follow; I don't remember seeing any documented endpoint related to preparing HA 🤔

                          • psafont Vates 🪐 XAPI & Network Team @olivierlambert

                            @olivierlambert The call is indeed hidden from the docs, and only callable from inside a pool... it's called as part of Pool.enable_ha

                            • olivierlambert Vates 🪐 Co-Founder CEO

                              So we probably need to tell XO team the "right way" to enable HA because there's no way to know from "outside" 😓

                              • psafont Vates 🪐 XAPI & Network Team @olivierlambert

                                @olivierlambert I don't think so. XO is not meant to make that call: xapi makes it automatically, so it's xapi's responsibility.

                                • jmannik @psafont

                                  @psafont
                                  [22:13 vmhost13 ~]# xe pool-param-get uuid=213186d2-e3ba-154f-d371-4122388deb83 param-name=master | xargs -I _ xe host-param-get uuid=_ param-name=address
                                  192.168.10.13
                                  [22:13 vmhost13 ~]# cat /etc/xensource/pool.conf
                                  master[22:14 vmhost13 ~]#

                                  • psafont Vates 🪐 XAPI & Network Team @jmannik

                                    @jmannik Could you collect the file contents of /etc/xensource/pool.conf from all the other hosts? The command is failing in one of them, not on the master host.

                                    • jmannik @jmannik

                                      [22:27 vmhost12 ~]# xe pool-param-get uuid=213186d2-e3ba-154f-d371-4122388deb83 param-name=master | xargs -I _ xe host-param-get uuid=_ param-name=address
                                      192.168.10.13
                                      [22:27 vmhost12 ~]# cat /etc/xensource/pool.conf
                                      slave:192.168.30.13[22:27 vmhost12 ~]#
                                      
                                      [22:27 vmhost11 ~]# xe pool-param-get uuid=213186d2-e3ba-154f-d371-4122388deb83  param-name=master | xargs -I _ xe host-param-get uuid=_ param-name=address
                                      192.168.10.13
                                      [22:28 vmhost11 ~]# cat /etc/xensource/pool.conf
                                      slave:192.168.30.13[22:28 vmhost11 ~]#
                                      

                                      I think I see where the issue is, not sure how to solve it though

                                      • psafont Vates 🪐 XAPI & Network Team @jmannik

                                        @jmannik The IPs match, and now I don't have an explanation for why this is happening. I'll take another look at the codepath, but that will take a while, as work is piling up.

                                        • jmannik @jmannik

                                          Ok, so in this process I have come across a recurring issue I have had with XCP-ng, where it gets the order of the Ethernet interfaces wrong.
                                          Each of my hosts has an onboard 1gbit interface, then a 4-port 10gbit card.
                                          It SHOULD be ordering the interfaces like so:
                                          ETH0 1gbit
                                          ETH1 10gbit
                                          ETH2 10gbit
                                          ETH3 10gbit
                                          ETH4 10gbit

                                          But it will randomly decide upon install (VMHost11 was recently rebuilt due to an id10t pebkac issue) to order them like below for no apparent reason:

                                          ETH0 10gbit
                                          ETH1 1gbit
                                          ETH2 10gbit
                                          ETH3 10gbit
                                          ETH4 10gbit

                                          And re-ordering the interfaces is just a lot more difficult than I think it should be.

                                          • jmannik @psafont

                                            Ahh, but they don't match.
                                            VMHost13 lists 192.168.10.13
                                            VMHost12 and VMHost11 list 192.168.30.13
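
                                            To make the disagreement easier to see, here is a rough sketch that checks each host's pool.conf content against the address the database reports. The function name report_mismatches is made up for this example, and the input lines are simply the values from the outputs posted in this thread, typed in by hand.

```shell
#!/bin/sh
# Hypothetical helper for illustration only: flag hosts whose pool.conf
# points at a different coordinator address than the database reports.
#   $1:    coordinator address from the database
#   stdin: one line per host, "<host> <pool.conf content>"
report_mismatches() {
    expected=$1
    while read -r host conf; do
        case "$conf" in
            master)            echo "$host: coordinator" ;;
            "slave:$expected") echo "$host: ok" ;;
            slave:*)           echo "$host: points at ${conf#slave:}, database says $expected" ;;
            *)                 echo "$host: unrecognised pool.conf" ;;
        esac
    done
}

# Values copied by hand from the command outputs in this thread.
printf '%s\n' \
    'vmhost13 master' \
    'vmhost12 slave:192.168.30.13' \
    'vmhost11 slave:192.168.30.13' |
    report_mismatches 192.168.10.13
```

                                            Both members are flagged, and the two addresses differ only in the third octet (10 versus 30).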
