    Issue with Two-Node HA Cluster: XAPI Failing to Log In and HA Disablement

      ace_token

      Hi everyone,

      I'm currently facing an issue with my two-node HA cluster. I don't have access to the REST API, and I'm receiving the following error messages:

      Broadcast message from systemd-journald@hv02-xcp-mo (Mon 2024-05-20 18:38:07 CEST):
      
      xapi-nbd[9548]: main: Failed to log in via xapi's Unix domain socket in 300.000000 seconds
      
      Broadcast message from systemd-journald@hv02-xcp-mo (Mon 2024-05-20 18:38:07 CEST):
      
      xapi-nbd[9548]: main: Caught unexpected exception: (Failure
      
      Broadcast message from systemd-journald@hv02-xcp-mo (Mon 2024-05-20 18:38:07 CEST):
      
      xapi-nbd[9548]: main:   "Failed to log in via xapi's Unix domain socket in 300.000000 seconds")
      

      Due to this, nothing seems to be working correctly. I am unable to manage the cluster or access any of the services that rely on XAPI.
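
      For reference, this is how I've been checking the xapi service and its main log (standard commands in an XCP-ng dom0):

      # Is the xapi service itself running?
      systemctl status xapi

      # Follow the main xapi log while the login errors occur
      tail -f /var/log/xensource.log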

      Additionally, I am trying to disable HA from the CLI but encounter the following error:

      [18:47 hv02-xcp-mo d4e8e42c-e758-dd47-800e-ce2aaae3abdc]# xe pool-ha-disable
      The server could not join the liveset because the HA daemon could not access the heartbeat disk.
      

      I have the disk on storage, but I can't mount it.

      How can I disable HA from the CLI, even with these problems?

      Please, this is urgent. Any help or guidance would be greatly appreciated!

      Thank you.

          ace_token

          From /var/log/xensource.log:

          May 20 19:01:05 hv02-xcp-mo xapi: [ warn||0 |Checking HA configuration D:6662d56d1422|static_vdis] Attempt to reattach static VDIs via 'attach-static-vdis start' failed: INTERNAL_ERROR: [ Subprocess exited with unexpected code 1; stdout = [ ]; stderr = [ Redirecting to /bin/systemctl start attach-static-vdis.service\x0AJob for attach-static-vdis.service failed because the control process exited with error code. See "systemctl status attach-static-vdis.service" and "journalctl -xe" for details.\x0A ] ]
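
          The error itself points at the next diagnostic step (commands quoted from the stderr above):

          systemctl status attach-static-vdis.service
          journalctl -xe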

            Danp (Pro Support Team)

             You could try running the following command on each host to disable HA:

            xe host-emergency-ha-disable --force

             FWIW, HA requires three nodes to run properly, and not every environment needs HA, even if you think yours does. 😉
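
             Afterwards you can confirm HA is off at the pool level (standard xe commands; the nested pool-list just fetches the pool uuid):

             xe pool-param-get uuid=$(xe pool-list --minimal) param-name=ha-enabled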

              ace_token

              @Danp said in Issue with Two-Node HA Cluster: XAPI Failing to Log In and HA Disablement:

              You could try running the following command on each host to disable HA:

              xe host-emergency-ha-disable --force

              FWIW, HA requires three nodes to run properly, and not every environment needs HA, even if you think yours does. 😉

              Thank you for your input!

              XAPI is now functioning again after disabling HA. However, I'm encountering an issue while reattaching the storage to each node. Here are the commands and outputs:

              1. List Hosts:

                [21:53 hv01-xcp-mo ~]# xe host-list
                uuid ( RO)                : 0a38ea70-8529-4d30-bf44-8f01e1e4101b
                          name-label ( RW): hv01-xcp-mo
                    name-description ( RW): 
                
                uuid ( RO)                : 67c5d00a-977d-46e2-98ee-0aa620e94db0
                          name-label ( RW): hv02-xcp-mo
                    name-description ( RW): 
                
              2. List PBDs for SR:

                [21:50 hv01-xcp-mo ~]# xe pbd-list sr-uuid=d4e8e42c-e758-dd47-800e-ce2aaae3abdc 
                uuid ( RO)                  : ccc3067f-afc2-a344-028b-3815da0b5afe
                             host-uuid ( RO): 0a38ea70-8529-4d30-bf44-8f01e1e4101b
                               sr-uuid ( RO): d4e8e42c-e758-dd47-800e-ce2aaae3abdc
                         device-config (MRO): backupservers: 10.1.80.11:/san; server: 10.1.80.12:/san
                    currently-attached ( RO): false
                
                uuid ( RO)                  : 280f2c70-96e3-0404-2d20-61789c113356
                             host-uuid ( RO): 67c5d00a-977d-46e2-98ee-0aa620e94db0
                               sr-uuid ( RO): d4e8e42c-e758-dd47-800e-ce2aaae3abdc
                         device-config (MRO): backupservers: 10.1.80.11:/san; server: 10.1.80.12:/san
                    currently-attached ( RO): false
                
              3. Attempt to Plug PBD:

                [21:57 hv01-xcp-mo ~]# xe pbd-plug uuid=ccc3067f-afc2-a344-028b-3815da0b5afe
                Error code: SR_BACKEND_FAILURE_12
                Error parameters: , mount failed with return code 1, 
                
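              For reference, once the underlying mount failure is fixed, I assume both PBDs can be replugged in one pass (a sketch; the SR uuid comes from the pbd-list output above):

                # Replug every detached PBD on this SR
                for pbd in $(xe pbd-list sr-uuid=d4e8e42c-e758-dd47-800e-ce2aaae3abdc currently-attached=false --minimal | tr ',' ' '); do
                    xe pbd-plug uuid=$pbd
                done
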

              Can you help me?

              Thanks again for your support!

                Danp (Pro Support Team)

                Storage-related errors will be in SMlog.

                Edit: What type of storage is this?
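
                For example (the log path is standard on XCP-ng; the grep pattern is just one way to narrow it down):

                # Inspect the storage manager log around the failed pbd-plug
                tail -n 100 /var/log/SMlog
                grep -i mount /var/log/SMlog | tail -n 20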

                  ace_token

                  @Danp, GlusterFS.

                  Anyway, I found the problem: it was the cache size, so the FUSE client couldn't mount the volume.

                  Thank you very much for helping me.
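
                  For anyone hitting the same thing, a sketch of the kind of check and fix involved (the volume name "san" comes from the device-config above; the cache-size value is just an example):

                  # On a Gluster node: inspect and adjust the volume cache size
                  gluster volume get san performance.cache-size
                  gluster volume set san performance.cache-size 256MB

                  # Then replug the PBD on each XCP-ng host (uuids from the earlier pbd-list)
                  xe pbd-plug uuid=ccc3067f-afc2-a344-028b-3815da0b5afe
                  xe pbd-plug uuid=280f2c70-96e3-0404-2d20-61789c113356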

                  olivierlambert marked this topic as a question and later as solved.