Issue with Two-Node HA Cluster: XAPI Failing to Log In and HA Disablement
-
Hi everyone,
I'm currently facing an issue with my two-node HA cluster. I don't have access to the REST API, and I'm receiving the following error messages:
Broadcast message from systemd-journald@hv02-xcp-mo (Mon 2024-05-20 18:38:07 CEST):
xapi-nbd[9548]: main: Failed to log in via xapi's Unix domain socket in 300.000000 seconds

Broadcast message from systemd-journald@hv02-xcp-mo (Mon 2024-05-20 18:38:07 CEST):
xapi-nbd[9548]: main: Caught unexpected exception: (Failure

Broadcast message from systemd-journald@hv02-xcp-mo (Mon 2024-05-20 18:38:07 CEST):
xapi-nbd[9548]: main: "Failed to log in via xapi's Unix domain socket in 300.000000 seconds")
Due to this, nothing seems to be working correctly. I am unable to manage the cluster or access any of the services that rely on XAPI.
Additionally, I am trying to disable HA from the CLI but encounter the following error:
[18:47 hv02-xcp-mo d4e8e42c-e758-dd47-800e-ce2aaae3abdc]# xe pool-ha-disable
The server could not join the liveset because the HA daemon could not access the heartbeat disk.
The heartbeat disk is still there on the storage, but I can't mount it.
How can I disable HA from the CLI, even with these problems?
Please, this is urgent. Any help or guidance would be greatly appreciated!
Thank you.
-
From /var/log/xensource.log:
May 20 19:01:05 hv02-xcp-mo xapi: [ warn||0 |Checking HA configuration D:6662d56d1422|static_vdis] Attempt to reattach static VDIs via 'attach-static-vdis start' failed: INTERNAL_ERROR: [ Subprocess exited with unexpected code 1; stdout = [ ]; stderr = [ Redirecting to /bin/systemctl start attach-static-vdis.service\x0AJob for attach-static-vdis.service failed because the control process exited with error code. See "systemctl status attach-static-vdis.service" and "journalctl -xe" for details.\x0A ] ]
May 20 19:01:05 hv02-xcp-mo xapi: [debug||0 |Checking HA configuration
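The error message itself names where to look next, so I checked the two commands it points to:
systemctl status attach-static-vdis.service
journalctl -xe
-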
You could try running the following command on each host to disable HA:
xe host-emergency-ha-disable --force
FWIW, HA requires three nodes to run properly, and not every environment needs HA even if you think you do.
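In case it helps, a rough sketch of the sequence for a two-host pool like yours (host names taken from your prompts; the last command just confirms the pool no longer reports HA as enabled):
xe host-emergency-ha-disable --force    # run locally on hv01-xcp-mo
xe host-emergency-ha-disable --force    # run locally on hv02-xcp-mo
xe pool-list params=ha-enabled          # afterwards, from either host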
-
@Danp said in Issue with Two-Node HA Cluster: XAPI Failing to Log In and HA Disablement:
You could try running the following command on each host to disable HA:
xe host-emergency-ha-disable --force
FWIW, HA requires three nodes to run properly, and not every environment needs HA even if you think you do.
Thank you for your input!
XAPI is now functioning again after disabling HA. However, I'm encountering an issue while reattaching the storage to each node. Here are the commands and outputs:
-
List Hosts:
[21:53 hv01-xcp-mo ~]# xe host-list
uuid ( RO)                : 0a38ea70-8529-4d30-bf44-8f01e1e4101b
          name-label ( RW): hv01-xcp-mo
    name-description ( RW):

uuid ( RO)                : 67c5d00a-977d-46e2-98ee-0aa620e94db0
          name-label ( RW): hv02-xcp-mo
    name-description ( RW):
-
List PBDs for SR:
[21:50 hv01-xcp-mo ~]# xe pbd-list sr-uuid=d4e8e42c-e758-dd47-800e-ce2aaae3abdc
uuid ( RO)                  : ccc3067f-afc2-a344-028b-3815da0b5afe
             host-uuid ( RO): 0a38ea70-8529-4d30-bf44-8f01e1e4101b
               sr-uuid ( RO): d4e8e42c-e758-dd47-800e-ce2aaae3abdc
         device-config (MRO): backupservers: 10.1.80.11:/san; server: 10.1.80.12:/san
    currently-attached ( RO): false

uuid ( RO)                  : 280f2c70-96e3-0404-2d20-61789c113356
             host-uuid ( RO): 67c5d00a-977d-46e2-98ee-0aa620e94db0
               sr-uuid ( RO): d4e8e42c-e758-dd47-800e-ce2aaae3abdc
         device-config (MRO): backupservers: 10.1.80.11:/san; server: 10.1.80.12:/san
    currently-attached ( RO): false
-
Attempt to Plug PBD:
[21:57 hv01-xcp-mo ~]# xe pbd-plug uuid=ccc3067f-afc2-a344-028b-3815da0b5afe
Error code: SR_BACKEND_FAILURE_12
Error parameters: , mount failed with return code 1,
Can you help me?
Thanks again for your support!
-
Storage-related errors will be in SMlog.
Edit: What type of storage is this?
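For example, you could watch it live on the host while retrying the plug (the UUID is the first PBD from your listing above; on XCP-ng the file is /var/log/SMlog):
tail -f /var/log/SMlog
xe pbd-plug uuid=ccc3067f-afc2-a344-028b-3815da0b5afe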
-
@Danp, glusterfs.
Anyway, I found the problem. It was a cache-size issue, so the GlusterFS FUSE client wasn't mounting the volume.
Thank you very much for helping me.
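For anyone who hits something similar, this is roughly the kind of thing to check, assuming the GlusterFS volume is named san (as in the device-config above) and that performance.cache-size is the option involved, then re-plugging the PBD once it mounts:
gluster volume get san performance.cache-size
gluster volume set san performance.cache-size 256MB
xe pbd-plug uuid=ccc3067f-afc2-a344-028b-3815da0b5afe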
-