Recovery from lost node in HA
-
Hello,
I have an XCP-ng 8.3 pool of 3 hosts with XOSTOR configured with 3 replicas and HA enabled.
This setup should allow losing up to 2 nodes without data loss.
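As a sanity check on that redundancy claim (my own framing, not something from the thread): DRBD replica data survives as long as one copy remains, but quorum-based services such as the HA statefile only survive while a majority of nodes is up. A toy calculation:

```python
def majority(n):
    """Smallest number of nodes that still forms a quorum out of n."""
    return n // 2 + 1

# With 3 replicas: the data itself survives losing 2 nodes,
# but a majority quorum survives losing only 1.
print(majority(3))      # nodes needed for quorum
print(3 - majority(3))  # nodes you can lose while keeping quorum
```

So in a 3-node pool, losing a single node should leave quorum intact, which is why the whole pool going down after one poweroff is surprising.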
Initial information:
- The LINSTOR controller was on node 1.
- The pool master was node 2.
- Satellites are running on all nodes.
I was able to migrate VDIs to XOSTOR successfully (although when I start a transfer to XOSTOR, I have to wait ~1 minute before the transfer really starts, as seen in XO).
For my first test, I shut down node 3, which is neither the pool master nor the LINSTOR controller. I didn't want to kill the LINSTOR controller host / pool master immediately; that will be my second / third test.
I powered off node 3 from IPMI.
However, the entire pool then went down.
In `xensource.log` on both remaining nodes (node 1 and node 2), I can see:

```
Jul 5 15:32:20 node2 xapi: [debug||0 |Checking HA configuration D:9b97e277d80e|helpers] /usr/libexec/xapi/cluster-stack/xhad/ha_start_daemon exited with code 8 [stdout = ''; stderr = 'Sat Jul 5 15:32:20 CEST 2025 ha_start_daemon: the HA daemon stopped without forming a liveset (8)\x0A']
Jul 5 15:32:20 node2 xapi: [ warn||0 |Checking HA configuration D:9b97e277d80e|xapi_ha] /usr/libexec/xapi/cluster-stack/xhad/ha_start_daemon returned MTC_EXIT_CAN_NOT_ACCESS_STATEFILE (State-File is inaccessible)
Jul 5 15:32:20 gco-002-rbx-002 xapi: [ warn||0 |Checking HA configuration D:9b97e277d80e|xapi_ha] ha_start_daemon failed with MTC_EXIT_CAN_NOT_ACCESS_STATEFILE: will contact existing master and check if HA is still enabled
```

However, the storage layer was OK:
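For anyone triaging a similar outage: the xhad exit reasons are easy to pull out of `xensource.log` with a one-line filter. A minimal sketch (`extract_ha_exit` is a made-up helper name; the `MTC_EXIT_*` pattern comes from the messages above):

```shell
# extract_ha_exit: print the unique MTC_EXIT_* codes found in log text on stdin.
extract_ha_exit() {
  grep -oE 'MTC_EXIT_[A-Z_]+' | sort -u
}

# Example with a line captured from this incident:
echo 'xapi: [ warn||0 |Checking HA configuration|xapi_ha] ha_start_daemon returned MTC_EXIT_CAN_NOT_ACCESS_STATEFILE (State-File is inaccessible)' \
  | extract_ha_exit
```

On a live host you would feed it the real log instead, e.g. `extract_ha_exit < /var/log/xensource.log`.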
```
[15:33 node1 linstor-controller]# linstor node list
╭──────────────────────────────────────────────────────────────────────────────────────────╮
┊ Node ┊ NodeType ┊ Addresses                ┊ State                                        ┊
╞══════════════════════════════════════════════════════════════════════════════════════════╡
┊ h1   ┊ COMBINED ┊ 192.168.1.1:3366 (PLAIN) ┊ Online                                       ┊
┊ h2   ┊ COMBINED ┊ 192.168.1.2:3366 (PLAIN) ┊ Online                                       ┊
┊ h3   ┊ COMBINED ┊ 192.168.1.3:3366 (PLAIN) ┊ OFFLINE (Auto-eviction: 2025-07-05 16:33:42) ┊
╰──────────────────────────────────────────────────────────────────────────────────────────╯
```

Volumes were also OK, using `linstor volume list`:

```
[15:33 r1 linstor-controller]# linstor volume list
┊ Node ┊ Resource                ┊ StoragePool                      ┊ VolNr ┊ MinorNr ┊ DeviceName    ┊ Allocated ┊ InUse  ┊ State    ┊
┊ r1   ┊ xcp-persistent-database ┊ xcp-sr-linstor_group_thin_device ┊ 0     ┊ 1000    ┊ /dev/drbd1000 ┊ 52.74 MiB ┊ InUse  ┊ UpToDate ┊
┊ r2   ┊ xcp-persistent-database ┊ xcp-sr-linstor_group_thin_device ┊ 0     ┊ 1000    ┊ /dev/drbd1000 ┊ 6.99 MiB  ┊ Unused ┊ UpToDate ┊
┊ r3   ┊ xcp-persistent-database ┊ xcp-sr-linstor_group_thin_device ┊ 0     ┊ 1000    ┊ /dev/drbd1000 ┊ 6.99 MiB  ┊        ┊ Unknown  ┊
```

(I didn't include the entire list of volumes; while writing this post, I feel a bit stupid for not saving the whole output.)
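Since the wide table output is easy to lose, one option for the next test run is to capture the machine-readable output instead: the LINSTOR client has a `-m` flag that emits JSON (the exact field names below are an assumption based on its v1 output schema, so verify against your client's actual output). A small sketch to flag non-online nodes from that JSON:

```python
def offline_nodes(node_list):
    """Given parsed JSON from `linstor -m node list`, return the names of
    nodes whose connection status is anything other than ONLINE."""
    # Assumed v1 schema: a list with one entry holding a "nodes" array.
    nodes = node_list[0]["nodes"]
    return [n["name"] for n in nodes
            if n.get("connection_status", "").upper() != "ONLINE"]

# Example mirroring the state seen in this incident:
sample = [{"nodes": [
    {"name": "h1", "connection_status": "ONLINE"},
    {"name": "h2", "connection_status": "ONLINE"},
    {"name": "h3", "connection_status": "OFFLINE"},
]}]
print(offline_nodes(sample))  # -> ['h3']
```

On a host this would be fed from something like `json.loads(subprocess.check_output(["linstor", "-m", "node", "list"]))`.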
I finally resolved the situation by bringing node 3 back up, at which point it promoted itself to master, but I need to run this test again because the result is not what I expected.
Did I do something wrong?
-
The issue is that HA cannot write to the statefile.
Have you changed the timeout duration for HA?
-
@olivierlambert No,
for once, I followed the installation steps carefully ^^'