@olivierlambert our OPNsense resets the TCP states, so the firewall blocks the packets because it forgot about the TCP session.
And then a timeout occurred in the middle of the export.
Hello @olivierlambert
I confirm my issue came from my firewall, so it is not related to XO.
However, it would be great to make the logs clearer. I mean:
Error: read ETIMEDOUT
would become
Error: read ETIMEDOUT while connecting to X.X.X.X:ABC
That would make it much quicker to understand my "real and weird" issue.
Best regards,
Hey @Danp,
I can keep my cluster without this node until Monday. Do you have an idea / do you want to investigate this case?
Hello @Danp,
I just reproduced it.
All my nodes are up to date
(my XO is 7 commits behind master; I read the log of the missing commits, and they are not related to my issue).
In my scenario, I had a pool with 3 nodes.
I reinstalled node 3 after a disaster (I force-forgot node 3).
Now, I can't add the host back to the pool.
I also tried to update the host after installation (usually, I do this afterwards),
but it doesn't work anymore.
I will not add my node through XCP-ng Center, to allow you to investigate further.
Here is the detailed operation JSON:
```
{
"id": "0mcwehjzy",
"properties": {
"method": "pool.mergeInto",
"params": {
"sources": [
"64365465-fd4e-25b6-3db2-2cdcd9edba5e"
],
"target": "a92ca4ca-caac-83b9-fa00-4bb75cb48f6c",
"force": true
},
"name": "API call: pool.mergeInto",
"userId": "63a0dbaf-ba2d-4028-b80f-fe49f56686b2",
"type": "api.call"
},
"start": 1752092249470,
"status": "failure",
"updatedAt": 1752092249473,
"end": 1752092249472,
"result": {
"message": "app.getLicenses is not a function",
"name": "TypeError",
"stack": "TypeError: app.getLicenses is not a function\n at enforceHostsHaveLicense (file:///etc/xen-orchestra/packages/xo-server/src/xo-mixins/pool.mjs:15:30)\n at Pools.apply (file:///etc/xen-orchestra/packages/xo-server/src/xo-mixins/pool.mjs:80:13)\n at Pools.mergeInto (/etc/xen-orchestra/node_modules/golike-defer/src/index.js:85:19)\n at Xo.mergeInto (file:///etc/xen-orchestra/packages/xo-server/src/api/pool.mjs:311:15)\n at Task.runInside (/etc/xen-orchestra/@vates/task/index.js:175:22)\n at Task.run (/etc/xen-orchestra/@vates/task/index.js:159:20)\n at Api.#callApiMethod (file:///etc/xen-orchestra/packages/xo-server/src/xo-mixins/api.mjs:469:18)"
}
}
```
@olivierlambert No,
for once, I followed the installation steps carefully ^^'
Hello,
I have an XCP-ng 8.3 pool running 3 hosts, with XOSTOR in 3 replicas and HA enabled.
This setup should allow losing up to 2 nodes without data loss.
Initial information:
I was able to migrate VDIs to XOSTOR successfully (even if, when I start a transfer to XOSTOR, I need to wait ~1 minute before the transfer really starts; I see that in XO).
For my first test, I shut down node 3 (which is neither the master nor the LINSTOR controller).
I didn't want to kill the LINSTOR controller host / pool master immediately; that should be my second / third test.
I stopped node 3 (poweroff from IPMI).
However, the entire pool was dead.
In the xensource.log of all remaining nodes (node 1 and node 2), I can see:
```
Jul 5 15:32:20 node2 xapi: [debug||0 |Checking HA configuration D:9b97e277d80e|helpers] /usr/libexec/xapi/cluster-stack/xhad/ha_start_daemon exited with code 8 [stdout = ''; stderr = 'Sat Jul 5 15:32:20 CEST 2025 ha_start_daemon: the HA daemon stopped without forming a liveset (8)\x0A']
Jul 5 15:32:20 node2 xapi: [ warn||0 |Checking HA configuration D:9b97e277d80e|xapi_ha] /usr/libexec/xapi/cluster-stack/xhad/ha_start_daemon returned MTC_EXIT_CAN_NOT_ACCESS_STATEFILE (State-File is inaccessible)
Jul 5 15:32:20 gco-002-rbx-002 xapi: [ warn||0 |Checking HA configuration D:9b97e277d80e|xapi_ha] ha_start_daemon failed with MTC_EXIT_CAN_NOT_ACCESS_STATEFILE: will contact existing master and check if HA is still enabled
```
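For the record, a check I could run next time to see which statefile VDI HA points at (an assumption on my side, not something I actually ran during the incident):
```
# Hypothetical check: show the VDI(s) used as the HA statefile for the pool
xe pool-param-get uuid=<pool-uuid> param-name=ha-statefiles
```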
However, the storage layer was OK:
```
[15:33 node1 linstor-controller]# linstor node list
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ Node ┊ NodeType ┊ Addresses ┊ State ┊
╞════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ h1 ┊ COMBINED ┊ 192.168.1.1:3366 (PLAIN) ┊ Online ┊
┊ h2 ┊ COMBINED ┊ 192.168.1.2:3366 (PLAIN) ┊ Online ┊
┊ h3 ┊ COMBINED ┊ 192.168.1.3:3366 (PLAIN) ┊ OFFLINE (Auto-eviction: 2025-07-05 16:33:42) ┊
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
```
The volumes were also OK, according to linstor volume list:
```
[15:33 r1 linstor-controller]# linstor volume list
╭───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ Node ┊ Resource ┊ StoragePool ┊ VolNr ┊ MinorNr ┊ DeviceName ┊ Allocated ┊ InUse ┊ State ┊
╞═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ r1 ┊ xcp-persistent-database ┊ xcp-sr-linstor_group_thin_device ┊ 0 ┊ 1000 ┊ /dev/drbd1000 ┊ 52.74 MiB ┊ InUse ┊ UpToDate ┊
┊ r2 ┊ xcp-persistent-database ┊ xcp-sr-linstor_group_thin_device ┊ 0 ┊ 1000 ┊ /dev/drbd1000 ┊ 6.99 MiB ┊ Unused ┊ UpToDate ┊
┊ r3 ┊ xcp-persistent-database ┊ xcp-sr-linstor_group_thin_device ┊ 0 ┊ 1000 ┊ /dev/drbd1000 ┊ 6.99 MiB ┊ ┊ Unknown ┊
```
(I didn't paste the entire list of volumes; while writing this post, I feel a bit stupid for not having saved the entire output.)
I finally solved my issue by bringing node 3 back up, which promoted itself as master, but I need to perform this test again because the result is not the expected one.
Did I do something wrong?
Hello @nikade,
Yeah, I already thought of it like this, but I'm surprised by the result of the calculation.
I should have either 3 nodes that can die, or "1", but I should not have the same number, right?
In my case, I ran the command on a 3-host cluster and it returned 3 as the value, which disturbs me.
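For context, I assume the command in question is the XAPI HA planner helper (hypothetical on my side; correct me if it was another one):
```
# Assumed command: ask XAPI how many host failures HA can tolerate
xe pool-ha-compute-max-host-failures-to-tolerate
```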
Best regards,
Hello @fred974
Did you find your answer?
I don't clearly understand how a 4-node pool can tolerate 4 dead nodes.
Hello,
I plan to install my XOSTOR cluster on a pool of 7 nodes with 3 replicas, but not on all nodes at once, because the disks are in use.
Consider: 2 disks on each node.
I emptied nodes 6 & 7.
So, here is what I plan to do:
Run the install script on nodes 6 & 7 to add their disks, so:
```
node6# install.sh --disks /dev/sdb
node7# install.sh --disks /dev/sdb
```
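To check that the script created the expected volume group (a hedged check on my side, assuming the group name matches the one used in the SR creation below):
```
# Hypothetical verification: the LVM group backing XOSTOR should now exist
vgs linstor_group
```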
Then, configure the SR and the LINSTOR plugin manager as follows:
```
xe sr-create \
  type=linstor name-label=pool-01 \
  host-uuid=XXXX \
  device-config:group-name=linstor_group/thin_device \
  device-config:redundancy=3 shared=true device-config:provisioning=thin
```
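At that point, a quick check I would do (my assumption, not from the official guide) is to confirm the SR is visible and shared:
```
# Hypothetical check: list LINSTOR SRs known to the pool
xe sr-list type=linstor params=uuid,name-label,shared
```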
Normally, I should have a LINSTOR cluster running on 2 nodes (2 satellites and one controller randomly placed) with only 2 disks, and thus only 2/3 working replicas.
The cluster SHOULD be usable (am I right on this point?).
The next step would be to move the VMs off node 5 to evacuate it, and then add it to the cluster as follows:
```
node5# install.sh --disks /dev/sdb
node5# xe host-call-plugin \
  host-uuid=node5-uuid \
  plugin=linstor-manager \
  fn=addHost args:groupName=linstor_group/thin_device
```
That should deploy a satellite on node 5 and add the disk.
I should then have 3/3 working replicas and can start to deploy the other nodes progressively; see the check below.
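To verify the replica state at that point, I would run the usual LINSTOR views (a sketch, assuming the controller is reachable from that shell):
```
# All three satellites should be Online
linstor node list
# Each resource should show three UpToDate replicas
linstor resource list
```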
Am I right about the process?
As mentioned on the Discord, I will post my feedback and results from my setup once I have finalized it (maybe through a blog post somewhere).
Thanks for providing XOSTOR as open source; it's clearly the missing piece for this open-source virtualization stack (vs Proxmox).
Hello @TheNorthernLight,
XOSTOR is now available on 8.3, since it's the LTS release.