Hello @ronan-a
I will reproduce the case: I will re-destroy one hypervisor and retrigger it.
Thank you @ronan-a and @olivierlambert.
If you need me to test some special cases, don't hesitate; we have a pool dedicated to this.
CTO & Associate @Gladhost
Hello, @ronan-a
I will reinstall my hypervisor this week.
I will reproduce it and then resend you the logs.
Have a good day,
Hello, @DustinB
The https://vates.tech/xostor/ page says:
The maximum size of any single Virtual Disk Image (VDI) will always be limited by the smallest disk in your cluster.
But in this case, maybe it can be stored on the "2TB disks"? Maybe others can answer; I didn't test it.
This test covers the following scenario:
Impact:
Expected results:
We didn't test any filesystem other than XFS for Linux-based operating systems, because we only use XFS.
[hdevigne@VM1 ~]$ htop^C
[hdevigne@VM1 ~]$ echo "coucou" > test
-bash: test: Input/output error
[hdevigne@VM1 ~]$ dmesg
-bash: /usr/bin/dmesg: Input/output error
[hdevigne@VM1 ~]$ d^C
[hdevigne@VM1 ~]$ sudo -i
-bash: sudo: command not found
[hdevigne@VM1 ~]$ dm^C
[hdevigne@VM1 ~]$ sudo -i
-bash: sudo: command not found
[hdevigne@VM1 ~]$ dmesg
-bash: /usr/bin/dmesg: Input/output error
[hdevigne@VM1 ~]$ mount
-bash: mount: command not found
[hdevigne@VM1 ~]$ sud o-i
-bash: sud: command not found
[hdevigne@VM1 ~]$ sudo -i
As we predicted, the VM is completely broken.
The Windows VM crashes and reboots in a loop.
The LINSTOR controller was on node 1, so we cannot see the LINSTOR node statuses, but we suppose they are "disconnected" and "pending eviction". That doesn't matter much: the disks are read-only and the VMs break after any write, which was the behavior we expected.
Re-plug node 1 and node 2.
Windows boots normally.
The Linux VM stays in a "broken state":
➜ ~ ssh VM1
suConnection closed by UNKNOWN port 65535
We didn't test waiting long enough to reach the eviction state of the LINSTOR nodes, but the documentation shows that restoring a LINSTOR node would work (see https://docs.xcp-ng.org/xostor/#what-to-do-when-a-node-is-in-an-evicted-state).
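For reference, a minimal sketch of what that recovery should look like, based on the documentation linked above. The node name `node1` is a placeholder, and on XOSTOR the `linstor` client may need to be pointed at the current controller:
```
# List satellites and their state; an evicted node should show as EVICTED.
linstor node list

# Per the XCP-ng documentation above, a node that is healthy again
# can be taken out of the evicted state with:
linstor node restore node1
```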
We were not using HA in the cluster at this point; it could have helped a bit in the recovery process. In a previous experience that I didn't document like this one, HA was completely down because it was unable to mount a file; I will probably write another topic on the forum to make those results public.
Having HA would change the criticality of the following note.
Thanks to @olivierlambert, @ronan and the other people on the Discord channel for answering daily questions, which makes this kind of test possible. As promised, I'm putting my results online.
Thanks for XOSTOR.
Further tests to do: retry with HA.
@olivierlambert our OPNsense resets the TCP states, so the firewall blocks packets because it has forgotten about the TCP session.
A timeout then occurred in the middle of the export.
Hello @olivierlambert
I confirm my issue came from my firewall, so it is not related to XO.
However, it would be great to make the logs clearer. I mean:
Error: read ETIMEDOUT
would become
Error: read ETIMEDOUT while connecting to X.X.X.X:ABC
That would have made it much quicker to understand my "real and weird" issue.
Best regards,
Hello @DustinB.
Yes, you're right. I would do this to be able to have VDIs larger than 1 TB (which will not be possible anyway, because my smallest disk is 1 TB, so 879 GB usable)...
Hello,
We tried the compression feature.
You "can see" a benefit only if you have shared storage (and even then, migration between 2 nodes is already very fast, so we don't see a major difference; maybe a VM with a lot of RAM (>32 GB) would show one).
If you don't have shared storage (like XOSTOR, NFS or iSCSI), then you will not see any difference, because there is a limitation of 30-40 MB/s (see here: https://xcp-ng.org/forum/topic/9389/backup-migration-performance).
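For anyone wanting to reproduce, a sketch of how the toggle can be set from the CLI. I believe it is a pool-level parameter on XCP-ng 8.3, but the exact name is an assumption (check with `xe pool-param-list`), and `<pool-uuid>` is a placeholder:
```
# Assumed pool-level toggle for compressed VM migration (XCP-ng 8.3);
# verify the field name first with: xe pool-param-list uuid=<pool-uuid>
xe pool-param-set migration-compression=true uuid=<pool-uuid>
```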
Best regards,
Hello,
From my test, having multiple XOSTOR SRs is not possible at this time; it's blocked.
(I didn't save the precise error message, but the error was clear: you cannot have more than one XOSTOR in the pool.)
Hello, @burbilog
I think these backups don't come from your backup plan.
Are there maybe snapshots on the VM? 2022 is "very old".
Your full backups include memory, which is not "classical" in incremental backups.
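A quick way to check for leftover snapshots from the host CLI; `<vm-uuid>` is a placeholder for your VM's UUID:
```
# List the snapshots attached to the VM with their creation date,
# to spot old ones (e.g. from 2022).
xe snapshot-list snapshot-of=<vm-uuid> params=uuid,name-label,snapshot-time
```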
Hello,
I tried on a new pool.
It's a slightly different scenario, since I don't create the XOSTOR for now; in my previous example, I tried to add a node as a replacement for an existing one.
I just ran the install script, only on node 1.
When I try to make node 2 join the pool, I reproduce the incompatible SM error I got previously.
The "weird" thing is that I don't have the license issue I got in Xen Orchestra (maybe it was not related after all?).
Here are the complete logs:

```
pool.mergeInto
{
"sources": [
"17510fe0-db23-9414-f3df-2941bd34f8dc"
],
"target": "cc91fcdc-c7a8-a44c-65b3-a76dced49252",
"force": true
}
{
"code": "POOL_JOINING_SM_FEATURES_INCOMPATIBLE",
"params": [
"OpaqueRef:090b8da1-9654-066c-84f9-7ab15cb101fd",
""
],
"call": {
"duration": 1061,
"method": "pool.join_force",
"params": [
"* session id *",
"<MASTER_IP>",
"root",
"* obfuscated *"
]
},
"message": "POOL_JOINING_SM_FEATURES_INCOMPATIBLE(OpaqueRef:090b8da1-9654-066c-84f9-7ab15cb101fd, )",
"name": "XapiError",
"stack": "XapiError: POOL_JOINING_SM_FEATURES_INCOMPATIBLE(OpaqueRef:090b8da1-9654-066c-84f9-7ab15cb101fd, )
at Function.wrap (file:///etc/xen-orchestra/packages/xen-api/_XapiError.mjs:16:12)
at file:///etc/xen-orchestra/packages/xen-api/transports/json-rpc.mjs:38:21
at runNextTicks (node:internal/process/task_queues:60:5)
at processImmediate (node:internal/timers:454:9)
at process.callbackTrampoline (node:internal/async_hooks:130:17)"
}
```
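POOL_JOINING_SM_FEATURES_INCOMPATIBLE suggests the joining host does not expose the same storage-manager (SM) drivers/features as the pool, which is consistent with the install script (and therefore the LINSTOR SM driver) having been run only on node 1. A minimal sketch of how to compare the two sides, assuming SSH access to both hosts (host names and file paths are placeholders):
```
# Dump the SM drivers and their features on the pool master and on the
# joining node, then diff the outputs to spot the mismatch (the linstor
# driver should only appear where the install script was run).
ssh root@node1 'xe sm-list params=name-label,type,features' > /tmp/sm-node1.txt
ssh root@node2 'xe sm-list params=name-label,type,features' > /tmp/sm-node2.txt
diff /tmp/sm-node1.txt /tmp/sm-node2.txt
```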
Hello,
We have nodes with multiple disk groups.
Has anyone tried multiple XOSTOR SRs on different local groups?
In our case:
It should work easily, because we would register 2 XOSTOR SRs, but maybe we should put all the disks in the same VG? But in that case, would the max VDI size be lower?
I propose to edit the install script to allow selecting the group name, to support this use case easily (see the sketch below).
Is that OK?
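To illustrate the idea, a minimal sketch assuming each disk group gets its own LVM volume group, which would then back its own XOSTOR SR. Device names and VG names are placeholders, and a `--group-name`-style option on the install script is only the proposal above, not an existing flag:
```
# First disk group (e.g. NVMe) -> one VG for the first XOSTOR
pvcreate /dev/nvme0n1 /dev/nvme1n1
vgcreate linstor_group_nvme /dev/nvme0n1 /dev/nvme1n1

# Second disk group (e.g. SATA) -> a separate VG for a second XOSTOR
pvcreate /dev/sdb /dev/sdc
vgcreate linstor_group_hdd /dev/sdb /dev/sdc
```
Whether merging everything into a single VG would lower the max VDI size is exactly the open question above.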
Hello @ronan-a
Just to be sure: do you want the logs of the node that wants to join, or of the master?
Have a good day