-
@ronan-a My appoiliges for replying late. The issue happened again and remembered this thread.
I triedcat /sys/kernel/debug/drbd/resources/xcp-volume-{UUID}/volumes/0/openers
and it is empty across all hosts for both the old broken VDI and the new one.
The hosts are:- eva (master)
- phoebe
- mike (linstor controller)
- ozly
I also have scheduled backups snapshots so not sure if this will affect the vdi removal.
Here is the log SMlog.zip.txt The file is not a.txt
it is just a.zip
(the forum doesnt allow.zip
).
The file is filled withbad volume
and idk what to do to fix it. -
Just got this working in my 3 host home setup..... But I'm looking to drop down to two hosts. Is it going to be usable with 2 hosts (I've seen the recommendation of 3+ at the top) and if so, what happens when you get down to 1 host whatever reason??? Are read / writes locked on the remaining host?
-
Hi @jmccoy555
No, it's not meat to run on 2 hosts. A good advice is already to use replication 2 on 3 hosts, this way, even with one host down, it will continue to run. LINSTOR/DRBD is locking everything as soon you have a number of hosts inferior to the target replication.
-
@olivierlambert Thanks. I tried to find a definite answer in the Linstor docs but couldn't.
I had set replication = 2 and could see the diskless copies in the volume list.
I then shutdown one host, and then saw the auto eviction after an hour and overnight it has adjusted so each of the two remaining hosts has a complete replica, i.e. there are no more diskless copies. So it looks like a nice easy to deploy solution (if you need the CPU power of 3 hosts rather than just running them for storage, ££££ these days).
I had assumed too that if I shutdown / lost another host then everything would come to a halt, but there does appear to be some info 'out there' about a 2 node set up and avoiding split brain etc. so I was hopeful it may be possible!
So really the only 2 node option is XOSAN which is kind of not an option by the sounds of it after Gluster going EOL and for playing at home needing to pay or manual install which I don't think there's a guide for (not moaning, you guys give a lot). I guess my aim is to have a 2 host set up, which I can reduce to 1 when I don't need to capacity or to allow the installation of updates without getting shouted at..... 'whys the internet not working' . At present I'm running Ceph hyperconverged in VMs (for VMs and Kubernetes storage........ I know......) across 3 hosts and in reality often run with 2. Yeah I know that if one goes down everything stops, but if I plan to, I just start the third, let it sync and sort itself out and everything is good to do whatever I want. Likewise if something does go wrong, starting the 3rd hosts often gets things moving again whilst I work it out. I really think that I should have lost some data by now by doing something silly, but so far (a few years now) it just sorts it all out. I also rsync everything to TrueNAS every hour and regularly backup the VMs just in case.
I guess I just need to accept that going to two hosts probably means that when moving VMs around their storage needs to go with them, and that TrueNAS (a VM again...... I know, but must be 15+years with no issues) needs to provide the storage for Kubernetes
-
Hey @ronan-a , Now all the VDIs on the device are broken, I tried to migrate them but i get errors such as.
SR_BACKEND_FAILURE_1200(, Cannot update volume uuid 36a23780-2025-4f3f-bade-03c410e63368 to 45537c14-0125-4f6c-a1ad-476552888087: this last one is not empty, )
SR_BACKEND_FAILURE_78(, VDI Creation failed [opterr=error Error: Could not set kv(/volume/9cdc83cc-0fd8-490e-a3af-2ca40c95f398/not-exists:2): ERRO:Exception thrown.], )
SR_BACKEND_FAILURE_46(, The VDI is not available [opterr=Plugin linstor-manager failed], )
I dont care about the broken VDIs content so no worries.
It was fun experimenting with it, but I need more storage and will move the SSDs to my NAS and run my VMs on NFS there instead.
Before I do so I thought you might be interested in debugging the issues and getting my logs if that will help the project. Just let me know what files I need to send and will be happy to do so. -
Hey @AudleyElwine Ronan will take a look next week. It might be a bug we already fixed for our next beta round. He'll tell you
-
Hi @ronan-a ,
So like we said at some point, we're using a K8s cluster that is connecting to the linstor directly. It's actually going surprisingly well, and we've even deployed that in production with contingency plans in case of failure, but it's been rock solid for now.
We're working on setting up Velero to automatically backup all of our K8s cluster metadata along with the PVs for easy Disaster Recovery, but we've hit a unfortunate blocker. Here is what we're getting from Velero when attempting to do the backup/snapshot:
error: message: 'Failed to check and update snapshot content: failed to take snapshot of the volume pvc-3602bca1-5b92-4fc7-96af-ce77f35e802c: "rpc error: code = Internal desc = failed to create snapshot: error creating S3 backup: Message: ''LVM_THIN based backup shipping requires at least version 2.24 for setsid from util_linux'' next error: Message: ''LVM_THIN based backup shipping requires support for thin_send_recv'' next error: Message: ''Backup shipping of resource ''pvc-3602bca1-5b92-4fc7-96af-ce77f35e802c'' cannot be started since there is no node available that supports backup shipping.''"'
It looks like when using thin volumes, we can't actually run a backup. We've checked and the current version of setsid is 2.23.2 on xcp-ng :
[12:57 ovbh-pprod-xen12 ~]# setsid --v setsid from util-linux 2.23.2
We know that updating a package directly is a pretty bad idea, so I'm wondering if you have an idea on what we could do to solve this, or if this will be updated with other XCP-ng updates?
Thanks in advance for you time!
P.S: We're working on a full post on how we went about deploying our full K8s linstor CSI setup for other people if anyone is interested in that.
-
@AudleyElwine I think I fixed this issue recently, it's generally caused by a bad snapshot. After that there is a problem to rename it. I will update the packages, thank you for the report.
-
@Maelstrom96 I'm not sure to have a solution, we use VHDs on the top of the LVM/DRBD layer, so only vhd-util + linstor commands are required on our side to backup/snap data.
-
UPDATE AND IMPORTANT INFO
I am updating the LINSTOR packages on our repositories.
This update fixes many issues, especially regarding the HA.However, this update is not compatible with the LINSTOR SRs already configured, so it is necessary to DELETE the existing SRs before installing this update.
We exceptionally allow ourselves to force a reinstallation during this beta, as long as we haven't officially released a production version.
In theory, this should not happen again.To resume:
1 - Uninstall any existing LINSTOR SR.
2 - Install the new sm package: "sm-2.30.7-1.3.0.linstor.3.xcpng8.2.x86_64" on all used hosts.
3 - Reinstall the LINSTOR SR.Thank you !
-
@ronan-a I've checked the commit history and saw that the breaking change seems to be related to the renaming of the KV store. Also just noticed that you renamed the volume namespace. Is there any other breaking changes that would require the deletion of the SR in order to update the sm package?
I've made a python code that makes a copy of all the old KV data to the new KV name, along with renaming the key names for the volume data and was wondering if that would be sufficient.
Thanks,
-
@Maelstrom96 There are two important changes yes: the renaming of the KV store and the XCP volume namespace. In theory you can copy the data of your old KV store to the new one, it should be enough.
However I prefer to be sure that all those who test the driver use a stable version with a new SR to avoid surprises that I would have forgotten. In practice, if you haven't had any problems, a migration script may suffice.
Also, you must be sure there is no running tasks like snapshots, coalesce, etc. Otherwise you can have trouble during the update.
-
@ronan-a Perfect, thanks a lot for your input
-
Is there a way to remove Linstor / XOSTOR entirely? I've been experimenting a little bit with the latest update and it looks like it takes a really really long time to run VMs on the shared SR created by XOSTOR and mass VM creation (tested with 6 VMs using terraform) also fails with "code":"TOO_MANY_STORAGE_MIGRATES","params":["3"]
-
Or reprovision it? I removed the SR from the XOA but the actual disk still seems to have the filesystem intact. Also I was running the XOSTOR as the HA SR. Maybe that caused it to become really slow and also I had the affinity set to disabled
-
I'm also seeing this in the xcp-rrdd-plugins.log:
Nov 12 03:49:42 xcp-ng-node-1 xcp-rrdd-gpumon: [error||0 ||xcp-rrdd-gpumon] Unexpected error (Failure "not enough memory"), sleeping for 10 seconds... Nov 12 03:49:57 xcp-ng-node-1 xcp-rrdd-gpumon: [error||0 ||xcp-rrdd-gpumon] Unexpected error (Failure "not enough memory"), sleeping for 10 seconds... Nov 12 03:50:12 xcp-ng-node-1 xcp-rrdd-gpumon: [error||0 ||xcp-rrdd-gpumon] Unexpected error (Failure "not enough memory"), sleeping for 10 seconds... Nov 12 03:50:27 xcp-ng-node-1 xcp-rrdd-gpumon: [error||0 ||xcp-rrdd-gpumon] Unexpected error (Failure "not enough memory"), sleeping for 10 seconds... Nov 12 03:50:37 xcp-ng-node-1 xcp-rrdd-gpumon: [error||0 ||xcp-rrdd-gpumon] Unexpected error (Failure "not enough memory"), sleeping for 10 seconds... Nov 12 03:50:52 xcp-ng-node-1 xcp-rrdd-gpumon: [error||0 ||xcp-rrdd-gpumon] Unexpected error (Failure "not enough memory"), sleeping for 10 seconds... Nov 12 03:51:07 xcp-ng-node-1 xcp-rrdd-gpumon: [error||0 ||xcp-rrdd-gpumon] Unexpected error (Failure "not enough memory"), sleeping for 10 seconds... Nov 12 03:51:22 xcp-ng-node-1 xcp-rrdd-gpumon: [error||0 ||xcp-rrdd-gpumon] Unexpected error (Failure "not enough memory"), sleeping for 10 seconds... Nov 12 03:51:33 xcp-ng-node-1 xcp-rrdd-gpumon: [error||0 ||xcp-rrdd-gpumon] Unexpected error (Failure "not enough memory"), sleeping for 10 seconds... Nov 12 03:51:48 xcp-ng-node-1 xcp-rrdd-gpumon: [error||0 ||xcp-rrdd-gpumon] Unexpected error (Failure "not enough memory"), sleeping for 10 seconds... Nov 12 03:52:03 xcp-ng-node-1 xcp-rrdd-gpumon: [error||0 ||xcp-rrdd-gpumon] Unexpected error (Failure "not enough memory"), sleeping for 10 seconds... Nov 12 03:52:18 xcp-ng-node-1 xcp-rrdd-gpumon: [error||0 ||xcp-rrdd-gpumon] Unexpected error (Failure "not enough memory"), sleeping for 10 seconds... Nov 12 03:52:28 xcp-ng-node-1 xcp-rrdd-gpumon: [error||0 ||xcp-rrdd-gpumon] Unexpected error (Failure "not enough memory"), sleeping for 10 seconds... Nov 12 03:52:43 xcp-ng-node-1 xcp-rrdd-gpumon: [error||0 ||xcp-rrdd-gpumon] Unexpected error (Failure "not enough memory"), sleeping for 10 seconds... Nov 12 03:52:58 xcp-ng-node-1 xcp-rrdd-gpumon: [error||0 ||xcp-rrdd-gpumon] Unexpected error (Failure "not enough memory"), sleeping for 10 seconds... Nov 12 03:53:13 xcp-ng-node-1 xcp-rrdd-gpumon: [error||0 ||xcp-rrdd-gpumon] Unexpected error (Failure "not enough memory"), sleeping for 10 seconds... Nov 12 03:53:23 xcp-ng-node-1 xcp-rrdd-gpumon: [error||0 ||xcp-rrdd-gpumon] Unexpected error (Failure "not enough memory"), sleeping for 10 seconds... Nov 12 03:53:38 xcp-ng-node-1 xcp-rrdd-gpumon: [error||0 ||xcp-rrdd-gpumon] Unexpected error (Failure "not enough memory"), sleeping for 10 seconds... Nov 12 03:53:53 xcp-ng-node-1 xcp-rrdd-gpumon: [error||0 ||xcp-rrdd-gpumon] Unexpected error (Failure "not enough memory"), sleeping for 10 seconds... Nov 12 03:54:08 xcp-ng-node-1 xcp-rrdd-gpumon: [error||0 ||xcp-rrdd-gpumon] Unexpected error (Failure "not enough memory"), sleeping for 10 seconds... Nov 12 03:54:18 xcp-ng-node-1 xcp-rrdd-gpumon: [error||0 ||xcp-rrdd-gpumon] Unexpected error (Failure "not enough memory"), sleeping for 10 seconds... Nov 12 03:54:33 xcp-ng-node-1 xcp-rrdd-gpumon: [error||0 ||xcp-rrdd-gpumon] Unexpected error (Failure "not enough memory"), sleeping for 10 seconds... Nov 12 03:54:48 xcp-ng-node-1 xcp-rrdd-gpumon: [error||0 ||xcp-rrdd-gpumon] Unexpected error (Failure "not enough memory"), sleeping for 10 seconds... Nov 12 03:55:03 xcp-ng-node-1 xcp-rrdd-gpumon: [error||0 ||xcp-rrdd-gpumon] Unexpected error (Failure "not enough memory"), sleeping for 10 seconds... Nov 12 03:55:13 xcp-ng-node-1 xcp-rrdd-gpumon: [error||0 ||xcp-rrdd-gpumon] Unexpected error (Failure "not enough memory"), sleeping for 10 seconds... Nov 12 03:55:28 xcp-ng-node-1 xcp-rrdd-gpumon: [error||0 ||xcp-rrdd-gpumon] Unexpected error (Failure "not enough memory"), sleeping for 10 seconds... Nov 12 03:55:43 xcp-ng-node-1 xcp-rrdd-gpumon: [error||0 ||xcp-rrdd-gpumon] Unexpected error (Failure "not enough memory"), sleeping for 10 seconds... Nov 12 03:55:58 xcp-ng-node-1 xcp-rrdd-gpumon: [error||0 ||xcp-rrdd-gpumon] Unexpected error (Failure "not enough memory"), sleeping for 10 seconds... Nov 12 03:56:09 xcp-ng-node-1 xcp-rrdd-gpumon: [error||0 ||xcp-rrdd-gpumon] Unexpected error (Failure "not enough memory"), sleeping for 10 seconds... Nov 12 03:56:24 xcp-ng-node-1 xcp-rrdd-gpumon: [error||0 ||xcp-rrdd-gpumon] Unexpected error (Failure "not enough memory"), sleeping for 10 seconds... Nov 12 03:56:39 xcp-ng-node-1 xcp-rrdd-gpumon: [error||0 ||xcp-rrdd-gpumon] Unexpected error (Failure "not enough memory"), sleeping for 10 seconds... Nov 12 03:56:54 xcp-ng-node-1 xcp-rrdd-gpumon: [error||0 ||xcp-rrdd-gpumon] Unexpected error (Failure "not enough memory"), sleeping for 10 seconds... Nov 12 03:57:04 xcp-ng-node-1 xcp-rrdd-gpumon: [error||0 ||xcp-rrdd-gpumon] Unexpected error (Failure "not enough memory"), sleeping for 10 seconds... Nov 12 03:57:19 xcp-ng-node-1 xcp-rrdd-gpumon: [error||0 ||xcp-rrdd-gpumon] Unexpected error (Failure "not enough memory"), sleeping for 10 seconds... Nov 12 03:57:34 xcp-ng-node-1 xcp-rrdd-gpumon: [error||0 ||xcp-rrdd-gpumon] Unexpected error (Failure "not enough memory"), sleeping for 10 seconds... Nov 12 03:57:49 xcp-ng-node-1 xcp-rrdd-gpumon: [error||0 ||xcp-rrdd-gpumon] Unexpected error (Failure "not enough memory"), sleeping for 10 seconds... Nov 12 03:57:59 xcp-ng-node-1 xcp-rrdd-gpumon: [error||0 ||xcp-rrdd-gpumon] Unexpected error (Failure "not enough memory"), sleeping for 10 seconds... Nov 12 03:58:14 xcp-ng-node-1 xcp-rrdd-gpumon: [error||0 ||xcp-rrdd-gpumon] Unexpected error (Failure "not enough memory"), sleeping for 10 seconds... Nov 12 03:58:29 xcp-ng-node-1 xcp-rrdd-gpumon: [error||0 ||xcp-rrdd-gpumon] Unexpected error (Failure "not enough memory"), sleeping for 10 seconds... Nov 12 03:58:44 xcp-ng-node-1 xcp-rrdd-gpumon: [error||0 ||xcp-rrdd-gpumon] Unexpected error (Failure "not enough memory"), sleeping for 10 seconds... Nov 12 03:58:54 xcp-ng-node-1 xcp-rrdd-gpumon: [error||0 ||xcp-rrdd-gpumon] Unexpected error (Failure "not enough memory"), sleeping for 10 seconds... Nov 12 03:59:09 xcp-ng-node-1 xcp-rrdd-gpumon: [error||0 ||xcp-rrdd-gpumon] Unexpected error (Failure "not enough memory"), sleeping for 10 seconds... Nov 12 03:59:24 xcp-ng-node-1 xcp-rrdd-gpumon: [error||0 ||xcp-rrdd-gpumon] Unexpected error (Failure "not enough memory"), sleeping for 10 seconds... Nov 12 03:59:39 xcp-ng-node-1 xcp-rrdd-gpumon: [error||0 ||xcp-rrdd-gpumon] Unexpected error (Failure "not enough memory"), sleeping for 10 seconds... Nov 12 03:59:49 xcp-ng-node-1 xcp-rrdd-gpumon: [error||0 ||xcp-rrdd-gpumon] Unexpected error (Failure "not enough memory"), sleeping for 10 seconds... Nov 12 04:00:04 xcp-ng-node-1 xcp-rrdd-gpumon: [error||0 ||xcp-rrdd-gpumon] Unexpected error (Failure "not enough memory"), sleeping for 10 seconds... Nov 12 04:00:20 xcp-ng-node-1 xcp-rrdd-gpumon: [error||0 ||xcp-rrdd-gpumon] Unexpected error (Failure "not enough memory"), sleeping for 10 seconds... Nov 12 04:00:35 xcp-ng-node-1 xcp-rrdd-gpumon: [error||0 ||xcp-rrdd-gpumon] Unexpected error (Failure "not enough memory"), sleeping for 10 seconds... Nov 12 04:00:45 xcp-ng-node-1 xcp-rrdd-gpumon: [error||0 ||xcp-rrdd-gpumon] Unexpected error (Failure "not enough memory"), sleeping for 10 seconds... Nov 12 04:01:00 xcp-ng-node-1 xcp-rrdd-gpumon: [error||0 ||xcp-rrdd-gpumon] Unexpected error (Failure "not enough memory"), sleeping for 10 seconds... Nov 12 04:01:15 xcp-ng-node-1 xcp-rrdd-gpumon: [error||0 ||xcp-rrdd-gpumon] Unexpected error (Failure "not enough memory"), sleeping for 10 seconds... Nov 12 04:01:30 xcp-ng-node-1 xcp-rrdd-gpumon: [error||0 ||xcp-rrdd-gpumon] Unexpected error (Failure "not enough memory"), sleeping for 10 seconds... Nov 12 04:01:40 xcp-ng-node-1 xcp-rrdd-gpumon: [error||0 ||xcp-rrdd-gpumon] Unexpected error (Failure "not enough memory"), sleeping for 10 seconds... Nov 12 04:01:55 xcp-ng-node-1 xcp-rrdd-gpumon: [error||0 ||xcp-rrdd-gpumon] Unexpected error (Failure "not enough memory"), sleeping for 10 seconds...
Not sure if its related
-
@TheiLLeniumStudios said in XOSTOR hyperconvergence preview:
Is there a way to remove Linstor / XOSTOR entirely?
If you destroyed the SR, you can remove the linstor packages and force a reinstallation of a stable sm package using
yum
.I've been experimenting a little bit with the latest update and it looks like it takes a really really long time to run VMs on the shared SR created by XOSTOR and mass VM creation (tested with 6 VMs using terraform)
What do you mean? What's the performance problem? When the VM starts ? During execution?
also fails with "code":"TOO_MANY_STORAGE_MIGRATES","params":["3"]
Do you migrate with XO? Because this exception is normally handled by it, and a new migration is restarted a few moment later in this case.
Or reprovision it? I removed the SR from the XOA but the actual disk still seems to have the filesystem intact.
Did you just execute a
xe sr-forget
command on the SR? In this case the volumes are not removed.xe sr-destroy
must be used to remove the volumes. So you can executexe sr-introduce
and thenxe sr-destroy
to clean your hosts.Also I was running the XOSTOR as the HA SR. Maybe that caused it to become really slow and also I had the affinity set to disabled
I'm not sure, (but the HA was buggy before the last update).
I'm also seeing this in the xcp-rrdd-plugins.log:
No relation with LINSTOR here.
-
Hey @ronan-a
I'm trying to remove the SR via XOA with "Remove this SR" button but im getting this error.SR_NOT_EMPTY()
And there is no VM connected to it. So i tried to delete the disks manually in it but I get this error
SR_HAS_NO_PBDS(OpaqueRef:6d7520e0-60fa-4b93-9dfe-aa7ceb3b17d2)
Could you help me how to remove it? I dont care about its content so any way is okay for me since I want to reinstall it after i install the updates.
-
@ronan-a said in XOSTOR hyperconvergence preview:
Did you just execute a xe sr-forget command on the SR? In this case the volumes are not removed. xe sr-destroy must be used to remove the volumes. So you can execute xe sr-introduce and then xe sr-destroy to clean your hosts.
I did it using XO interface. Now it doesn't show up and when I tried your suggestion of running xe sr-introduce, it just creates an "Unknown" SR and doesn't link it to the previous one. Running xe sr-create also doesn't help since that errors out with
LINSTOR SR creation error [opterr=LINSTOR SR must be unique in a pool]
Can you elaborate the steps for reintroducing a lost SR that is backed by linstor? I ran this command:
xe sr-introduce type=linstor name-label=XOSTOR uuid=41ba4c11-8c13-30b3-fcbb-7668a39825a6
The UUID is the original UUID of the XOSTOR SR which I found in my command history. And running xe sr-destroy after the above introduce leads to
The SR has no attached PBDs
. My disks look like this at the moment:NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT drbd1016 147:1016 0 10G 0 disk drbd1014 147:1014 0 10G 0 disk sdb 8:16 0 238.5G 0 disk |-linstor_group-thin_device_tmeta 253:1 0 120M 0 lvm | `-linstor_group-thin_device-tpool 253:3 0 238.2G 0 lvm | |-linstor_group-xcp--persistent--redo--log_00000 253:10 0 260M 0 lvm | | `-drbd1002 147:1002 0 259.7M 0 disk | |-linstor_group-xcp--persistent--database_00000 253:8 0 1G 0 lvm | | `-drbd1000 147:1000 0 1G 0 disk /var/lib/linstor | |-linstor_group-thin_device 253:4 0 238.2G 0 lvm | |-linstor_group-xcp--volume--13a94a7a--d433--4426--8232--812e3c6dc52e_00000 253:11 0 10G 0 lvm | | `-drbd1004 147:1004 0 10G 0 disk | |-linstor_group-xcp--persistent--ha--statefile_00000 253:9 0 8M 0 lvm | | `-drbd1001 147:1001 0 8M 0 disk | |-linstor_group-xcp--volume--70bf80a2--a008--469a--a7db--0ea92fcfc392_00000 253:5 0 20M 0 lvm | | `-drbd1009 147:1009 0 20M 0 disk | `-linstor_group-xcp--volume--4b70d69b--9cca--4aa3--842f--09366ac76901_00000 253:12 0 10G 0 lvm | `-drbd1006 147:1006 0 10G 0 disk `-linstor_group-thin_device_tdata 253:2 0 238.2G 0 lvm `-linstor_group-thin_device-tpool 253:3 0 238.2G 0 lvm |-linstor_group-xcp--persistent--redo--log_00000 253:10 0 260M 0 lvm | `-drbd1002 147:1002 0 259.7M 0 disk |-linstor_group-xcp--persistent--database_00000 253:8 0 1G 0 lvm | `-drbd1000 147:1000 0 1G 0 disk /var/lib/linstor |-linstor_group-thin_device 253:4 0 238.2G 0 lvm |-linstor_group-xcp--volume--13a94a7a--d433--4426--8232--812e3c6dc52e_00000 253:11 0 10G 0 lvm | `-drbd1004 147:1004 0 10G 0 disk |-linstor_group-xcp--persistent--ha--statefile_00000 253:9 0 8M 0 lvm | `-drbd1001 147:1001 0 8M 0 disk |-linstor_group-xcp--volume--70bf80a2--a008--469a--a7db--0ea92fcfc392_00000 253:5 0 20M 0 lvm | `-drbd1009 147:1009 0 20M 0 disk `-linstor_group-xcp--volume--4b70d69b--9cca--4aa3--842f--09366ac76901_00000 253:12 0 10G 0 lvm `-drbd1006 147:1006 0 10G 0 disk drbd1012 147:1012 0 10G 0 disk tda 254:0 0 10G 0 disk drbd1015 147:1015 0 10G 0 disk drbd1005 147:1005 0 20M 0 disk sda 8:0 0 223.6G 0 disk |-sda4 8:4 0 512M 0 part /boot/efi |-sda2 8:2 0 18G 0 part |-sda5 8:5 0 4G 0 part /var/log |-sda3 8:3 0 182.1G 0 part | `-XSLocalEXT--712c1f83--d11f--ae07--d2b8--14a823761e6e-712c1f83--d11f--ae07--d2b8--14a823761e6e 253:0 0 182.1G 0 lvm /run/sr-mount/712c1f83-d11f-ae07-d2b8-14a823761e6e |-sda1 8:1 0 18G 0 part / `-sda6 8:6 0 1G 0 part [SWAP] tdb 254:1 0 50G 0 disk
@ronan-a said in XOSTOR hyperconvergence preview:
What do you mean? What's the performance problem? When the VM starts ? During execution?
So, with XOSTOR created and making the Pool HA with that SR, whenever I created a new VM in that SR and not choose any affinity host, it takes atleast 10-15 minutes to run a migrate task which shouldn't be necessary because XOSTOR is shared right? Or is my assumption not correct? And once the job is done, it doesn't even start all the VMs and some even disappeared from XO. I was creating the VMs using terraform and it spun up 6 of them at a time and since the migrate task started for all of them, I saw the error TOO_MANY_STORAGE_MIGRATES. Not really sure what's going on.
I first thought it was my template VDI that wasn't on XOSTOR but I reuploaded the cloud image to XOSTOR but still got the same behavior. And I'm not even using spinning disks, 2 of hosts have NVMe drives and 1 of them involved in XOSTOR has an mSATA one
-
@TheiLLeniumStudios Better explanations, to reintroduce a LINSTOR SR, you can use these commands with you own parameters:
Generate a new SR UUID.
[10:18 r620-s1 ~]# uuidgen 345adcd2-aa2b-44ad-9c25-788cf870db72 [10:18 r620-s1 ~]# xe sr-introduce uuid=345adcd2-aa2b-44ad-9c25-788cf870db72 type=linstor name-label="XOSTOR" content-type=user 345adcd2-aa2b-44ad-9c25-788cf870db72 # Get host UUIDs. [10:18 r620-s1 ~]# xe host-list uuid ( RO) : 888254e8-da05-4f86-ad37-979b8d6bad04 name-label ( RW): R620-S2 name-description ( RW): Default install uuid ( RO) : c96ec4dd-28ac-4df4-b73c-4371bd202728 name-label ( RW): R620-S1 name-description ( RW): Default install uuid ( RO) : ddcd3461-7052-4f5e-932c-e1ed75c192d6 name-label ( RW): R620-S3 name-description ( RW): Default install
Create the PBDs using the same old config.
[10:19 r620-s1 ~]# xe pbd-create sr-uuid=345adcd2-aa2b-44ad-9c25-788cf870db72 host-uuid=c96ec4dd-28ac-4df4-b73c-4371bd202728 device-config:hosts=r620-s1,r620-s2,r620-s3 device-config:group-name=linstor_group/thin_device device-config:redundancy=2 device-config:provisioning=thin 1c5c030a-1823-d53a-d8df-6c50af6beb2b [10:19 r620-s1 ~]# xe pbd-create sr-uuid=345adcd2-aa2b-44ad-9c25-788cf870db72 host-uuid=888254e8-da05-4f86-ad37-979b8d6bad04 device-config:hosts=r620-s1,r620-s2,r620-s3 device-config:group-name=linstor_group/thin_device device-config:redundancy=2 device-config:provisioning=thin 4c5df60a-f96d-19c2-44b0-f5951388d502 [10:20 r620-s1 ~]# xe pbd-create sr-uuid=345adcd2-aa2b-44ad-9c25-788cf870db72 host-uuid=ddcd3461-7052-4f5e-932c-e1ed75c192d6 device-config:hosts=r620-s1,r620-s2,r620-s3 device-config:group-name=linstor_group/thin_device device-config:redundancy=2 device-config:provisioning=thin 584d033c-7bad-ebc8-30dd-1888ea2bea29
If you don't know what's your group name, you can use vgs/lvs, in my case I use thin provisioning, so I have an associated volume:
[10:21 r620-s1 ~]# vgs VG #PV #LV #SN Attr VSize VFree linstor_group 1 5 0 wz--n- 931.51g 0 [10:21 r620-s1 ~]# lvs LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert thin_device linstor_group twi-aotz-- <931.28g 0.29 10.55 ...
I did it using XO interface. Now it doesn't show up and when I tried your suggestion of running xe sr-introduce, it just creates an "Unknown" SR and doesn't link it to the previous one. Running xe sr-create also doesn't help since that errors out with LINSTOR SR creation error [opterr=LINSTOR SR must be unique in a pool]
If you have this error, the LINSTOR PBDs still exist. Are you sure you forgot the previous SR?
So, with XOSTOR created and making the Pool HA with that SR, whenever I created a new VM in that SR and not choose any affinity host, it takes atleast 10-15 minutes to run a migrate task which shouldn't be necessary because XOSTOR is shared right?
Regarding the migration, it's correct if the SR is created with shared=true, and if you migrate the VM between two hosts with the same SR used, the migration should be short.
After repairing your SR, you can do a VM migration and send me the logs of the machines if you want, I can take a look.