-
@ronan-a I just tried to reintroduce the SR and got no errors while running xe pbd-create, but it still shows up as an SR with a size of -1. I think I might have corrupted the metadata, as lvs, vgs and pvs all throw errors:
[11:09 xcp-ng-node-1 ~]# lvs
  /dev/drbd1014: open failed: No data available
  LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert
  712c1f83-d11f-ae07-d2b8-14a823761e6e XSLocalEXT-712c1f83-d11f-ae07-d2b8-14a823761e6e -wi-ao---- <182.06g
  thin_device linstor_group twi-aotz-- <238.24g 1.64 11.27
  xcp-persistent-database_00000 linstor_group Vwi-aotz-- 1.00g thin_device 0.84
  xcp-persistent-ha-statefile_00000 linstor_group Vwi-aotz-- 8.00m thin_device 6.25
  xcp-persistent-redo-log_00000 linstor_group Vwi-aotz-- 260.00m thin_device 0.53
  xcp-volume-13a94a7a-d433-4426-8232-812e3c6dc52e_00000 linstor_group Vwi-aotz-- 10.03g thin_device 0.14
  xcp-volume-4b70d69b-9cca-4aa3-842f-09366ac76901_00000 linstor_group Vwi-aotz-- 10.03g thin_device 38.67
  xcp-volume-70bf80a2-a008-469a-a7db-0ea92fcfc392_00000 linstor_group Vwi-aotz-- 20.00m thin_device 71.88
[11:09 xcp-ng-node-1 ~]# vgs
  /dev/drbd1014: open failed: No data available
  VG #PV #LV #SN Attr VSize VFree
  XSLocalEXT-712c1f83-d11f-ae07-d2b8-14a823761e6e 1 1 0 wz--n- <182.06g 0
  linstor_group 1 7 0 wz--n- 238.47g 0
[11:09 xcp-ng-node-1 ~]# pvs
  /dev/drbd1014: open failed: No data available
  PV VG Fmt Attr PSize PFree
  /dev/sda3 XSLocalEXT-712c1f83-d11f-ae07-d2b8-14a823761e6e lvm2 a-- <182.06g 0
  /dev/sdb linstor_group lvm2 a-- 238.47g 0
[11:09 xcp-ng-node-1 ~]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
drbd1016 147:1016 0 10G 0 disk
drbd1014 147:1014 0 10G 0 disk
sdb 8:16 0 238.5G 0 disk
|-linstor_group-thin_device_tmeta 253:1 0 120M 0 lvm
| `-linstor_group-thin_device-tpool 253:3 0 238.2G 0 lvm
| |-linstor_group-xcp--persistent--redo--log_00000 253:10 0 260M 0 lvm
| | `-drbd1002 147:1002 0 259.7M 0 disk
| |-linstor_group-xcp--persistent--database_00000 253:8 0 1G 0 lvm
| | `-drbd1000 147:1000 0 1G 0 disk /var/lib/linstor
| |-linstor_group-thin_device 253:4 0 238.2G 0 lvm
| |-linstor_group-xcp--volume--13a94a7a--d433--4426--8232--812e3c6dc52e_00000 253:11 0 10G 0 lvm
| | `-drbd1004 147:1004 0 10G 0 disk
| |-linstor_group-xcp--persistent--ha--statefile_00000 253:9 0 8M 0 lvm
| | `-drbd1001 147:1001 0 8M 0 disk
| |-linstor_group-xcp--volume--70bf80a2--a008--469a--a7db--0ea92fcfc392_00000 253:5 0 20M 0 lvm
| | `-drbd1009 147:1009 0 20M 0 disk
| `-linstor_group-xcp--volume--4b70d69b--9cca--4aa3--842f--09366ac76901_00000 253:12 0 10G 0 lvm
| `-drbd1006 147:1006 0 10G 0 disk
`-linstor_group-thin_device_tdata 253:2 0 238.2G 0 lvm
`-linstor_group-thin_device-tpool 253:3 0 238.2G 0 lvm
|-linstor_group-xcp--persistent--redo--log_00000 253:10 0 260M 0 lvm
| `-drbd1002 147:1002 0 259.7M 0 disk
|-linstor_group-xcp--persistent--database_00000 253:8 0 1G 0 lvm
| `-drbd1000 147:1000 0 1G 0 disk /var/lib/linstor
|-linstor_group-thin_device 253:4 0 238.2G 0 lvm
|-linstor_group-xcp--volume--13a94a7a--d433--4426--8232--812e3c6dc52e_00000 253:11 0 10G 0 lvm
| `-drbd1004 147:1004 0 10G 0 disk
|-linstor_group-xcp--persistent--ha--statefile_00000 253:9 0 8M 0 lvm
| `-drbd1001 147:1001 0 8M 0 disk
|-linstor_group-xcp--volume--70bf80a2--a008--469a--a7db--0ea92fcfc392_00000 253:5 0 20M 0 lvm
| `-drbd1009 147:1009 0 20M 0 disk
`-linstor_group-xcp--volume--4b70d69b--9cca--4aa3--842f--09366ac76901_00000 253:12 0 10G 0 lvm
`-drbd1006 147:1006 0 10G 0 disk
drbd1012 147:1012 0 10G 0 disk
tda 254:0 0 10G 0 disk
drbd1015 147:1015 0 10G 0 disk
drbd1005 147:1005 0 20M 0 disk
sda 8:0 0 223.6G 0 disk
|-sda4 8:4 0 512M 0 part /boot/efi
|-sda2 8:2 0 18G 0 part
|-sda5 8:5 0 4G 0 part /var/log
|-sda3 8:3 0 182.1G 0 part
| `-XSLocalEXT--712c1f83--d11f--ae07--d2b8--14a823761e6e-712c1f83--d11f--ae07--d2b8--14a823761e6e 253:0 0 182.1G 0 lvm /run/sr-mount/712c1f83-d11f-ae07-d2b8-14a82376
|-sda1 8:1 0 18G 0 part /
`-sda6 8:6 0 1G 0 part [SWAP]
tdb 254:1 0 50G 0 disk
[11:09 xcp-ng-node-1 ~]#
Is it possible to clean up the partition table and recreate it some other way without having to reinstall XCP-ng on the machines? wipefs -a says the device is in use, so I cannot wipe the partitions.
-
@AudleyElwine said in XOSTOR hyperconvergence preview:
Ho! Sounds like a bug fixed in the latest beta... In this case, ensure there is no VM running, and download this script:
wget https://gist.githubusercontent.com/Wescoeur/3b5c399b15c4d700b4906f12b51e2591/raw/452acd9ebcd52c62020e796302c681590b37cd3f/gistfile1.txt -O linstor-kv-tool && chmod +x linstor-kv-tool
Find where the linstor-controller is running by executing this command on each host:
[11:13 r620-s1 ~]# mountpoint /var/lib/linstor
/var/lib/linstor is a mountpoint
If it's a mountpoint, you found it. Now, you must execute the script using the local IP of this host, for example:
./linstor-kv-tool --dump-volumes -u 172.16.210.16 -g xcp-sr-linstor_group_thin_device
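(To find the controller on every pool member in one pass, a small loop like this could also work; host names are hypothetical and it assumes root SSH between the hosts:)
# Print which host has /var/lib/linstor mounted, i.e. which one runs the linstor-controller.
for h in xcp-ng-node-1 xcp-ng-node-2 xcp-ng-node-3; do
  echo -n "$h: "
  ssh root@"$h" mountpoint /var/lib/linstor
done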
The group to use is equal to <VG_name>_<LV_thin_name>, or just <VG_name> if you don't use thin provisioning.
Note: there was a bug in the previous beta, so you must double the xcp-sr- prefix (example: xcp-sr-xcp-sr-linstor_group_thin_device).
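For instance, with the VG and thin LV shown earlier in this thread (linstor_group and thin_device), the invocations would look like this (the controller IP is a placeholder):
# VG = linstor_group, thin LV = thin_device  =>  group = linstor_group_thin_device
./linstor-kv-tool --dump-volumes -u <CONTROLLER_IP> -g xcp-sr-linstor_group_thin_device
# On the previous beta with the doubled prefix:
./linstor-kv-tool --dump-volumes -u <CONTROLLER_IP> -g xcp-sr-xcp-sr-linstor_group_thin_device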
So if the script's output has many entries, you can run it again with --remove-all-volumes instead of --dump-volumes. This command should remove the properties in the LINSTOR KV-store. After that, you can dump again to verify.
Now you can execute a scan on the SR. After that, it's necessary to remove all resource definitions using the linstor binary. Get the list using:
linstor --controllers=<CONTROLLER_IP> resource-definition list
+-----------------------------------------------------------------------------------------------------+
| ResourceName                                    | Port | ResourceGroup                    | State |
|=====================================================================================================|
| xcp-persistent-database                         | 7000 | xcp-sr-linstor_group_thin_device | ok    |
| xcp-volume-0db304a1-89a2-45df-a39d-7c5c39a87c5f | 7006 | xcp-sr-linstor_group_thin_device | ok    |
| xcp-volume-6289f306-ab2b-4388-a5a2-a20ba18698f8 | 7005 | xcp-sr-linstor_group_thin_device | ok    |
| xcp-volume-73b9a396-c67f-48b3-8774-f60f1c2af598 | 7001 | xcp-sr-linstor_group_thin_device | ok    |
| xcp-volume-a46393ef-428d-4af8-9c0e-30b0108bd21a | 7003 | xcp-sr-linstor_group_thin_device | ok    |
| xcp-volume-b83db8cf-ea3b-47aa-ad77-89b5cd9a1853 | 7002 | xcp-sr-linstor_group_thin_device | ok    |
+-----------------------------------------------------------------------------------------------------+
Then execute linstor resource-definition delete <VOLUME> on each volume. But don't do that on the xcp-persistent-database, only on the xcp-volume-XXX entries!
Normally, after all these steps, you can destroy the SR properly! I think I will write an automated version for later, like linstor-emergency-destroy.
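For the record, here is a rough sketch of what such an automated cleanup could look like, built only from the commands above (this is not the final linstor-emergency-destroy tool; CONTROLLER_IP is a placeholder, and the -p flag just makes the list easier to parse):
#!/bin/bash
# Delete every xcp-volume-* resource definition, leaving xcp-persistent-database untouched.
CONTROLLER_IP="<CONTROLLER_IP>"

linstor --controllers="$CONTROLLER_IP" resource-definition list -p \
  | awk -F'|' '$2 ~ /xcp-volume-/ { gsub(/ /, "", $2); print $2 }' \
  | while read -r rd; do
      echo "Deleting resource definition: $rd"
      linstor --controllers="$CONTROLLER_IP" resource-definition delete "$rd"
    done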
-
@TheiLLeniumStudios Can you plug the PBDs? If there is no issue here, you can follow the same steps as AudleyElwine.
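For reference, a plug attempt can be done by hand with xe; a minimal example, with the SR and PBD UUIDs as placeholders:
# List this SR's PBDs and see which ones are attached.
xe pbd-list sr-uuid=<SR_UUID> params=uuid,host-uuid,currently-attached
# Try to plug each detached one; the error returned here is usually the interesting part.
xe pbd-plug uuid=<PBD_UUID>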
-
@ronan-a Thank you for the detailed steps.
I get the following output when I dump my volumes using the xcp-sr-linstor_group_thin_device group:
{
  "xcp/sr/journal/clone/0fb10e9f-b9ef-4b59-8b31-9330f0785514": "86b1b2af-8f1d-4155-9961-d06bbacbb7aa_0e121812-fcae-4d70-960f-ac440b3927e3",
  "xcp/sr/journal/clone/14131ee4-2956-47b7-8728-c9790764f71a": "dfb43813-91eb-46b8-9d56-22c8dbb485fc_917177d5-d03b-495c-b2db-fd62d3d25b86",
  "xcp/sr/journal/clone/45537c14-0125-4f6c-a1ad-476552888087": "36a23780-2025-4f3f-bade-03c410e63368_3e419764-9c8c-4539-9a42-be96f92e5c2a",
  "xcp/sr/journal/clone/54ec7009-2424-4299-a9ad-fb015600b88c": "af89f0fc-7d5a-4236-b249-8d9408f5fb6d_f32f2e8f-a43f-43f5-824b-f673a5cbd988",
  "xcp/sr/journal/clone/558220bc-a900-4408-a62e-a71a4bb4fd7b": "d9294359-c395-4bed-ac3a-bf4027c92bd9_0e18bf3d-78f0-4843-9e8f-ee11c6ebbf5a",
  "xcp/sr/journal/clone/c41e0d47-5c1a-45c3-9404-01f3b5735c0d": "e191eb57-2478-4e3b-be9d-e8eaba8f9efe_41eae673-a280-439b-a4c6-f3afe2390fde",
  "xcp/sr/journal/relink/50170fa2-2ca9-4218-8217-5c99ac31f10b": "1"
}
but --remove-all-volumes does not delete them because they don't start with xcp/volume/.
Also, when I used the xcp-sr-xcp-sr-linstor_group_thin_device group, a lot of volumes appeared, similar to the following:
{
  "volume/00897d74-53c9-41b4-8f5f-73132e4a9af7/metadata": "{\"read_only\": true, \"snapshot_time\": \"\", \"vdi_type\": \"vhd\", \"snapshot_of\": null, \"name_label\": \"base copy\", \"name_description\": \"\", \"type\": \"user\", \"metadata_of_pool\": \"\", \"is_a_snapshot\": false}",
  "volume/00897d74-53c9-41b4-8f5f-73132e4a9af7/not-exists": "0",
  "volume/00897d74-53c9-41b4-8f5f-73132e4a9af7/volume-name": "xcp-volume-2892500d-d80a-4978-aa87-ab2b39ace9e9",
  "volume/00b0dbb5-2dfa-4fd5-baf4-81065afa2431/metadata": "{\"read_only\": true, \"snapshot_time\": \"\", \"vdi_type\": \"vhd\", \"snapshot_of\": null, \"name_label\": \"base copy\", \"name_description\": \"\", \"type\": \"user\", \"metadata_of_pool\": \"\", \"is_a_snapshot\": false}",
  "volume/00b0dbb5-2dfa-4fd5-baf4-81065afa2431/not-exists": "0",
  ...
  ...
  ...
  "volume/fcbcd0dc-8d90-441e-8d03-e435ac417b96/not-exists": "0",
  "volume/fcbcd0dc-8d90-441e-8d03-e435ac417b96/volume-name": "xcp-volume-f3748b88-1b25-4f18-8f63-4017b09f2ac6",
  "volume/fce3b2e0-1025-4c94-9473-e71562ca11bd/metadata": "{\"read_only\": true, \"snapshot_time\": \"\", \"vdi_type\": \"vhd\", \"snapshot_of\": null, \"name_label\": \"base copy\", \"name_description\": \"\", \"type\": \"user\", \"metadata_of_pool\": \"\", \"is_a_snapshot\": false}",
  "volume/fce3b2e0-1025-4c94-9473-e71562ca11bd/not-exists": "0",
  "volume/fce3b2e0-1025-4c94-9473-e71562ca11bd/volume-name": "xcp-volume-08f1fb0b-d6a3-47eb-893b-6c8b08417726",
  "volume/fe6bc8fd-4211-4b4a-8ee5-ba55a7641053/metadata": "{\"read_only\": true, \"snapshot_time\": \"\", \"vdi_type\": \"vhd\", \"snapshot_of\": null, \"name_label\": \"base copy\", \"name_description\": \"\", \"type\": \"user\", \"metadata_of_pool\": \"\", \"is_a_snapshot\": false}",
  "volume/fe6bc8fd-4211-4b4a-8ee5-ba55a7641053/not-exists": "0",
  "volume/fe6bc8fd-4211-4b4a-8ee5-ba55a7641053/volume-name": "xcp-volume-7a46e0f4-0f61-4a37-b235-1d2bd9eaf033",
  "volume/fe8dc6e6-a2c6-449a-8858-255a37cc8f98/metadata": "{\"read_only\": true, \"snapshot_time\": \"\", \"vdi_type\": \"vhd\", \"snapshot_of\": \"\", \"name_label\": \"base copy\", \"name_description\": \"\", \"type\": \"user\", \"metadata_of_pool\": \"\", \"is_a_snapshot\": false}",
  "volume/fe8dc6e6-a2c6-449a-8858-255a37cc8f98/not-exists": "0",
  "volume/fe8dc6e6-a2c6-449a-8858-255a37cc8f98/volume-name": "xcp-volume-0290c420-9f14-43ae-9af5-fe333b60c7dc",
  "volume/feadfc8d-5aeb-429c-8335-4530aa24cc86/metadata": "{\"read_only\": true, \"snapshot_time\": \"\", \"vdi_type\": \"vhd\", \"snapshot_of\": null, \"name_label\": \"base copy\", \"name_description\": \"\", \"type\": \"user\", \"metadata_of_pool\": \"\", \"is_a_snapshot\": false}",
  "volume/feadfc8d-5aeb-429c-8335-4530aa24cc86/not-exists": "0",
  "volume/feadfc8d-5aeb-429c-8335-4530aa24cc86/volume-name": "xcp-volume-6a37cf38-d6e4-4af3-90d7-84bec3938b20",
  "xcp/sr/metadata": "{\"name_description\": \"\", \"name_label\": \"XOSTOR\"}"
}
and --remove-all-volumes also does not work on them.
I ran the following with and without the xcp-sr prefix, and it produced an empty JSON when specifying the namespace as /xcp/volume to match the startswith in the delete thingy:
./linstor-kv-tool --dump-volumes -u 192.168.0.106 -g xcp-sr-linstor_group_thin_device -n /xcp/volume
What do you think I should do?
-
@AudleyElwine said in XOSTOR hyperconvergence preview:
but the --remove-all-volumes does not delete them because they don't start with xcp/volume/.
Right, another problem that is already fixed, but I forgot to put an adapted version on my gist, sorry. You can modify the script to use volume/ instead of xcp/volume.
-
@ronan-a Thank you for your fast support.
I made these changes:
diff -u linstor-kv-tool linstor-kv-tool-modified
--- linstor-kv-tool	2022-11-17 18:57:00.941259380 +0800
+++ linstor-kv-tool-modified	2022-11-17 19:04:15.957504667 +0800
@@ -33,7 +33,7 @@
     kv = linstor.KV(
         group_name,
         uri=controller_uri,
-        namespace='/xcp/volume/{}'.format(vdi_name)
+        namespace='/volume/{}'.format(vdi_name)
     )

     for key, value in list(kv.items()):
@@ -46,11 +46,11 @@
         uri=controller_uri,
         namespace='/'
     )
-
     for key, value in list(kv.items()):
-        if key.startswith('xcp/volume/'):
+        if key.startswith('volume/'):
             size = key.rindex('/')
             kv.namespace = key[:size]
+            print("key is {}".format(repr(key[size + 1:])))
             del kv[key[size + 1:]]
and I got the following error.
./linstor-kv-tool-modified --remove-all-volumes -u 192.168.0.106 -g xcp-sr-xcp-sr-linstor_group_thin_device
key is u'metadata'
Traceback (most recent call last):
  File "./linstor-kv-tool-modified", line 78, in <module>
    main()
  File "./linstor-kv-tool-modified", line 74, in main
    remove_all_volumes(args.uri, args.group_name)
  File "./linstor-kv-tool-modified", line 54, in remove_all_volumes
    del kv[key[size + 1:]]
  File "/usr/lib/python2.7/site-packages/linstor/kv.py", line 151, in __delitem__
    self._del_linstor_kv(k)
  File "/usr/lib/python2.7/site-packages/linstor/kv.py", line 89, in _del_linstor_kv
    raise linstor.LinstorError('Could not delete kv({}): {}'.format(k, rs[0]))
linstor.errors.LinstorError: Error: Could not delete kv(/volume/aec2104e-e501-4d7d-b0fb-95a80e843e0a/metadata): ERRO:Exception thrown.
and I can confirm the volume exists when I dump all of them:
"volume/aec2104e-e501-4d7d-b0fb-95a80e843e0a/metadata": "{\"read_only\": true, \"snapshot_time\": \"\", \"vdi_type\": \"vhd\", \"snapshot_of\": \"\", \"name_label\": \"base copy\", \"name_description\": \"\", \"type\": \"user\", \"metadata_of_pool\": \"\", \"is_a_snapshot\": false}", "volume/aec2104e-e501-4d7d-b0fb-95a80e843e0a/not-exists": "0", "volume/aec2104e-e501-4d7d-b0fb-95a80e843e0a/volume-name": "xcp-volume-b1748285-7cda-429f-b230-50dfba161e9c",
May I ask what you recommend I do? And thank you for your continued support.
-
@AudleyElwine said in XOSTOR hyperconvergence preview:
Really strange... Maybe there is a lock or another issue with LINSTOR. In the worst case, you can retry after rebooting all hosts. If it's still stuck, I can take a look using a support tunnel; I'm not sure I understand why you have this error. -
@ronan-a I started updating XCP-ng on my four nodes (eva, phoebe, mike, ozly) so they would both restart and update.
The nodes were updated with the rolling method. Three nodes updated fine, but the fourth (mike, a different node from the one that refuses to connect its PBD, which is eva) got stuck: its task sat at 0.000 progress for 3 hours, so I restarted the toolstack on mike, which didn't do anything, and then restarted the toolstack on the master (eva). Then, when I went to manually update mike from XOA, it gave me this error:
-1(global name 'commmand' is not defined, , Traceback (most recent call last):
  File "/etc/xapi.d/plugins/xcpngutils/__init__.py", line 101, in wrapper
    return func(*args, **kwds)
  File "/etc/xapi.d/plugins/updater.py", line 96, in decorator
    return func(*args, **kwargs)
  File "/etc/xapi.d/plugins/updater.py", line 157, in update
    raise error
NameError: global name 'commmand' is not defined
)
The good news is that the linstor-controller has moved to a different node (phoebe) from the old one (mike), and I was able to delete all volumes shown by linstor --controllers=... resource-definition list except for the database. Yet the PBD on eva could not be connected, XOA still shows me a lot of disks, and when I scan the SR I get the error SR_HAS_NO_PBDS.
So now the mike server can't update, and the eva server can't connect its PBDs while all the other servers are connected. Note that eva was the server I started my LINSTOR installation on.
Do you have any thoughts on what I can do to fix this without reinstalling xcp-ng on mike?
-
I figured out the issue when I tried to update it from the CLI instead: the /var/log partition was full because /var/log/linstor-controller held something like 3.5G+ of data (90% of the /var/log volume), maybe due to the past errors it accumulated. I deleted these logs and mike updated normally.
Now, regarding plugging the PBD on eva (the one host that is not connecting to it), it reports the following error:
Error code: SR_BACKEND_FAILURE_202
Error parameters: , General backend error [opterr=Base copy 36a23780-2025-4f3f-bade-03c410e63368 not present, but no original 45537c14-0125-4f6c-a1ad-476552888087 found],
This is what linstor resource-definition list shows:
[03:59 eva ~]# linstor --controllers=192.168.0.108 resource-definition list -p
+---------------------------------------------------------------------------+
| ResourceName             | Port | ResourceGroup                    | State |
|===========================================================================|
| xcp-persistent-database  | 7000 | xcp-sr-linstor_group_thin_device | ok    |
+---------------------------------------------------------------------------+
And here is the LINSTOR KV store, dumped with the script:
[04:01 phoebe ~]# mountpoint /var/lib/linstor
/var/lib/linstor is a mountpoint
[04:01 phoebe ~]# ./linstor-kv-tool-modified --dump-volumes -u 192.168.0.108 -g xcp-sr-xcp-sr-linstor_group_thin_device
{
  "xcp/sr/metadata": "{\"name_description\": \"\", \"name_label\": \"XOSTOR\"}"
}
[04:01 phoebe ~]# ./linstor-kv-tool-modified --dump-volumes -u 192.168.0.108 -g xcp-sr-linstor_group_thin_device
{
  "xcp/sr/journal/clone/0fb10e9f-b9ef-4b59-8b31-9330f0785514": "86b1b2af-8f1d-4155-9961-d06bbacbb7aa_0e121812-fcae-4d70-960f-ac440b3927e3",
  "xcp/sr/journal/clone/14131ee4-2956-47b7-8728-c9790764f71a": "dfb43813-91eb-46b8-9d56-22c8dbb485fc_917177d5-d03b-495c-b2db-fd62d3d25b86",
  "xcp/sr/journal/clone/45537c14-0125-4f6c-a1ad-476552888087": "36a23780-2025-4f3f-bade-03c410e63368_3e419764-9c8c-4539-9a42-be96f92e5c2a",
  "xcp/sr/journal/clone/54ec7009-2424-4299-a9ad-fb015600b88c": "af89f0fc-7d5a-4236-b249-8d9408f5fb6d_f32f2e8f-a43f-43f5-824b-f673a5cbd988",
  "xcp/sr/journal/clone/558220bc-a900-4408-a62e-a71a4bb4fd7b": "d9294359-c395-4bed-ac3a-bf4027c92bd9_0e18bf3d-78f0-4843-9e8f-ee11c6ebbf5a",
  "xcp/sr/journal/clone/c41e0d47-5c1a-45c3-9404-01f3b5735c0d": "e191eb57-2478-4e3b-be9d-e8eaba8f9efe_41eae673-a280-439b-a4c6-f3afe2390fde",
  "xcp/sr/journal/relink/50170fa2-2ca9-4218-8217-5c99ac31f10b": "1"
}
I destroyed the PBD and then recreated it just to get it to connect so I could destroy the SR, but the same error happened when I tried to plug the new PBD, which has the same config as the old one.
-
Those two UUIDs are in the ./linstor-kv-tool-modified --dump-volumes -u 192.168.0.108 -g xcp-sr-linstor_group_thin_device output:
./linstor-kv-tool-modified --dump-volumes -u 192.168.0.108 -g xcp-sr-linstor_group_thin_device
{
  "xcp/sr/journal/clone/0fb10e9f-b9ef-4b59-8b31-9330f0785514": "86b1b2af-8f1d-4155-9961-d06bbacbb7aa_0e121812-fcae-4d70-960f-ac440b3927e3",
  "xcp/sr/journal/clone/14131ee4-2956-47b7-8728-c9790764f71a": "dfb43813-91eb-46b8-9d56-22c8dbb485fc_917177d5-d03b-495c-b2db-fd62d3d25b86",
  "xcp/sr/journal/clone/45537c14-0125-4f6c-a1ad-476552888087": "36a23780-2025-4f3f-bade-03c410e63368_3e419764-9c8c-4539-9a42-be96f92e5c2a",
  "xcp/sr/journal/clone/54ec7009-2424-4299-a9ad-fb015600b88c": "af89f0fc-7d5a-4236-b249-8d9408f5fb6d_f32f2e8f-a43f-43f5-824b-f673a5cbd988",
  "xcp/sr/journal/clone/558220bc-a900-4408-a62e-a71a4bb4fd7b": "d9294359-c395-4bed-ac3a-bf4027c92bd9_0e18bf3d-78f0-4843-9e8f-ee11c6ebbf5a",
  "xcp/sr/journal/clone/c41e0d47-5c1a-45c3-9404-01f3b5735c0d": "e191eb57-2478-4e3b-be9d-e8eaba8f9efe_41eae673-a280-439b-a4c6-f3afe2390fde",
  "xcp/sr/journal/relink/50170fa2-2ca9-4218-8217-5c99ac31f10b": "1"
}
So I basically deleted all of the keys here. Maybe I should not have done that, but when I did, eva plugged into the SR correctly and I was finally able to destroy the SR from XOA. So yeah, happy ending. Will try the next beta version. Thank you @ronan-a for your work.
-
@ronan-a I tried following the guide that you posted to remove the linstor volumes manually but the resource-definition list command already showed a bunch of resources in a "DELETING" state.
[22:24 xcp-ng-node-1 ~]# linstor --controllers=192.168.10.211 resource-definition list
+--------------------------------------------------------------------------------------------------------+
| ResourceName                                    | Port | ResourceGroup                    | State    |
|==========================================================================================================|
| xcp-persistent-database                         | 7000 | xcp-sr-linstor_group_thin_device | ok       |
| xcp-persistent-ha-statefile                     | 7001 | xcp-sr-linstor_group_thin_device | ok       |
| xcp-persistent-redo-log                         | 7002 | xcp-sr-linstor_group_thin_device | ok       |
| xcp-volume-13a94a7a-d433-4426-8232-812e3c6dc52e | 7004 | xcp-sr-linstor_group_thin_device | DELETING |
| xcp-volume-4b70d69b-9cca-4aa3-842f-09366ac76901 | 7006 | xcp-sr-linstor_group_thin_device | ok       |
| xcp-volume-50aa2e9f-caf0-4b0d-82f3-35893987e53b | 7010 | xcp-sr-linstor_group_thin_device | DELETING |
| xcp-volume-55c5c3fb-6782-46d6-8a81-f4a5f7cca691 | 7012 | xcp-sr-linstor_group_thin_device | ok       |
| xcp-volume-5ebca692-6a61-47ec-8cac-e4fa0b6cc38a | 7016 | xcp-sr-linstor_group_thin_device | ok       |
| xcp-volume-668bcb64-1150-43ac-baaa-db7b92331506 | 7014 | xcp-sr-linstor_group_thin_device | ok       |
| xcp-volume-6f5235da-8f01-4057-a172-5e68bcb3f423 | 7007 | xcp-sr-linstor_group_thin_device | DELETING |
| xcp-volume-70bf80a2-a008-469a-a7db-0ea92fcfc392 | 7009 | xcp-sr-linstor_group_thin_device | ok       |
| xcp-volume-92d4d363-ef03-4d3c-9d47-bef5cb1ca181 | 7015 | xcp-sr-linstor_group_thin_device | ok       |
| xcp-volume-9a413b51-2625-407a-b05c-62bff025b947 | 7005 | xcp-sr-linstor_group_thin_device | ok       |
| xcp-volume-a02d160d-34fc-4fd6-957d-c7f3f9206ae2 | 7008 | xcp-sr-linstor_group_thin_device | DELETING |
| xcp-volume-ed04ffda-b379-4be7-8935-4f534f969a3f | 7003 | xcp-sr-linstor_group_thin_device | DELETING |
+--------------------------------------------------------------------------------------------------------+
Executing resource-definition delete has no impact on them. I just get the following output:
[22:24 xcp-ng-node-1 ~]# linstor resource-definition delete xcp-volume-13a94a7a-d433-4426-8232-812e3c6dc52e
SUCCESS:
Description:
    Resource definition 'xcp-volume-13a94a7a-d433-4426-8232-812e3c6dc52e' marked for deletion.
Details:
    Resource definition 'xcp-volume-13a94a7a-d433-4426-8232-812e3c6dc52e' UUID is: 52aceda9-b19b-461a-a119-f62931ba1af9
WARNING:
Description:
    No active connection to satellite 'xcp-ng-node-3'
Details:
    The controller is trying to (re-) establish a connection to the satellite. The controller stored the changes and as soon the satellite is connected, it will receive this update.
SUCCESS:
    Resource 'xcp-volume-13a94a7a-d433-4426-8232-812e3c6dc52e' on 'xcp-ng-node-1' deleted
SUCCESS:
    Resource 'xcp-volume-13a94a7a-d433-4426-8232-812e3c6dc52e' on 'xcp-ng-node-2' deleted
I can confirm that node-1 can reach node-3, which it is complaining about for some reason. I can also see node-3 in XO and can run VMs on it.
-
@TheiLLeniumStudios In this case, if DRBD is completely stuck, you can reboot your hosts. There is probably a lock, or processes that still hold a lock on these resources.
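Before rebooting everything, it may also be worth checking the satellite the controller says it cannot reach (the warning above mentions xcp-ng-node-3); a couple of hedged checks, with the controller IP taken from the earlier command:
# On xcp-ng-node-3: is the LINSTOR satellite service running?
systemctl status linstor-satellite
# From any host: are all satellites seen as online by the controller?
linstor --controllers=192.168.10.211 node list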
-
@AudleyElwine Thank you for your feedback; I will update the script to handle the journal cases.
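A rough idea of what that could look like, reusing the same linstor.KV pattern as the existing remove_all_volumes function (only a sketch, not the updated gist; the journal key prefix is taken from the dumps above):
import linstor

def remove_journal_entries(controller_uri, group_name):
    # Open the KV store at the root namespace, like remove_all_volumes does.
    kv = linstor.KV(group_name, uri=controller_uri, namespace='/')
    for key, value in list(kv.items()):
        # Keys look like 'xcp/sr/journal/clone/<uuid>' or 'xcp/sr/journal/relink/<uuid>'.
        if key.startswith('xcp/sr/journal/'):
            size = key.rindex('/')
            kv.namespace = key[:size]   # switch to the parent namespace of the key
            del kv[key[size + 1:]]      # then delete the leaf entry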
-
@ronan-a 2 of the nodes broke after restarting. I just kept getting a blinking cursor at the top left of the screen for hours. I'm going to have to reprovision all the nodes again, sadly.
-
Hey @ronan-a ,
What should I do to lower the chance of something from the past XOSTOR installation affecting my new installation?
lsblk is still showing the LINSTOR volumes, and vgs is also showing linstor_group.
Will a wipefs -af be enough? Or is the "Destroying SR" button in XOA enough?
-
@AudleyElwine The PVs/VGs are kept after an SR.destroy call, but it's totally safe to reuse them for a new installation. The content of /var/lib/linstor is not removed after a destroy call, but it's normally not used, because the LINSTOR database is shared between hosts using a DRBD volume and mounted in this directory by the running controller. So you don't have any manual steps to execute here.
Of course, if you want to reuse your disks for something else, wipefs is nice for that.
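If the disks are meant to be reused for something else entirely, a possible cleanup sequence could be (a sketch only; it assumes /dev/sdb is the disk that held linstor_group and that the SR has already been destroyed):
# Remove the thin pool and VG, drop the PV label, then clear any remaining signatures.
vgremove -f linstor_group
pvremove /dev/sdb
wipefs -a /dev/sdb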
-
@TheiLLeniumStudios There is always a solution to repair nodes. What's the output of linstor --controllers=<HOST_IPS> node list? (Use comma-separated values for HOST_IPS.)
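For example (the IP addresses are illustrative only):
linstor --controllers=192.168.10.211,192.168.10.212,192.168.10.213 node list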
-
We just hit a weird issue that we managed to fix, but it wasn't really clear at first what was wrong; it might be a good idea for you guys to add some type of health check / error handling to catch this and fix it.
What happened was that, for some unknown reason, our /var/lib/linstor mount (xcp-persistent-database) became read-only. Everything kinda kept working-ish, but some stuff would randomly fail, like attempting to delete a resource. Upon looking at the logs, we saw this:
Error message: The database is read only; SQL statement: UPDATE SEC_ACL_MAP SET ACCESS_TYPE = ? WHERE OBJECT_PATH = ? AND ROLE_NAME = ? [90097-197]
We did a quick write test on the /var/lib/linstor mount and saw that it was indeed in RO mode. We also noticed that the last update time on the DB file was 2 days ago.
Unmounting and remounting it got the controller to start back up, but the first time some nodes were missing from the node list, so we restarted the linstor-controller service again and everything is now up and healthy.
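Something as simple as the following write test could serve as that kind of health check (a sketch; the path and the idea of a throwaway write mirror what we did above):
#!/bin/bash
# Warn if the xcp-persistent-database mount has silently gone read-only.
MNT=/var/lib/linstor
if mountpoint -q "$MNT"; then
    if ! touch "$MNT/.rw-check" 2>/dev/null; then
        echo "WARNING: $MNT is not writable (read-only?); the LINSTOR database cannot be updated" >&2
    else
        rm -f "$MNT/.rw-check"
    fi
fi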
-
I think we have a fairly large number of updates coming, fixing various potential bugs. Stay tuned!
-
@olivierlambert any ETA for the next beta release? Looking forward to testing it.