-
Figured out the issue when I tried to update it from the CLI instead: the `/var/log` partition was full because `/var/log/linstor-controller` contained something like 3.5G+ of data (90% of the `/var/log` volume), maybe accumulated from the past errors. I deleted these logs and mike updated normally.

Now, regarding plugging the PBD into eva (the one host that is not connecting to it), it fails with the following error:
```
Error code: SR_BACKEND_FAILURE_202
Error parameters: , General backend error [opterr=Base copy 36a23780-2025-4f3f-bade-03c410e63368 not present, but no original 45537c14-0125-4f6c-a1ad-476552888087 found],
```
This is what `linstor resource-definition list` is showing:
```
[03:59 eva ~]# linstor --controllers=192.168.0.108 resource-definition list -p
+---------------------------------------------------------------------------+
| ResourceName            | Port | ResourceGroup                    | State |
|===========================================================================|
| xcp-persistent-database | 7000 | xcp-sr-linstor_group_thin_device | ok    |
+---------------------------------------------------------------------------+
```
And here is the LINSTOR KV store dumped by that script:
```
[04:01 phoebe ~]# mountpoint /var/lib/linstor
/var/lib/linstor is a mountpoint
[04:01 phoebe ~]# ./linstor-kv-tool-modified --dump-volumes -u 192.168.0.108 -g xcp-sr-xcp-sr-linstor_group_thin_device
{
  "xcp/sr/metadata": "{\"name_description\": \"\", \"name_label\": \"XOSTOR\"}"
}
[04:01 phoebe ~]# ./linstor-kv-tool-modified --dump-volumes -u 192.168.0.108 -g xcp-sr-linstor_group_thin_device
{
  "xcp/sr/journal/clone/0fb10e9f-b9ef-4b59-8b31-9330f0785514": "86b1b2af-8f1d-4155-9961-d06bbacbb7aa_0e121812-fcae-4d70-960f-ac440b3927e3",
  "xcp/sr/journal/clone/14131ee4-2956-47b7-8728-c9790764f71a": "dfb43813-91eb-46b8-9d56-22c8dbb485fc_917177d5-d03b-495c-b2db-fd62d3d25b86",
  "xcp/sr/journal/clone/45537c14-0125-4f6c-a1ad-476552888087": "36a23780-2025-4f3f-bade-03c410e63368_3e419764-9c8c-4539-9a42-be96f92e5c2a",
  "xcp/sr/journal/clone/54ec7009-2424-4299-a9ad-fb015600b88c": "af89f0fc-7d5a-4236-b249-8d9408f5fb6d_f32f2e8f-a43f-43f5-824b-f673a5cbd988",
  "xcp/sr/journal/clone/558220bc-a900-4408-a62e-a71a4bb4fd7b": "d9294359-c395-4bed-ac3a-bf4027c92bd9_0e18bf3d-78f0-4843-9e8f-ee11c6ebbf5a",
  "xcp/sr/journal/clone/c41e0d47-5c1a-45c3-9404-01f3b5735c0d": "e191eb57-2478-4e3b-be9d-e8eaba8f9efe_41eae673-a280-439b-a4c6-f3afe2390fde",
  "xcp/sr/journal/relink/50170fa2-2ca9-4218-8217-5c99ac31f10b": "1"
}
```
I destroyed the PBD and recreated it, hoping it would just connect so I could destroy the SR, but the same error happened when I tried to connect with the new PBD, which has the same config as the other PBDs.
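Since a full `/var/log` can silently break updates like this, a small cleanup sketch may help. The `report_log_usage`/`prune_old_logs` helper names and the 7-day retention are my own assumptions, not part of any XCP-ng tooling:

```shell
#!/bin/sh
# Hedged sketch: report the size of the LINSTOR controller log directory
# and prune old files. The retention period is an arbitrary example.

report_log_usage() {
    # Print the total size of the given directory, e.g. /var/log/linstor-controller.
    du -sh "$1" 2>/dev/null | awk '{print $1}'
}

prune_old_logs() {
    # Delete plain files older than 7 days under the given directory.
    find "$1" -type f -mtime +7 -delete
}
```

For example, `report_log_usage /var/log/linstor-controller`, then `prune_old_logs /var/log/linstor-controller` if it is eating the partition.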
-
Those two UUIDs are in the `./linstor-kv-tool-modified --dump-volumes -u 192.168.0.108 -g xcp-sr-linstor_group_thin_device` output:

```
{
  "xcp/sr/journal/clone/0fb10e9f-b9ef-4b59-8b31-9330f0785514": "86b1b2af-8f1d-4155-9961-d06bbacbb7aa_0e121812-fcae-4d70-960f-ac440b3927e3",
  "xcp/sr/journal/clone/14131ee4-2956-47b7-8728-c9790764f71a": "dfb43813-91eb-46b8-9d56-22c8dbb485fc_917177d5-d03b-495c-b2db-fd62d3d25b86",
  "xcp/sr/journal/clone/45537c14-0125-4f6c-a1ad-476552888087": "36a23780-2025-4f3f-bade-03c410e63368_3e419764-9c8c-4539-9a42-be96f92e5c2a",
  "xcp/sr/journal/clone/54ec7009-2424-4299-a9ad-fb015600b88c": "af89f0fc-7d5a-4236-b249-8d9408f5fb6d_f32f2e8f-a43f-43f5-824b-f673a5cbd988",
  "xcp/sr/journal/clone/558220bc-a900-4408-a62e-a71a4bb4fd7b": "d9294359-c395-4bed-ac3a-bf4027c92bd9_0e18bf3d-78f0-4843-9e8f-ee11c6ebbf5a",
  "xcp/sr/journal/clone/c41e0d47-5c1a-45c3-9404-01f3b5735c0d": "e191eb57-2478-4e3b-be9d-e8eaba8f9efe_41eae673-a280-439b-a4c6-f3afe2390fde",
  "xcp/sr/journal/relink/50170fa2-2ca9-4218-8217-5c99ac31f10b": "1"
}
```
So I basically deleted all of the keys here. Maybe I should not have done that, but when I did, eva plugged into the SR correctly and I was finally able to destroy the SR from XOA. So yeah, happy ending. Will try the next beta version. Thank you @ronan-a for your work.
-
@ronan-a I tried following the guide that you posted to remove the LINSTOR volumes manually, but the `resource-definition list` command already showed a bunch of resources in a "DELETING" state.
```
[22:24 xcp-ng-node-1 ~]# linstor --controllers=192.168.10.211 resource-definition list
╭──────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ ResourceName                                    ┊ Port ┊ ResourceGroup                    ┊ State    ┊
╞══════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ xcp-persistent-database                         ┊ 7000 ┊ xcp-sr-linstor_group_thin_device ┊ ok       ┊
┊ xcp-persistent-ha-statefile                     ┊ 7001 ┊ xcp-sr-linstor_group_thin_device ┊ ok       ┊
┊ xcp-persistent-redo-log                         ┊ 7002 ┊ xcp-sr-linstor_group_thin_device ┊ ok       ┊
┊ xcp-volume-13a94a7a-d433-4426-8232-812e3c6dc52e ┊ 7004 ┊ xcp-sr-linstor_group_thin_device ┊ DELETING ┊
┊ xcp-volume-4b70d69b-9cca-4aa3-842f-09366ac76901 ┊ 7006 ┊ xcp-sr-linstor_group_thin_device ┊ ok       ┊
┊ xcp-volume-50aa2e9f-caf0-4b0d-82f3-35893987e53b ┊ 7010 ┊ xcp-sr-linstor_group_thin_device ┊ DELETING ┊
┊ xcp-volume-55c5c3fb-6782-46d6-8a81-f4a5f7cca691 ┊ 7012 ┊ xcp-sr-linstor_group_thin_device ┊ ok       ┊
┊ xcp-volume-5ebca692-6a61-47ec-8cac-e4fa0b6cc38a ┊ 7016 ┊ xcp-sr-linstor_group_thin_device ┊ ok       ┊
┊ xcp-volume-668bcb64-1150-43ac-baaa-db7b92331506 ┊ 7014 ┊ xcp-sr-linstor_group_thin_device ┊ ok       ┊
┊ xcp-volume-6f5235da-8f01-4057-a172-5e68bcb3f423 ┊ 7007 ┊ xcp-sr-linstor_group_thin_device ┊ DELETING ┊
┊ xcp-volume-70bf80a2-a008-469a-a7db-0ea92fcfc392 ┊ 7009 ┊ xcp-sr-linstor_group_thin_device ┊ ok       ┊
┊ xcp-volume-92d4d363-ef03-4d3c-9d47-bef5cb1ca181 ┊ 7015 ┊ xcp-sr-linstor_group_thin_device ┊ ok       ┊
┊ xcp-volume-9a413b51-2625-407a-b05c-62bff025b947 ┊ 7005 ┊ xcp-sr-linstor_group_thin_device ┊ ok       ┊
┊ xcp-volume-a02d160d-34fc-4fd6-957d-c7f3f9206ae2 ┊ 7008 ┊ xcp-sr-linstor_group_thin_device ┊ DELETING ┊
┊ xcp-volume-ed04ffda-b379-4be7-8935-4f534f969a3f ┊ 7003 ┊ xcp-sr-linstor_group_thin_device ┊ DELETING ┊
╰──────────────────────────────────────────────────────────────────────────────────────────────────────╯
```
Executing `resource-definition delete` has no impact on them; I just get the following output:
```
[22:24 xcp-ng-node-1 ~]# linstor resource-definition delete xcp-volume-13a94a7a-d433-4426-8232-812e3c6dc52e
SUCCESS:
Description:
    Resource definition 'xcp-volume-13a94a7a-d433-4426-8232-812e3c6dc52e' marked for deletion.
Details:
    Resource definition 'xcp-volume-13a94a7a-d433-4426-8232-812e3c6dc52e' UUID is: 52aceda9-b19b-461a-a119-f62931ba1af9
WARNING:
Description:
    No active connection to satellite 'xcp-ng-node-3'
Details:
    The controller is trying to (re-) establish a connection to the satellite. The controller stored the changes and as soon the satellite is connected, it will receive this update.
SUCCESS:
    Resource 'xcp-volume-13a94a7a-d433-4426-8232-812e3c6dc52e' on 'xcp-ng-node-1' deleted
SUCCESS:
    Resource 'xcp-volume-13a94a7a-d433-4426-8232-812e3c6dc52e' on 'xcp-ng-node-2' deleted
```
I can confirm that node-1 can reach node-3, which it is complaining about for some reason. I can also see node-3 in XO and can run VMs on it.
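When the controller reports `No active connection to satellite` even though the network looks fine, a few read-only checks can narrow it down. This is a sketch; the `check_satellite` helper is hypothetical, and `linstor-satellite` is the stock service name on the satellite side:

```shell
#!/bin/sh
# Hedged sketch: basic triage for a satellite the controller claims is offline.

check_satellite() {
    # $1 = satellite hostname or IP as the controller knows it.
    node="$1"
    if ! command -v linstor >/dev/null 2>&1; then
        echo "linstor client not installed; skipping"
        return 1
    fi
    # Controller's view of all nodes (look for OFFLINE next to the satellite).
    linstor node list
    # Plain reachability from this machine.
    ping -c 1 -W 2 "$node"
    # On the satellite itself, also check: systemctl status linstor-satellite
}
```

Restarting the `linstor-satellite` service on the affected node, once it reconnects, often lets pending deletions proceed.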
-
@TheiLLeniumStudios In this case, if DRBD is completely stuck, you can reboot your hosts. There are probably processes holding a lock on the volumes.
-
@AudleyElwine Thank you for your feedback, I will update the script to handle the journal cases.
-
@ronan-a 2 of the nodes broke after restarting. I just kept getting a blinking cursor at the top left of the screen for hours. I'm going to have to reprovision all the nodes again, sadly.
-
Hey @ronan-a,

What should I do to lower the chance of something from the past XOSTOR installation affecting my new installation? `lsblk` is still showing the LINSTOR volumes, and `vgs` is also showing `linstor_group`.

Will a `wipefs -af` be enough? Or is the "Destroying SR" button in XOA enough?
-
@AudleyElwine The PVs/VGs are kept after a `SR.destroy` call, but it's totally safe to reuse them for a new installation. The content of `/var/lib/linstor` is not removed after a destroy call either, but it's normally not used afterwards, because the LINSTOR database is shared between hosts using a DRBD volume and mounted in this directory by the running controller. So there are no manual steps to execute here.

Of course, if you want to reuse your disks for something else, `wipefs` is nice for that.
-
@TheiLLeniumStudios There is always a solution to repair nodes. What's the output of `linstor --controllers=<HOST_IPS> node list`? (Use comma-separated values for `HOST_IPS`.)
-
We just hit a weird issue that we managed to fix, but it wasn't really clear at first what was wrong. It might be a good idea for you guys to add some type of health check / error handling to catch this and fix it.
What happened was that, for some unknown reason, our `/var/lib/linstor` mount (xcp-persistent-database) became read-only, but everything kinda kept working-ish; some stuff would randomly fail, like attempting to delete a resource. Upon looking at the logs, we saw this:

```
Error message: The database is read only; SQL statement:
UPDATE SEC_ACL_MAP SET ACCESS_TYPE = ? WHERE OBJECT_PATH = ? AND ROLE_NAME = ? [90097-197]
```
We did a quick write test on the `/var/lib/linstor` mount and saw that it was indeed in RO mode. We also noticed that the last update time on the DB file was 2 days ago.

Unmounting and remounting it got the controller to start back up, but the first time some nodes were missing from the node list, so we restarted the linstor-controller service again and everything is now up and healthy.
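For the health check idea: a read-only mount can be detected cheaply with a write probe. This is a sketch (the probe-file approach and the `is_mount_readonly` name are my own); the remediation commands in the comments mirror the manual steps described above, with the database device path used on these systems:

```shell
#!/bin/sh
# Hedged sketch: probe whether a mount point is writable.

is_mount_readonly() {
    # Returns 0 (true) if a probe file cannot be created in $1.
    probe="$1/.rw_probe.$$"
    if touch "$probe" 2>/dev/null; then
        rm -f "$probe"
        return 1    # writable
    fi
    return 0        # read-only or otherwise unwritable
}

# Manual remediation, as described above (run on the controller host):
#   umount /var/lib/linstor
#   mount -w /dev/drbd/by-res/xcp-persistent-database/0 /var/lib/linstor
#   systemctl restart linstor-controller
```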
-
I think we have a relatively important number of updates coming, fixing various potential bugs. Stay tuned!
-
@olivierlambert Any ETA for the next beta release? Looking forward to testing it.
-
Very soon
-
@Maelstrom96 said in XOSTOR hyperconvergence preview:
> What happened was that for some unknown reason, our `/var/lib/linstor` mount (xcp-persistent-database) became read only
There is already a protection against that: https://github.com/xcp-ng/sm/commit/55779a64593df9407f861c3132ab85863b4f7e46 (2021-10-21)

So I don't understand how it's possible to hit this issue again. Without the log files I can't say what its source is; can you share them?
Did you launch a controller manually before having this problem, or not? There is a daemon, `minidrbdcluster`, that automatically mounts the database and starts a controller; all actions related to the controllers must be executed by this program.

Another idea: the problem could be related to https://github.com/xcp-ng/sm/commit/a6385091370c6b358c7466944cc9b63f8c337c0d, but this commit should be present in the latest release.
-
@ronan-a Is there a way to easily check whether the process is managed by the daemon rather than started manually? We might have restarted the controller manually at some point.

Edit:
```
● minidrbdcluster.service - Minimalistic high-availability cluster resource manager
   Loaded: loaded (/usr/lib/systemd/system/minidrbdcluster.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2023-01-25 15:58:01 EST; 1 weeks 0 days ago
 Main PID: 2738 (python2)
   CGroup: /system.slice/minidrbdcluster.service
           ├─2738 python2 /opt/xensource/libexec/minidrbdcluster
           ├─2902 /usr/sbin/dmeventd
           └─2939 drbdsetup events2
```
```
[11:58 ovbh-pprod-xen10 system]# systemctl status var-lib-linstor.service
● var-lib-linstor.service - Mount filesystem for the LINSTOR controller
   Loaded: loaded (/etc/systemd/system/var-lib-linstor.service; static; vendor preset: disabled)
   Active: active (exited) since Wed 2023-01-25 15:58:03 EST; 1 weeks 0 days ago
  Process: 2947 ExecStart=/bin/mount -w /dev/drbd/by-res/xcp-persistent-database/0 /var/lib/linstor (code=exited, status=0/SUCCESS)
 Main PID: 2947 (code=exited, status=0/SUCCESS)
   CGroup: /system.slice/var-lib-linstor.service
```
Also, what logs would you like to have?
Edit 2: Also, I don't believe that service would've actually caught what happened, since the volume was mounted RW at first; it seems DRBD had an issue while the mount was active and changed it to RO. The controller service was still healthy and active, just impacted on DB writes.
-
@Maelstrom96 It's fine to restart the controller on the same host where it was running. But if you want to move the controller to another host, just temporarily stop `minidrbdcluster` on the host where the controller is running; then you can restart it.

The danger is starting a controller on a host where the shared database is not mounted in `/var/lib/linstor`.

To sum up: if the database is mounted (check using `mountpoint /var/lib/linstor`) and there is a running controller, there is no issue.

> Edit 2: Also, I don't believe that service would've actually caught what happened, since it was mounted first using RW, but it seems like DRBD had an issue while the mount was active and changed it to RO. The controller service was still healthy and active, just impacted on DB writes.
So if it's not related to a database mount, the system may have changed the mount point to read-only for some reason, yes; it's clearly not impossible.
> Also, what logs would you like to have?
daemon.log, SMlog, kern.log (and also drbd-kern.log if present).
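The two conditions above (database mounted, controller running) can be wrapped in one check. A sketch; `linstor_db_healthy` is a hypothetical helper name, while `mountpoint` and the `linstor-controller` service name come from this thread:

```shell
#!/bin/sh
# Hedged sketch: verify the shared LINSTOR database mount and the controller.

linstor_db_healthy() {
    # $1 = database mount point, normally /var/lib/linstor.
    mountpoint -q "$1" || { echo "database not mounted at $1"; return 1; }
    systemctl is-active --quiet linstor-controller \
        || { echo "linstor-controller not active"; return 1; }
    echo "ok"
}
```

For example, `linstor_db_healthy /var/lib/linstor` on the host that is supposed to run the controller.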
-
@ronan-a I will copy those logs soon. Do you have a way I can provide you the logs off-forum, since it's a production system?
-
Not sure what we're doing wrong. We attempted to add a new host to the LINSTOR SR and it's failing. I've run the install command with the disks we want on the host, but running the `addHost` function fails:
```
[13:25 ovbh-pprod-xen13 ~]# xe host-call-plugin host-uuid=6e845981-1c12-4e70-b0f7-54431959d630 plugin=linstor-manager fn=addHost args:groupName=linstor_group/thin_device
There was a failure communicating with the plug-in.
status: addHost
stdout: Failure
stderr: ['VDI_IN_USE', 'OpaqueRef:f25cd94b-c948-4c3a-a410-aa29a3749943']
```
Edit: So it's not documented, but it looks like it's failing because the SR is in use? Does that mean we can't add or remove hosts from LINSTOR without unmounting all VDIs?
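To see which VDIs are currently attached before retrying `addHost`, listing the attached VBDs may help. A sketch using the standard `xe` CLI (the helper name is my own):

```shell
#!/bin/sh
# Hedged sketch: list attached VBDs with their VDI UUIDs and VM names.

list_attached_vdis() {
    if ! command -v xe >/dev/null 2>&1; then
        echo "xe CLI not available on this machine"
        return 1
    fi
    # Filter VBDs on the currently-attached field and show only two columns.
    xe vbd-list currently-attached=true params=vdi-uuid,vm-name-label
}
```

The `VDI_IN_USE` error above names an `OpaqueRef`, so cross-referencing the listed VDI UUIDs against the SR's VDIs should identify the blocker.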
-
@Maelstrom96 No, you can add a host with running VMs. I suppose there is a small issue here... Please send me your logs again (SMlog of each host).
-
We were finally able to add our new #4 host to the LINSTOR SR after killing all VMs with attached VDIs. However, we've hit a new bug that we're not sure how to fix.

Once we added the new host, we were curious to see if a live migration to it would work. It did not: it just resulted in the VM being in a zombie state, and we had to manually destroy the domains on both the source and destination servers and reset the power state of the VM.
That first bug was most likely caused by our custom LINSTOR configuration: we have set up an additional LINSTOR node interface on each node and changed their PrefNic. This wasn't applied to the new host, so the DRBD connection wouldn't have worked.
```
[16:51 ovbh-pprod-xen10 lib]# linstor --controllers=10.2.0.19,10.2.0.20,10.2.0.21 node interface list ovbh-pprod-xen12
╭─────────────────────────────────────────────────────────────────────╮
┊ ovbh-pprod-xen12 ┊ NetInterface ┊ IP        ┊ Port ┊ EncryptionType ┊
╞═════════════════════════════════════════════════════════════════════╡
┊ + StltCon        ┊ default      ┊ 10.2.0.21 ┊ 3366 ┊ PLAIN          ┊
┊ +                ┊ stornet      ┊ 10.2.4.12 ┊      ┊                ┊
╰─────────────────────────────────────────────────────────────────────╯
[16:41 ovbh-pprod-xen10 lib]# linstor --controllers=10.2.0.19,10.2.0.20,10.2.0.21 node list-properties ovbh-pprod-xen12
╭────────────────────────────────────╮
┊ Key             ┊ Value            ┊
╞════════════════════════════════════╡
┊ Aux/xcp-ng.node ┊ ovbh-pprod-xen12 ┊
┊ Aux/xcp-ng/node ┊ ovbh-pprod-xen12 ┊
┊ CurStltConnName ┊ default          ┊
┊ NodeUname       ┊ ovbh-pprod-xen12 ┊
┊ PrefNic         ┊ stornet          ┊
╰────────────────────────────────────╯
```
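For reference, the extra node interface and PrefNic shown above can presumably be replicated on a new node with the stock LINSTOR client commands. This is a sketch; the node name and IP are placeholders, and whether it is safe to apply while the SR is in use is something to verify first:

```shell
#!/bin/sh
# Hedged sketch: add a dedicated storage-network interface on a node and
# prefer it for DRBD traffic. Node name and IP are placeholders.

setup_stornet() {
    node="$1"; ip="$2"
    if ! command -v linstor >/dev/null 2>&1; then
        echo "linstor client not installed; skipping"
        return 1
    fi
    linstor node interface create "$node" stornet "$ip"
    linstor node set-property "$node" PrefNic stornet
}

# Example (placeholder values): setup_stornet ovbh-pprod-xen13 10.2.4.x
```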
However, once the VM was down and all the LINSTOR configuration was updated to match the rest of the cluster, I tried to manually start that VM on the new host, but it's not working. It seems LINSTOR is never asked to attach the volume to the host as a diskless resource, since the data is not on that host.
SMLog:
```
Feb 28 17:01:31 ovbh-pprod-xen13 SM: [25108] lock: opening lock file /var/lock/sm/a8b860a9-5246-0dd2-8b7f-4806604f219a/sr
Feb 28 17:01:31 ovbh-pprod-xen13 SM: [25108] lock: acquired /var/lock/sm/a8b860a9-5246-0dd2-8b7f-4806604f219a/sr
Feb 28 17:01:31 ovbh-pprod-xen13 SM: [25108] call-plugin on ff631fff-1947-4631-a35d-9352204f98d9 (linstor-manager:lockVdi with {'groupName': 'linstor_group/thin_device', 'srUuid': 'a8b860a9-5246-0dd2-8b7f-4806604f219a', 'vdiUuid': '02ca1b5b-fef4-47d4-8736-40908385739c', 'locked': 'True'}) returned: True
Feb 28 17:01:33 ovbh-pprod-xen13 SM: [25108] ['/usr/bin/vhd-util', 'query', '--debug', '-vsfp', '-n', '/dev/drbd/by-res/xcp-volume-fb565237-b169-434d-b694-4707e6f51f4c/0']
Feb 28 17:01:33 ovbh-pprod-xen13 SM: [25108] FAILED in util.pread: (rc 2) stdout: 'error opening /dev/drbd/by-res/xcp-volume-fb565237-b169-434d-b694-4707e6f51f4c/0: -2
Feb 28 17:01:33 ovbh-pprod-xen13 SM: [25108] ', stderr: ''
Feb 28 17:01:33 ovbh-pprod-xen13 SM: [25108] Got exception: No such file or directory. Retry number: 0
Feb 28 17:01:35 ovbh-pprod-xen13 SM: [25108] ['/usr/bin/vhd-util', 'query', '--debug', '-vsfp', '-n', '/dev/drbd/by-res/xcp-volume-fb565237-b169-434d-b694-4707e6f51f4c/0']
Feb 28 17:01:35 ovbh-pprod-xen13 SM: [25108] FAILED in util.pread: (rc 2) stdout: 'error opening /dev/drbd/by-res/xcp-volume-fb565237-b169-434d-b694-4707e6f51f4c/0: -2
Feb 28 17:01:35 ovbh-pprod-xen13 SM: [25108] ', stderr: ''
Feb 28 17:01:35 ovbh-pprod-xen13 SM: [25108] Got exception: No such file or directory. Retry number: 1
Feb 28 17:01:37 ovbh-pprod-xen13 SM: [25108] ['/usr/bin/vhd-util', 'query', '--debug', '-vsfp', '-n', '/dev/drbd/by-res/xcp-volume-fb565237-b169-434d-b694-4707e6f51f4c/0']
Feb 28 17:01:37 ovbh-pprod-xen13 SM: [25108] FAILED in util.pread: (rc 2) stdout: 'error opening /dev/drbd/by-res/xcp-volume-fb565237-b169-434d-b694-4707e6f51f4c/0: -2
Feb 28 17:01:37 ovbh-pprod-xen13 SM: [25108] ', stderr: ''
Feb 28 17:01:37 ovbh-pprod-xen13 SM: [25108] Got exception: No such file or directory. Retry number: 2
Feb 28 17:01:39 ovbh-pprod-xen13 SM: [25108] ['/usr/bin/vhd-util', 'query', '--debug', '-vsfp', '-n', '/dev/drbd/by-res/xcp-volume-fb565237-b169-434d-b694-4707e6f51f4c/0']
Feb 28 17:01:39 ovbh-pprod-xen13 SM: [25108] FAILED in util.pread: (rc 2) stdout: 'error opening /dev/drbd/by-res/xcp-volume-fb565237-b169-434d-b694-4707e6f51f4c/0: -2
Feb 28 17:01:39 ovbh-pprod-xen13 SM: [25108] ', stderr: ''
Feb 28 17:01:39 ovbh-pprod-xen13 SM: [25108] Got exception: No such file or directory. Retry number: 3
Feb 28 17:01:41 ovbh-pprod-xen13 SM: [25108] ['/usr/bin/vhd-util', 'query', '--debug', '-vsfp', '-n', '/dev/drbd/by-res/xcp-volume-fb565237-b169-434d-b694-4707e6f51f4c/0']
Feb 28 17:01:41 ovbh-pprod-xen13 SM: [25108] FAILED in util.pread: (rc 2) stdout: 'error opening /dev/drbd/by-res/xcp-volume-fb565237-b169-434d-b694-4707e6f51f4c/0: -2
Feb 28 17:01:41 ovbh-pprod-xen13 SM: [25108] ', stderr: ''
Feb 28 17:01:41 ovbh-pprod-xen13 SM: [25108] Got exception: No such file or directory. Retry number: 4
Feb 28 17:01:41 ovbh-pprod-xen13 SM: [25108] ['/usr/bin/vhd-util', 'query', '--debug', '-vsfp', '-n', '/dev/drbd/by-res/xcp-volume-fb565237-b169-434d-b694-4707e6f51f4c/0']
Feb 28 17:01:41 ovbh-pprod-xen13 SM: [25108] FAILED in util.pread: (rc 2) stdout: 'error opening /dev/drbd/by-res/xcp-volume-fb565237-b169-434d-b694-4707e6f51f4c/0: -2
Feb 28 17:01:41 ovbh-pprod-xen13 SM: [25108] ', stderr: ''
Feb 28 17:01:41 ovbh-pprod-xen13 SM: [25108] failed to execute locally vhd-util (sys 2)
Feb 28 17:01:42 ovbh-pprod-xen13 SM: [25108] call-plugin (getVHDInfo with {'devicePath': '/dev/drbd/by-res/xcp-volume-fb565237-b169-434d-b694-4707e6f51f4c/0', 'groupName': 'linstor_group/thin_device', 'includeParent': 'True'}) returned: {"uuid": "02ca1b5b-fef4-47d4-8736-40908385739c", "parentUuid": "1ad76dd3-14af-4636-bf5d-6822b81bfd0c", "sizeVirt": 53687091200, "sizePhys": 1700033024, "parentPath": "/dev/drbd/by-res/xcp-v$
Feb 28 17:01:42 ovbh-pprod-xen13 SM: [25108] VDI 02ca1b5b-fef4-47d4-8736-40908385739c loaded! (path=/dev/drbd/by-res/xcp-volume-fb565237-b169-434d-b694-4707e6f51f4c/0, hidden=0)
Feb 28 17:01:42 ovbh-pprod-xen13 SM: [25108] lock: released /var/lock/sm/a8b860a9-5246-0dd2-8b7f-4806604f219a/sr
Feb 28 17:01:42 ovbh-pprod-xen13 SM: [25108] vdi_epoch_begin {'sr_uuid': 'a8b860a9-5246-0dd2-8b7f-4806604f219a', 'subtask_of': 'DummyRef:|3f01e26c-0225-40e1-9683-bffe5bb69490|VDI.epoch_begin', 'vdi_ref': 'OpaqueRef:f25cd94b-c948-4c3a-a410-aa29a3749943', 'vdi_on_boot': 'persist', 'args': [], 'vdi_location': '02ca1b5b-fef4-47d4-8736-40908385739c', 'host_ref': 'OpaqueRef:3cd7e97c-4b79-473e-b925-c25f8cb393d8', 'session_ref': '$
Feb 28 17:01:42 ovbh-pprod-xen13 SM: [25108] call-plugin on ff631fff-1947-4631-a35d-9352204f98d9 (linstor-manager:lockVdi with {'groupName': 'linstor_group/thin_device', 'srUuid': 'a8b860a9-5246-0dd2-8b7f-4806604f219a', 'vdiUuid': '02ca1b5b-fef4-47d4-8736-40908385739c', 'locked': 'False'}) returned: True
Feb 28 17:01:42 ovbh-pprod-xen13 SM: [25278] lock: opening lock file /var/lock/sm/a8b860a9-5246-0dd2-8b7f-4806604f219a/sr
Feb 28 17:01:42 ovbh-pprod-xen13 SM: [25278] lock: acquired /var/lock/sm/a8b860a9-5246-0dd2-8b7f-4806604f219a/sr
Feb 28 17:01:43 ovbh-pprod-xen13 SM: [25278] call-plugin on ff631fff-1947-4631-a35d-9352204f98d9 (linstor-manager:lockVdi with {'groupName': 'linstor_group/thin_device', 'srUuid': 'a8b860a9-5246-0dd2-8b7f-4806604f219a', 'vdiUuid': '02ca1b5b-fef4-47d4-8736-40908385739c', 'locked': 'True'}) returned: True
Feb 28 17:01:44 ovbh-pprod-xen13 SM: [25278] ['/usr/bin/vhd-util', 'query', '--debug', '-vsfp', '-n', '/dev/drbd/by-res/xcp-volume-fb565237-b169-434d-b694-4707e6f51f4c/0']
Feb 28 17:01:44 ovbh-pprod-xen13 SM: [25278] FAILED in util.pread: (rc 2) stdout: 'error opening /dev/drbd/by-res/xcp-volume-fb565237-b169-434d-b694-4707e6f51f4c/0: -2
Feb 28 17:01:44 ovbh-pprod-xen13 SM: [25278] ', stderr: ''
Feb 28 17:01:44 ovbh-pprod-xen13 SM: [25278] Got exception: No such file or directory. Retry number: 0
Feb 28 17:01:46 ovbh-pprod-xen13 SM: [25278] ['/usr/bin/vhd-util', 'query', '--debug', '-vsfp', '-n', '/dev/drbd/by-res/xcp-volume-fb565237-b169-434d-b694-4707e6f51f4c/0']
Feb 28 17:01:46 ovbh-pprod-xen13 SM: [25278] FAILED in util.pread: (rc 2) stdout: 'error opening /dev/drbd/by-res/xcp-volume-fb565237-b169-434d-b694-4707e6f51f4c/0: -2
Feb 28 17:01:46 ovbh-pprod-xen13 SM: [25278] ', stderr: ''
[...]
```
The folder `/dev/drbd/by-res/` doesn't currently exist.

Also, not sure why, but it seems that when adding the new host, a storage pool `linstor_group_thin_device` for its local storage wasn't provisioned automatically, although we can see that a diskless storage pool was provisioned:

```
[17:26 ovbh-pprod-xen10 lib]# linstor --controllers=10.2.0.19,10.2.0.20,10.2.0.21 storage-pool list
╭───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ StoragePool                      ┊ Node                                     ┊ Driver   ┊ PoolName                  ┊ FreeCapacity ┊ TotalCapacity ┊ CanSnapshots ┊ State ┊ SharedName ┊
╞═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ DfltDisklessStorPool             ┊ ovbh-pprod-xen10                         ┊ DISKLESS ┊                           ┊              ┊               ┊ False        ┊ Ok    ┊            ┊
┊ DfltDisklessStorPool             ┊ ovbh-pprod-xen11                         ┊ DISKLESS ┊                           ┊              ┊               ┊ False        ┊ Ok    ┊            ┊
┊ DfltDisklessStorPool             ┊ ovbh-pprod-xen12                         ┊ DISKLESS ┊                           ┊              ┊               ┊ False        ┊ Ok    ┊            ┊
┊ DfltDisklessStorPool             ┊ ovbh-pprod-xen13                         ┊ DISKLESS ┊                           ┊              ┊               ┊ False        ┊ Ok    ┊            ┊
┊ DfltDisklessStorPool             ┊ ovbh-vprod-k8s04-worker01.floatplane.com ┊ DISKLESS ┊                           ┊              ┊               ┊ False        ┊ Ok    ┊            ┊
┊ DfltDisklessStorPool             ┊ ovbh-vprod-k8s04-worker02.floatplane.com ┊ DISKLESS ┊                           ┊              ┊               ┊ False        ┊ Ok    ┊            ┊
┊ DfltDisklessStorPool             ┊ ovbh-vprod-k8s04-worker03.floatplane.com ┊ DISKLESS ┊                           ┊              ┊               ┊ False        ┊ Ok    ┊            ┊
┊ DfltDisklessStorPool             ┊ ovbh-vtest-k8s02-worker01.floatplane.com ┊ DISKLESS ┊                           ┊              ┊               ┊ False        ┊ Ok    ┊            ┊
┊ DfltDisklessStorPool             ┊ ovbh-vtest-k8s02-worker02.floatplane.com ┊ DISKLESS ┊                           ┊              ┊               ┊ False        ┊ Ok    ┊            ┊
┊ DfltDisklessStorPool             ┊ ovbh-vtest-k8s02-worker03.floatplane.com ┊ DISKLESS ┊                           ┊              ┊               ┊ False        ┊ Ok    ┊            ┊
┊ xcp-sr-linstor_group_thin_device ┊ ovbh-pprod-xen10                         ┊ LVM_THIN ┊ linstor_group/thin_device ┊     3.00 TiB ┊      3.49 TiB ┊ True         ┊ Ok    ┊            ┊
┊ xcp-sr-linstor_group_thin_device ┊ ovbh-pprod-xen11                         ┊ LVM_THIN ┊ linstor_group/thin_device ┊     3.03 TiB ┊      3.49 TiB ┊ True         ┊ Ok    ┊            ┊
┊ xcp-sr-linstor_group_thin_device ┊ ovbh-pprod-xen12                         ┊ LVM_THIN ┊ linstor_group/thin_device ┊     3.06 TiB ┊      3.49 TiB ┊ True         ┊ Ok    ┊            ┊
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
```
```
[17:32 ovbh-pprod-xen13 ~]# lsblk
NAME                                   MAJ:MIN RM    SIZE RO TYPE  MOUNTPOINT
nvme0n1                                259:0    0    3.5T  0 disk
├─nvme0n1p1                            259:1    0      1T  0 part
│ └─md128                              9:128    0 1023.9G  0 raid1
└─nvme0n1p2                            259:2    0    2.5T  0 part
  ├─linstor_group-thin_device_tdata    252:1    0      5T  0 lvm
  │ └─linstor_group-thin_device        252:2    0      5T  0 lvm
  └─linstor_group-thin_device_tmeta    252:0    0     80M  0 lvm
    └─linstor_group-thin_device        252:2    0      5T  0 lvm
sdb                                    8:16     1  447.1G  0 disk
└─md127                                9:127    0  447.1G  0 raid1
  ├─md127p5                            259:10   0      4G  0 md    /var/log
  ├─md127p3                            259:8    0  405.6G  0 md
  │ └─XSLocalEXT--ea64a6f6--9ef2--408a--039f--33b119fbd7e8-ea64a6f6--9ef2--408a--039f--33b119fbd7e8 252:3 0 405.6G 0 lvm /run/sr-mount/ea64a6f6-9ef2-408a-039f-33b119fbd7e8
  ├─md127p1                            259:6    0     18G  0 md    /
  ├─md127p6                            259:11   0      1G  0 md    [SWAP]
  ├─md127p4                            259:9    0    512M  0 md    /boot/efi
  └─md127p2                            259:7    0     18G  0 md
nvme1n1                                259:3    0    3.5T  0 disk
├─nvme1n1p2                            259:5    0    2.5T  0 part
│ └─linstor_group-thin_device_tdata    252:1    0      5T  0 lvm
│   └─linstor_group-thin_device        252:2    0      5T  0 lvm
└─nvme1n1p1                            259:4    0      1T  0 part
  └─md128                              9:128    0 1023.9G  0 raid1
sda                                    8:0      1  447.1G  0 disk
└─md127                                9:127    0  447.1G  0 raid1
  ├─md127p5                            259:10   0      4G  0 md    /var/log
  ├─md127p3                            259:8    0  405.6G  0 md
  │ └─XSLocalEXT--ea64a6f6--9ef2--408a--039f--33b119fbd7e8-ea64a6f6--9ef2--408a--039f--33b119fbd7e8 252:3 0 405.6G 0 lvm /run/sr-mount/ea64a6f6-9ef2-408a-039f-33b119fbd7e8
  ├─md127p1                            259:6    0     18G  0 md    /
  ├─md127p6                            259:11   0      1G  0 md    [SWAP]
  ├─md127p4                            259:9    0    512M  0 md    /boot/efi
  └─md127p2                            259:7    0     18G  0 md
```
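If the thin storage pool really was never registered for the new node, the LINSTOR client can create it manually; `storage-pool create lvmthin` is standard client syntax, and the pool and VG/thin-pool names below come from this thread. This is a sketch; normally the driver is supposed to do this itself, so it is worth confirming with the devs before running it:

```shell
#!/bin/sh
# Hedged sketch: register the missing LVM-thin storage pool on the new node.
# Pool name and VG/thin-pool are from this thread; the node name is a placeholder.

create_thin_pool() {
    node="$1"
    if ! command -v linstor >/dev/null 2>&1; then
        echo "linstor client not installed; skipping"
        return 1
    fi
    linstor storage-pool create lvmthin "$node" \
        xcp-sr-linstor_group_thin_device linstor_group/thin_device
}

# Example: create_thin_pool ovbh-pprod-xen13
```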