@ronan-a
We were able to finally add our new #4 host to the linstor SR after killing all VMs with attached VDIs. However, we've hit a new bug that we're not sure how to fix.
Once we added the new host, we were curious to see if a live migration to it would work - It did not. It actually just resulted in the VM being in a zombie state and we had to manually destroy the domains on both the source and destination servers, and reset the power state of the VM.
That first bug most likely was caused by our custom linstor configuration that we use where we have setup another linstor node interface on each nodes, and changed their PrefNics. It wasn't applied to the new host so the drbd connection wouldn't have worked.
[16:51 ovbh-pprod-xen10 lib]# linstor --controllers=10.2.0.19,10.2.0.20,10.2.0.21 node interface list ovbh-pprod-xen12
╭─────────────────────────────────────────────────────────────────────╮
┊ ovbh-pprod-xen12 ┊ NetInterface ┊ IP ┊ Port ┊ EncryptionType ┊
╞═════════════════════════════════════════════════════════════════════╡
┊ + StltCon ┊ default ┊ 10.2.0.21 ┊ 3366 ┊ PLAIN ┊
┊ + ┊ stornet ┊ 10.2.4.12 ┊ ┊ ┊
╰─────────────────────────────────────────────────────────────────────╯
[16:41 ovbh-pprod-xen10 lib]# linstor --controllers=10.2.0.19,10.2.0.20,10.2.0.21 node list-properties ovbh-pprod-xen12
╭────────────────────────────────────╮
┊ Key ┊ Value ┊
╞════════════════════════════════════╡
┊ Aux/xcp-ng.node ┊ ovbh-pprod-xen12 ┊
┊ Aux/xcp-ng/node ┊ ovbh-pprod-xen12 ┊
┊ CurStltConnName ┊ default ┊
┊ NodeUname ┊ ovbh-pprod-xen12 ┊
┊ PrefNic ┊ stornet ┊
╰────────────────────────────────────╯
However, once the VM was down and all the linstor configuration was updated to match the rest of the cluster, I've tried to manually start that VM on the new host but it's not working. It seems like if linstor is not called to add the volume to the host as a diskless volume, since it's not on that host.
SMLog:
Feb 28 17:01:31 ovbh-pprod-xen13 SM: [25108] lock: opening lock file /var/lock/sm/a8b860a9-5246-0dd2-8b7f-4806604f219a/sr
Feb 28 17:01:31 ovbh-pprod-xen13 SM: [25108] lock: acquired /var/lock/sm/a8b860a9-5246-0dd2-8b7f-4806604f219a/sr
Feb 28 17:01:31 ovbh-pprod-xen13 SM: [25108] call-plugin on ff631fff-1947-4631-a35d-9352204f98d9 (linstor-manager:lockVdi with {'groupName': 'linstor_group/thin_device', 'srUuid': 'a8b860a9-5246-0dd2-8b7f-4806604f219a', 'vdiUuid': '02ca1b5b-fef4-47d4-8736-40908385739c', 'locked': 'True'}) returned: True
Feb 28 17:01:33 ovbh-pprod-xen13 SM: [25108] ['/usr/bin/vhd-util', 'query', '--debug', '-vsfp', '-n', '/dev/drbd/by-res/xcp-volume-fb565237-b169-434d-b694-4707e6f51f4c/0']
Feb 28 17:01:33 ovbh-pprod-xen13 SM: [25108] FAILED in util.pread: (rc 2) stdout: 'error opening /dev/drbd/by-res/xcp-volume-fb565237-b169-434d-b694-4707e6f51f4c/0: -2
Feb 28 17:01:33 ovbh-pprod-xen13 SM: [25108] ', stderr: ''
Feb 28 17:01:33 ovbh-pprod-xen13 SM: [25108] Got exception: No such file or directory. Retry number: 0
Feb 28 17:01:35 ovbh-pprod-xen13 SM: [25108] ['/usr/bin/vhd-util', 'query', '--debug', '-vsfp', '-n', '/dev/drbd/by-res/xcp-volume-fb565237-b169-434d-b694-4707e6f51f4c/0']
Feb 28 17:01:35 ovbh-pprod-xen13 SM: [25108] FAILED in util.pread: (rc 2) stdout: 'error opening /dev/drbd/by-res/xcp-volume-fb565237-b169-434d-b694-4707e6f51f4c/0: -2
Feb 28 17:01:35 ovbh-pprod-xen13 SM: [25108] ', stderr: ''
Feb 28 17:01:35 ovbh-pprod-xen13 SM: [25108] Got exception: No such file or directory. Retry number: 1
Feb 28 17:01:37 ovbh-pprod-xen13 SM: [25108] ['/usr/bin/vhd-util', 'query', '--debug', '-vsfp', '-n', '/dev/drbd/by-res/xcp-volume-fb565237-b169-434d-b694-4707e6f51f4c/0']
Feb 28 17:01:37 ovbh-pprod-xen13 SM: [25108] FAILED in util.pread: (rc 2) stdout: 'error opening /dev/drbd/by-res/xcp-volume-fb565237-b169-434d-b694-4707e6f51f4c/0: -2
Feb 28 17:01:37 ovbh-pprod-xen13 SM: [25108] ', stderr: ''
Feb 28 17:01:37 ovbh-pprod-xen13 SM: [25108] Got exception: No such file or directory. Retry number: 2
Feb 28 17:01:39 ovbh-pprod-xen13 SM: [25108] ['/usr/bin/vhd-util', 'query', '--debug', '-vsfp', '-n', '/dev/drbd/by-res/xcp-volume-fb565237-b169-434d-b694-4707e6f51f4c/0']
Feb 28 17:01:39 ovbh-pprod-xen13 SM: [25108] FAILED in util.pread: (rc 2) stdout: 'error opening /dev/drbd/by-res/xcp-volume-fb565237-b169-434d-b694-4707e6f51f4c/0: -2
Feb 28 17:01:39 ovbh-pprod-xen13 SM: [25108] ', stderr: ''
Feb 28 17:01:39 ovbh-pprod-xen13 SM: [25108] Got exception: No such file or directory. Retry number: 3
Feb 28 17:01:41 ovbh-pprod-xen13 SM: [25108] ['/usr/bin/vhd-util', 'query', '--debug', '-vsfp', '-n', '/dev/drbd/by-res/xcp-volume-fb565237-b169-434d-b694-4707e6f51f4c/0']
Feb 28 17:01:41 ovbh-pprod-xen13 SM: [25108] FAILED in util.pread: (rc 2) stdout: 'error opening /dev/drbd/by-res/xcp-volume-fb565237-b169-434d-b694-4707e6f51f4c/0: -2
Feb 28 17:01:41 ovbh-pprod-xen13 SM: [25108] ', stderr: ''
Feb 28 17:01:41 ovbh-pprod-xen13 SM: [25108] Got exception: No such file or directory. Retry number: 4
Feb 28 17:01:41 ovbh-pprod-xen13 SM: [25108] ['/usr/bin/vhd-util', 'query', '--debug', '-vsfp', '-n', '/dev/drbd/by-res/xcp-volume-fb565237-b169-434d-b694-4707e6f51f4c/0']
Feb 28 17:01:41 ovbh-pprod-xen13 SM: [25108] FAILED in util.pread: (rc 2) stdout: 'error opening /dev/drbd/by-res/xcp-volume-fb565237-b169-434d-b694-4707e6f51f4c/0: -2
Feb 28 17:01:41 ovbh-pprod-xen13 SM: [25108] ', stderr: ''
Feb 28 17:01:41 ovbh-pprod-xen13 SM: [25108] failed to execute locally vhd-util (sys 2)
Feb 28 17:01:42 ovbh-pprod-xen13 SM: [25108] call-plugin (getVHDInfo with {'devicePath': '/dev/drbd/by-res/xcp-volume-fb565237-b169-434d-b694-4707e6f51f4c/0', 'groupName': 'linstor_group/thin_device', 'includeParent': 'True'}) returned: {"uuid": "02ca1b5b-fef4-47d4-8736-40908385739c", "parentUuid": "1ad76dd3-14af-4636-bf5d-6822b81bfd0c", "sizeVirt": 53687091200, "sizePhys": 1700033024, "parentPath": "/dev/drbd/by-res/xcp-v$
Feb 28 17:01:42 ovbh-pprod-xen13 SM: [25108] VDI 02ca1b5b-fef4-47d4-8736-40908385739c loaded! (path=/dev/drbd/by-res/xcp-volume-fb565237-b169-434d-b694-4707e6f51f4c/0, hidden=0)
Feb 28 17:01:42 ovbh-pprod-xen13 SM: [25108] lock: released /var/lock/sm/a8b860a9-5246-0dd2-8b7f-4806604f219a/sr
Feb 28 17:01:42 ovbh-pprod-xen13 SM: [25108] vdi_epoch_begin {'sr_uuid': 'a8b860a9-5246-0dd2-8b7f-4806604f219a', 'subtask_of': 'DummyRef:|3f01e26c-0225-40e1-9683-bffe5bb69490|VDI.epoch_begin', 'vdi_ref': 'OpaqueRef:f25cd94b-c948-4c3a-a410-aa29a3749943', 'vdi_on_boot': 'persist', 'args': [], 'vdi_location': '02ca1b5b-fef4-47d4-8736-40908385739c', 'host_ref': 'OpaqueRef:3cd7e97c-4b79-473e-b925-c25f8cb393d8', 'session_ref': '$
Feb 28 17:01:42 ovbh-pprod-xen13 SM: [25108] call-plugin on ff631fff-1947-4631-a35d-9352204f98d9 (linstor-manager:lockVdi with {'groupName': 'linstor_group/thin_device', 'srUuid': 'a8b860a9-5246-0dd2-8b7f-4806604f219a', 'vdiUuid': '02ca1b5b-fef4-47d4-8736-40908385739c', 'locked': 'False'}) returned: True
Feb 28 17:01:42 ovbh-pprod-xen13 SM: [25278] lock: opening lock file /var/lock/sm/a8b860a9-5246-0dd2-8b7f-4806604f219a/sr
Feb 28 17:01:42 ovbh-pprod-xen13 SM: [25278] lock: acquired /var/lock/sm/a8b860a9-5246-0dd2-8b7f-4806604f219a/sr
Feb 28 17:01:43 ovbh-pprod-xen13 SM: [25278] call-plugin on ff631fff-1947-4631-a35d-9352204f98d9 (linstor-manager:lockVdi with {'groupName': 'linstor_group/thin_device', 'srUuid': 'a8b860a9-5246-0dd2-8b7f-4806604f219a', 'vdiUuid': '02ca1b5b-fef4-47d4-8736-40908385739c', 'locked': 'True'}) returned: True
Feb 28 17:01:44 ovbh-pprod-xen13 SM: [25278] ['/usr/bin/vhd-util', 'query', '--debug', '-vsfp', '-n', '/dev/drbd/by-res/xcp-volume-fb565237-b169-434d-b694-4707e6f51f4c/0']
Feb 28 17:01:44 ovbh-pprod-xen13 SM: [25278] FAILED in util.pread: (rc 2) stdout: 'error opening /dev/drbd/by-res/xcp-volume-fb565237-b169-434d-b694-4707e6f51f4c/0: -2
Feb 28 17:01:44 ovbh-pprod-xen13 SM: [25278] ', stderr: ''
Feb 28 17:01:44 ovbh-pprod-xen13 SM: [25278] Got exception: No such file or directory. Retry number: 0
Feb 28 17:01:46 ovbh-pprod-xen13 SM: [25278] ['/usr/bin/vhd-util', 'query', '--debug', '-vsfp', '-n', '/dev/drbd/by-res/xcp-volume-fb565237-b169-434d-b694-4707e6f51f4c/0']
Feb 28 17:01:46 ovbh-pprod-xen13 SM: [25278] FAILED in util.pread: (rc 2) stdout: 'error opening /dev/drbd/by-res/xcp-volume-fb565237-b169-434d-b694-4707e6f51f4c/0: -2
Feb 28 17:01:46 ovbh-pprod-xen13 SM: [25278] ', stderr: ''
[...]
The folder /dev/drbd/by-res/
doesn't exist currently.
Also, not sure why, but it seems like when adding the new host, a new storage pool linstor_group_thin_device
for it's local storage wasn't provisioned automatically, but we can see that there is a diskless storage pool that was provisionned.
[17:26 ovbh-pprod-xen10 lib]# linstor --controllers=10.2.0.19,10.2.0.20,10.2.0.21 storage-pool list
╭───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ StoragePool ┊ Node ┊ Driver ┊ PoolName ┊ FreeCapacity ┊ TotalCapacity ┊ CanSnapshots ┊ State ┊ SharedName ┊
╞═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ DfltDisklessStorPool ┊ ovbh-pprod-xen10 ┊ DISKLESS ┊ ┊ ┊ ┊ False ┊ Ok ┊ ┊
┊ DfltDisklessStorPool ┊ ovbh-pprod-xen11 ┊ DISKLESS ┊ ┊ ┊ ┊ False ┊ Ok ┊ ┊
┊ DfltDisklessStorPool ┊ ovbh-pprod-xen12 ┊ DISKLESS ┊ ┊ ┊ ┊ False ┊ Ok ┊ ┊
┊ DfltDisklessStorPool ┊ ovbh-pprod-xen13 ┊ DISKLESS ┊ ┊ ┊ ┊ False ┊ Ok ┊ ┊
┊ DfltDisklessStorPool ┊ ovbh-vprod-k8s04-worker01.floatplane.com ┊ DISKLESS ┊ ┊ ┊ ┊ False ┊ Ok ┊ ┊
┊ DfltDisklessStorPool ┊ ovbh-vprod-k8s04-worker02.floatplane.com ┊ DISKLESS ┊ ┊ ┊ ┊ False ┊ Ok ┊ ┊
┊ DfltDisklessStorPool ┊ ovbh-vprod-k8s04-worker03.floatplane.com ┊ DISKLESS ┊ ┊ ┊ ┊ False ┊ Ok ┊ ┊
┊ DfltDisklessStorPool ┊ ovbh-vtest-k8s02-worker01.floatplane.com ┊ DISKLESS ┊ ┊ ┊ ┊ False ┊ Ok ┊ ┊
┊ DfltDisklessStorPool ┊ ovbh-vtest-k8s02-worker02.floatplane.com ┊ DISKLESS ┊ ┊ ┊ ┊ False ┊ Ok ┊ ┊
┊ DfltDisklessStorPool ┊ ovbh-vtest-k8s02-worker03.floatplane.com ┊ DISKLESS ┊ ┊ ┊ ┊ False ┊ Ok ┊ ┊
┊ xcp-sr-linstor_group_thin_device ┊ ovbh-pprod-xen10 ┊ LVM_THIN ┊ linstor_group/thin_device ┊ 3.00 TiB ┊ 3.49 TiB ┊ True ┊ Ok ┊ ┊
┊ xcp-sr-linstor_group_thin_device ┊ ovbh-pprod-xen11 ┊ LVM_THIN ┊ linstor_group/thin_device ┊ 3.03 TiB ┊ 3.49 TiB ┊ True ┊ Ok ┊ ┊
┊ xcp-sr-linstor_group_thin_device ┊ ovbh-pprod-xen12 ┊ LVM_THIN ┊ linstor_group/thin_device ┊ 3.06 TiB ┊ 3.49 TiB ┊ True ┊ Ok ┊ ┊
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
[17:32 ovbh-pprod-xen13 ~]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
nvme0n1 259:0 0 3.5T 0 disk
├─nvme0n1p1 259:1 0 1T 0 part
│ └─md128 9:128 0 1023.9G 0 raid1
└─nvme0n1p2 259:2 0 2.5T 0 part
├─linstor_group-thin_device_tdata 252:1 0 5T 0 lvm
│ └─linstor_group-thin_device 252:2 0 5T 0 lvm
└─linstor_group-thin_device_tmeta 252:0 0 80M 0 lvm
└─linstor_group-thin_device 252:2 0 5T 0 lvm
sdb 8:16 1 447.1G 0 disk
└─md127 9:127 0 447.1G 0 raid1
├─md127p5 259:10 0 4G 0 md /var/log
├─md127p3 259:8 0 405.6G 0 md
│ └─XSLocalEXT--ea64a6f6--9ef2--408a--039f--33b119fbd7e8-ea64a6f6--9ef2--408a--039f--33b119fbd7e8 252:3 0 405.6G 0 lvm /run/sr-mount/ea64a6f6-9ef2-408a-039f-33b119fbd7e8
├─md127p1 259:6 0 18G 0 md /
├─md127p6 259:11 0 1G 0 md [SWAP]
├─md127p4 259:9 0 512M 0 md /boot/efi
└─md127p2 259:7 0 18G 0 md
nvme1n1 259:3 0 3.5T 0 disk
├─nvme1n1p2 259:5 0 2.5T 0 part
│ └─linstor_group-thin_device_tdata 252:1 0 5T 0 lvm
│ └─linstor_group-thin_device 252:2 0 5T 0 lvm
└─nvme1n1p1 259:4 0 1T 0 part
└─md128 9:128 0 1023.9G 0 raid1
sda 8:0 1 447.1G 0 disk
└─md127 9:127 0 447.1G 0 raid1
├─md127p5 259:10 0 4G 0 md /var/log
├─md127p3 259:8 0 405.6G 0 md
│ └─XSLocalEXT--ea64a6f6--9ef2--408a--039f--33b119fbd7e8-ea64a6f6--9ef2--408a--039f--33b119fbd7e8 252:3 0 405.6G 0 lvm /run/sr-mount/ea64a6f6-9ef2-408a-039f-33b119fbd7e8
├─md127p1 259:6 0 18G 0 md /
├─md127p6 259:11 0 1G 0 md [SWAP]
├─md127p4 259:9 0 512M 0 md /boot/efi
└─md127p2 259:7 0 18G 0 md