XOSTOR Creation Issues
-
I originally presented this in the Discord channel, but I think it may be better suited here so as not to flood the chat.
First, I want to say that even though this is a lab and we're not a customer yet, the support and community have been phenomenal.
We are testing XOSTOR in our lab. I got it running, but HA wouldn't enable, so I decided to burn it down and rebuild it. For whatever reason, XOSTOR will not rebuild. For clarity, I tried it via the CLI as well and got basically the same result. Here is the log:
xostor.create {
  "description": "Test Virtual SAN Part 2",
  "disksByHost": {
    "8c2c3bff-71ab-491a-9240-66973b0f1fe0": [ "/dev/sdb", "/dev/sdc" ],
    "872a4289-595b-4428-aaa5-2caa1e70162a": [ "/dev/sdb", "/dev/sdc" ],
    "9a7c2fb9-8db8-4032-b7fb-e4c3206bfe9c": [ "/dev/sdb", "/dev/sdc" ]
  },
  "name": "XCP Storage 2",
  "provisioning": "thin",
  "replication": 2
}
{
  "errors": [
    {
      "code": "LVM_ERROR(5)",
      "params": [
        "File descriptor 3 (/var/log/lvm-plugin.log) leaked on pvcreate invocation. Parent PID 18436: python File descriptor 9 (/dev/urandom) leaked on pvcreate invocation. Parent PID 18436: python Can't initialize physical volume \"/dev/sdb\" of volume group \"linstor_group\" without -ff /dev/sdb: physical volume not initialized. Can't initialize physical volume \"/dev/sdc\" of volume group \"linstor_group\" without -ff /dev/sdc: physical volume not initialized. ",
        "",
        "",
        "[XO] This error can be triggered if one of the disks is a 'tapdevs' disk.",
        "[XO] This error can be triggered if one of the disks have children"
      ],
      "call": {
        "method": "host.call_plugin",
        "params": [
          "OpaqueRef:bfc91b0f-1edb-4962-a784-29c02b603bef",
          "lvm.py",
          "create_physical_volume",
          { "devices": "/dev/sdb,/dev/sdc", "ignore_existing_filesystems": "false", "force": "false" }
        ]
      }
    },
    {
      "code": "LVM_ERROR(5)",
      "params": [
        "File descriptor 3 (/var/log/lvm-plugin.log) leaked on pvcreate invocation. Parent PID 6553: python File descriptor 9 (/dev/urandom) leaked on pvcreate invocation. Parent PID 6553: python Can't initialize physical volume \"/dev/sdb\" of volume group \"linstor_group\" without -ff /dev/sdb: physical volume not initialized. Can't initialize physical volume \"/dev/sdc\" of volume group \"linstor_group\" without -ff /dev/sdc: physical volume not initialized. ",
        "",
        "",
        "[XO] This error can be triggered if one of the disks is a 'tapdevs' disk.",
        "[XO] This error can be triggered if one of the disks have children"
      ],
      "call": {
        "method": "host.call_plugin",
        "params": [
          "OpaqueRef:629e58ed-eafb-49af-b45f-7f6c21d1458a",
          "lvm.py",
          "create_physical_volume",
          { "devices": "/dev/sdb,/dev/sdc", "ignore_existing_filesystems": "false", "force": "false" }
        ]
      }
    },
    {
      "code": "LVM_ERROR(5)",
      "params": [
        "File descriptor 3 (/var/log/lvm-plugin.log) leaked on pvcreate invocation. Parent PID 8643: python File descriptor 9 (/dev/urandom) leaked on pvcreate invocation. Parent PID 8643: python Can't initialize physical volume \"/dev/sdb\" of volume group \"linstor_group\" without -ff /dev/sdb: physical volume not initialized. Can't initialize physical volume \"/dev/sdc\" of volume group \"linstor_group\" without -ff /dev/sdc: physical volume not initialized. ",
        "",
        "",
        "[XO] This error can be triggered if one of the disks is a 'tapdevs' disk.",
        "[XO] This error can be triggered if one of the disks have children"
      ],
      "call": {
        "method": "host.call_plugin",
        "params": [
          "OpaqueRef:51a2ab7e-f792-4aad-a613-ddbe0a03c9f7",
          "lvm.py",
          "create_physical_volume",
          { "devices": "/dev/sdb,/dev/sdc", "ignore_existing_filesystems": "false", "force": "false" }
        ]
      }
    }
  ],
  "message": "",
  "name": "Error",
  "stack": "Error: at next (/usr/local/lib/node_modules/xo-server/node_modules/@vates/async-each/index.js:83:24) at onRejected (/usr/local/lib/node_modules/xo-server/node_modules/@vates/async-each/index.js:65:11) at onRejectedWrapper (/usr/local/lib/node_modules/xo-server/node_modules/@vates/async-each/index.js:67:41)"
}
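For reference, the CLI route goes through the same SR.create call shown in the log, so the equivalent xe command looks roughly like this (the host UUID is a placeholder, and the exact device-config keys should be double-checked against the current XOSTOR docs):
xe sr-create type=linstor name-label="XCP Storage 2" shared=true \
  host-uuid=<MASTER_HOST_UUID> \
  device-config:group-name=linstor_group/thin_device \
  device-config:redundancy=2 \
  device-config:provisioning=thin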
The only thing I have left to try is completely swapping out the 6TB HDDs for new ones.
Thanks in advance! -
@Midget said in XOSTOR Creation Issues:
linstor_group
It thinks that there's still a volume group with this name. You can try wiping the drive with wipefs to remove the previous partition.
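For example, a rough sketch of what to check and clean up on each host (adjust the device names to your setup):
pvs                              # does LVM still list /dev/sdb or /dev/sdc as physical volumes?
vgs                              # is a linstor_group volume group still present?
vgremove -f linstor_group        # if so, remove the leftover volume group
pvremove /dev/sdb /dev/sdc       # clear the PV labels
wipefs --all /dev/sdb /dev/sdc   # then wipe any remaining signatures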
Regards, Dan
-
Sounds like it, adding @ronan-a to the loop too
-
@Danp said in XOSTOR Creation Issues:
@Midget said in XOSTOR Creation Issues:
linstor_group
It thinks that there's still a volume group with this name. You can try wiping the drive with wipefs to remove the previous partition.
Regards, Dan
Thanks for the input. I ran this command across all drives...
wipefs --all --force /dev/sdb
I then rebooted each host and attempted the XOSTOR creation again. This is the log file I got...
xostor.create {
  "description": "Test Virtual SAN Part 2",
  "disksByHost": {
    "8c2c3bff-71ab-491a-9240-66973b0f1fe0": [ "/dev/sdb", "/dev/sdc" ],
    "872a4289-595b-4428-aaa5-2caa1e70162a": [ "/dev/sdb", "/dev/sdc" ],
    "9a7c2fb9-8db8-4032-b7fb-e4c3206bfe9c": [ "/dev/sdb", "/dev/sdc" ]
  },
  "name": "XCP Storage 2",
  "provisioning": "thick",
  "replication": 2
}
{
  "code": "SR_BACKEND_FAILURE_5006",
  "params": [
    "",
    "LINSTOR SR creation error [opterr=Failed to remove old node `xcp-ng4`: No connection to satellite 'xcp-ng2', No connection to satellite 'XCP-ng3', No connection to satellite 'xcp-ng4', No connection to satellite 'xcp-ng2', No connection to satellite 'XCP-ng3', No connection to satellite 'xcp-ng4']",
    ""
  ],
  "call": {
    "method": "SR.create",
    "params": [
      "8c2c3bff-71ab-491a-9240-66973b0f1fe0",
      { "group-name": "linstor_group/thin_device", "redundancy": "2", "provisioning": "thick" },
      0,
      "XCP Storage 2",
      "Test Virtual SAN Part 2",
      "linstor",
      "user",
      true,
      {}
    ]
  },
  "message": "SR_BACKEND_FAILURE_5006(, LINSTOR SR creation error [opterr=Failed to remove old node `xcp-ng4`: No connection to satellite 'xcp-ng2', No connection to satellite 'XCP-ng3', No connection to satellite 'xcp-ng4', No connection to satellite 'xcp-ng2', No connection to satellite 'XCP-ng3', No connection to satellite 'xcp-ng4'], )",
  "name": "XapiError",
  "stack": "XapiError: SR_BACKEND_FAILURE_5006(, LINSTOR SR creation error [opterr=Failed to remove old node `xcp-ng4`: No connection to satellite 'xcp-ng2', No connection to satellite 'XCP-ng3', No connection to satellite 'xcp-ng4', No connection to satellite 'xcp-ng2', No connection to satellite 'XCP-ng3', No connection to satellite 'xcp-ng4'], ) at Function.wrap (file:///usr/local/lib/node_modules/xo-server/node_modules/xen-api/_XapiError.mjs:16:12) at file:///usr/local/lib/node_modules/xo-server/node_modules/xen-api/transports/json-rpc.mjs:35:21 at runNextTicks (node:internal/process/task_queues:60:5) at processImmediate (node:internal/timers:447:9) at process.callbackTrampoline (node:internal/async_hooks:130:17)"
}
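Side note for anyone hitting the same "No connection to satellite" errors: assuming the standard LINSTOR service name, it's worth checking on each host whether the satellite service is actually running:
systemctl status linstor-satellite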
-
A small update; not sure if it matters since the XOSTOR isn't built, but I got this while poking around...
[12:09 xcp-ng2 ~]# linstor resource list
Error: Unable to connect to linstor://localhost:3370: [Errno 99] Cannot assign requested address
So something from LINSTOR still exists on the hosts. Again, not sure if that matters.
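A quick way to rule out simply querying the wrong host: the linstor client can be pointed at another node explicitly (the IP below is a placeholder):
linstor --controllers=<controller-host-ip> resource list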
EDIT
I did find multiple threads of people having the same issue, and the answer was to bind to 127.0.0.1 instead of localhost. -
@ronan-a will come here as soon as he can (he's busy rebuilding a more recent version of DRBD).
-
@Midget I believe the LINSTOR controller only runs on one XCP-ng host at a time. So if you SSH to each of your XCP-ng hosts and run the command:
linstor resource list
The host running the controller will display the expected results; the other XCP-ng hosts will display an error similar to what you saw.
Prior to implementing your fix, did you attempt the command from each XCP-ng host, and what were the results?
I'd recommend undoing the 127.0.0.1 change and attempting the command from each host.
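One way to find which host is currently acting as the controller (assuming the standard LINSTOR service name on XCP-ng) is to check the controller unit on each host:
systemctl is-active linstor-controller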
-
@learningdaily said in XOSTOR Creation Issues:
@Midget I believe the LINSTOR controller only runs on one XCP-ng host at a time. So if you SSH to each of your XCP-ng hosts and run the command:
linstor resource list
The host running the controller will display the expected results; the other XCP-ng hosts will display an error similar to what you saw.
Prior to implementing your fix, did you attempt the command from each XCP-ng host, and what were the results?
I'd recommend undoing the 127.0.0.1 change and attempting the command from each host.
I haven't implemented any fix. It was just something I read.
-
@Midget I misunderstood; I thought you were mentioning the LINSTOR error and attempting to troubleshoot that. Reviewing the rest of your thread, that is not the case.
You mentioned you attempted to burn down the XOSTOR and rebuild it. Here's the issue: your steps for burning down XOSTOR weren't complete.
Simplest method, if you're okay with it: wipe all drives on all of your XCP-ng hosts (no partitions, no data) and rebuild the XCP-ng hosts from scratch from the ISO. Doing that will let the installer lay out the storage as XOSTOR expects.
Advanced method: if that seems too drastic a measure, you'll probably need to review the LINSTOR project documentation to find out how to fully remove the partially removed remnants. It involves deleting volume groups and logical volumes, and manually removing some LINSTOR components.
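A rough sketch of that manual cleanup, run on each host, assuming the default linstor_group volume group and standard service names (adjust device names, and treat this as a starting point rather than a complete procedure):
xe sr-forget uuid=<OLD_SR_UUID>    # if the old SR object is still listed in the pool
systemctl stop linstor-satellite   # stop the satellite if it's still running
vgremove -f linstor_group          # drop the leftover volume group
pvremove /dev/sdb /dev/sdc         # clear the PV labels
wipefs --all /dev/sdb /dev/sdc     # wipe any remaining signatures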
-
@learningdaily I have to believe there is a way to fix this. Maybe once I get the time I will reload everything. It won't take long, I guess.
-
So I burnt it all down to ashes. Completely redid the storage. Reinstalled XCP-ng. Let's see what happens...
-
So I burnt it all down. I thought it was going to go through, but it didn't create the XOSTOR. I do have this log, though...
xostor.create {
  "description": "Test Virtual SAN Part 2",
  "disksByHost": {
    "e9b5aa92-660c-4dad-98c7-97de52556f22": [ "/dev/sdb", "/dev/sdc" ],
    "eb4cab8c-2234-4c7f-af84-d1b1494da60e": [ "/dev/sdb", "/dev/sdc" ],
    "68b9dc54-0bf3-4dc0-854f-d4cdabb47c23": [ "/dev/sdb", "/dev/sdc" ]
  },
  "name": "XCP Storage 2",
  "provisioning": "thick",
  "replication": 2
}
{
  "code": "SR_UNKNOWN_DRIVER",
  "params": [ "linstor" ],
  "call": {
    "method": "SR.create",
    "params": [
      "e9b5aa92-660c-4dad-98c7-97de52556f22",
      { "group-name": "linstor_group/thin_device", "redundancy": "2", "provisioning": "thick" },
      0,
      "XCP Storage 2",
      "Test Virtual SAN Part 2",
      "linstor",
      "user",
      true,
      {}
    ]
  },
  "message": "SR_UNKNOWN_DRIVER(linstor)",
  "name": "XapiError",
  "stack": "XapiError: SR_UNKNOWN_DRIVER(linstor) at Function.wrap (file:///usr/local/lib/node_modules/xo-server/node_modules/xen-api/_XapiError.mjs:16:12) at file:///usr/local/lib/node_modules/xo-server/node_modules/xen-api/transports/json-rpc.mjs:35:21 at runNextTicks (node:internal/process/task_queues:60:5) at processImmediate (node:internal/timers:447:9) at process.callbackTrampoline (node:internal/async_hooks:130:17)"
}
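For what it's worth, SR_UNKNOWN_DRIVER(linstor) generally means XAPI can't find the LINSTOR SM driver, i.e. the XOSTOR packages weren't (re)installed yet on the freshly rebuilt hosts. If installing them by hand, the packages are roughly the following (names from memory, so double-check against the current XOSTOR docs):
yum install -y xcp-ng-release-linstor
yum install -y xcp-ng-linstor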
And after this I got an alert that the pool needed to be updated again. So I did the updates, rebooted the hosts, and tried to create the XOSTOR again. This time I got this...
xostor.create {
  "description": "Test Virtual SAN Part 2",
  "disksByHost": {
    "e9b5aa92-660c-4dad-98c7-97de52556f22": [ "/dev/sdb", "/dev/sdc" ],
    "eb4cab8c-2234-4c7f-af84-d1b1494da60e": [ "/dev/sdb", "/dev/sdc" ],
    "68b9dc54-0bf3-4dc0-854f-d4cdabb47c23": [ "/dev/sdb", "/dev/sdc" ]
  },
  "name": "XCP Storage 2",
  "provisioning": "thick",
  "replication": 2
}
{
  "errors": [
    {
      "code": "LVM_ERROR(5)",
      "params": [
        "File descriptor 3 (/var/log/lvm-plugin.log) leaked on pvcreate invocation. Parent PID 5262: python File descriptor 9 (/dev/urandom) leaked on pvcreate invocation. Parent PID 5262: python Can't initialize physical volume \"/dev/sdb\" of volume group \"linstor_group\" without -ff /dev/sdb: physical volume not initialized. Can't initialize physical volume \"/dev/sdc\" of volume group \"linstor_group\" without -ff /dev/sdc: physical volume not initialized. ",
        "",
        "",
        "[XO] This error can be triggered if one of the disks is a 'tapdevs' disk.",
        "[XO] This error can be triggered if one of the disks have children"
      ],
      "call": {
        "method": "host.call_plugin",
        "params": [
          "OpaqueRef:fd2fcfdf-576b-4ea9-b4ac-20e91e1b4bbd",
          "lvm.py",
          "create_physical_volume",
          { "devices": "/dev/sdb,/dev/sdc", "ignore_existing_filesystems": "false", "force": "false" }
        ]
      }
    },
    {
      "code": "LVM_ERROR(5)",
      "params": [
        "File descriptor 3 (/var/log/lvm-plugin.log) leaked on pvcreate invocation. Parent PID 4884: python File descriptor 9 (/dev/urandom) leaked on pvcreate invocation. Parent PID 4884: python Can't initialize physical volume \"/dev/sdb\" of volume group \"linstor_group\" without -ff /dev/sdb: physical volume not initialized. Can't initialize physical volume \"/dev/sdc\" of volume group \"linstor_group\" without -ff /dev/sdc: physical volume not initialized. ",
        "",
        "",
        "[XO] This error can be triggered if one of the disks is a 'tapdevs' disk.",
        "[XO] This error can be triggered if one of the disks have children"
      ],
      "call": {
        "method": "host.call_plugin",
        "params": [
          "OpaqueRef:057c701d-7d4a-4d59-8a36-db0a0ef65960",
          "lvm.py",
          "create_physical_volume",
          { "devices": "/dev/sdb,/dev/sdc", "ignore_existing_filesystems": "false", "force": "false" }
        ]
      }
    },
    {
      "code": "LVM_ERROR(5)",
      "params": [
        "File descriptor 3 (/var/log/lvm-plugin.log) leaked on pvcreate invocation. Parent PID 4623: python File descriptor 9 (/dev/urandom) leaked on pvcreate invocation. Parent PID 4623: python Can't initialize physical volume \"/dev/sdb\" of volume group \"linstor_group\" without -ff /dev/sdb: physical volume not initialized. Can't initialize physical volume \"/dev/sdc\" of volume group \"linstor_group\" without -ff /dev/sdc: physical volume not initialized. ",
        "",
        "",
        "[XO] This error can be triggered if one of the disks is a 'tapdevs' disk.",
        "[XO] This error can be triggered if one of the disks have children"
      ],
      "call": {
        "method": "host.call_plugin",
        "params": [
          "OpaqueRef:48af9637-fc0f-402b-94da-64eac63d31f8",
          "lvm.py",
          "create_physical_volume",
          { "devices": "/dev/sdb,/dev/sdc", "ignore_existing_filesystems": "false", "force": "false" }
        ]
      }
    }
  ],
  "message": "",
  "name": "Error",
  "stack": "Error: at next (/usr/local/lib/node_modules/xo-server/node_modules/@vates/async-each/index.js:83:24) at onRejected (/usr/local/lib/node_modules/xo-server/node_modules/@vates/async-each/index.js:65:11) at onRejectedWrapper (/usr/local/lib/node_modules/xo-server/node_modules/@vates/async-each/index.js:67:41)"
}
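The "without -ff" part means pvcreate still sees an existing PV label on the disks. A quick, non-destructive way to see what's left before wiping:
pvs                      # does LVM still claim /dev/sdb or /dev/sdc?
blkid /dev/sdb /dev/sdc  # what filesystem/LVM signatures are on the disks?
wipefs /dev/sdb          # with no options, wipefs only lists detected signatures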
-
Quick update. I ran this command for each drive on each host...
wipefs --all --force /dev/sdX
Then I tried building the XOSTOR again. This time I got an error on the XOSTOR page that some random UUID already had XOSTOR on it, but it built the XOSTOR anyway? I have no idea how or what happened, but it did.
So I have my XOSTOR back.
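For anyone following along, a couple of quick checks that the SR really is healthy afterwards (run the linstor command on whichever host is currently acting as the controller):
xe sr-list type=linstor params=uuid,name-label
linstor resource list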