XOSTOR hyperconvergence preview
For now it's within this thread. Feel free to tell us what's missing in the first post!
@BHellman The first post has a FAQ that I update each time I meet users with a common/recurring problem.
Thanks for the replies. My issues are currently with the GUI so I don't know if that applies here. This is all from the GUI, so please let me know if that's outside the scope of this post and I can post elsewhere.
One issue: when creating a new XOSTOR SR, the packages are installed, but the SR creation fails because one of the packages, sm-rawhba, needs updating. You have to apply patches through the GUI and then reboot the node, or execute "xe-toolstack-restart" on each node. You can then go back and create a new SR, but only after wiping the disks that you originally tried to create the SR on, using vgremove and pvremove.
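A minimal sketch of that disk cleanup, written as a dry run that only prints the commands so they can be reviewed before anything is destroyed. The helper name print_wipe_plan, the volume group name and the device paths are all hypothetical placeholders, not values from this thread; check the output of vgs and pvs on each host for the real ones.

```shell
#!/bin/sh
# Dry-run helper (hypothetical name): prints the LVM cleanup commands
# instead of running them. Arguments: volume group name, then one or more
# physical volume device paths left behind by the failed SR creation.
print_wipe_plan() {
    vg="$1"
    shift
    # Remove the leftover volume group first...
    echo "vgremove -f $vg"
    # ...then clear the LVM label on each disk that belonged to it.
    for dev in "$@"; do
        echo "pvremove $dev"
    done
}

# Example with placeholder names (assumptions, not taken from the thread).
print_wipe_plan linstor_group /dev/sdb /dev/sdc
```

Once the printed commands look right for a given host, they can be run by hand on that host, after the toolstack restart or reboot mentioned above.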
I'm planning on doing some more testing, please let me know if GUI issues are appropriate to post here.
@BHellman It's fine to post simple issues in this thread. For complex problems a ticket is probably better.
One issue is upon creating a new XOSTOR SR, the packages are installed, however the SR creation fails due to one of the packages, sm-rawhba, that needs updating.
Not exactly: sm-rawhba is added to the list because the UI installs a modified version of sm with LINSTOR support.
The real issue is that xe-toolstack-restart is not called during the initial setup. A method is missing in our updater plugin to check whether a package is present or not; I will add this method for the XOA team.
I'm not sure what the expected behavior is but....
I have xcp1, xcp2, xcp3 as hosts in my XOSTOR pool, using an XOSTOR repository. I had a VM running on xcp2, unplugged the power from it and left it unplugged for about 5 minutes. The VM remained "running" according to XOA, even though it wasn't.
What is the expected behavior when this happens and how do you go about recovering from a temporarily failed/powered off node?
My expectation was that my VM would move to xcp1 (where there is a replica) and start, then outdate xcp2. I have "auto start" enabled under advanced on the VM.
"auto start" means that when you power up the cluster or host node, that VM will be automatically started.
I think you're describing high availability, which needs to be enabled at the cluster level. Then you need to define an HA policy for the VM.
I did those commands on xcp1 (pool master) and on the SR that was XOSTOR (linstor) and powered off xcp2. At that point the pool disappeared.
Now I'm getting the following on the xcp servers console:
Broadcast message from systemd-journald@xcp3 (Thu 2024-02-08 14:03:12 EST):
xapi-nbd: main: Failed to log in via xapi's Unix domain socket in 300.000000 seconds

Broadcast message from systemd-journald@xcp3 (Thu 2024-02-08 14:03:12 EST):
xapi-nbd: main: Caught unexpected exception: (Failure

Broadcast message from systemd-journald@xcp3 (Thu 2024-02-08 14:03:12 EST):
xapi-nbd: main: "Failed to log in via xapi's Unix domain socket in 300.000000 seconds")
After powering up xcp2 the pool never comes back in the XOA interface.
I'm seeing this on
[14:04 xcp1 ~]# drbdadm status
xcp-persistent-database role:Secondary
  disk:Diskless quorum:no
  xcp2 connection:Connecting
  xcp3 connection:Connecting
xcp2 and 3
[14:10 xcp2 ~]# drbdadm status
# No currently configured DRBD found.
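The quorum:no field in the xcp1 output is the part that matters here: that resource has lost quorum. A small sketch of how one might flag such resources from saved or live drbdadm status text, assuming the layout where each resource starts on an unindented line. The function name resources_without_quorum is hypothetical and not part of DRBD or XOSTOR.

```shell
#!/bin/sh
# Reads `drbdadm status` text on stdin and prints the name of each
# resource that reports quorum:no. Hypothetical helper, not a DRBD tool.
resources_without_quorum() {
    awk '
        # An unindented line starts a new resource; its first field is the name.
        /^[^[:space:]]/ { res = $1 }
        # Report each affected resource once, wherever quorum:no appears.
        /quorum:no/ && res != "" && !(res in seen) {
            print res
            seen[res] = 1
        }
    '
}

# Usage on a host with the DRBD tools installed:
#   drbdadm status | resources_without_quorum
```

Reading from stdin keeps the check usable on pasted output as well as on a live host.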
Seems like I hosed this thing up really good. I assume this broke because XOSTOR isn't a shared disk technically.
[14:15 xcp1 /]# xe sr-list
The server could not join the liveset because the HA daemon could not access the heartbeat disk.
Is HA + XOSTOR something that should work?
I am attempting to update our hosts, starting with the pool controller. But I am getting a message that I wanted to ask about.
The following happens when I attempt a
--> Processing Dependency: sm-linstor for package: xcp-ng-linstor-1.1-3.xcpng8.2.noarch
--> Finished Dependency Resolution
Error: Package: xcp-ng-linstor-1.1-3.xcpng8.2.noarch (xcp-ng-updates)
           Requires: sm-linstor
 You could try using --skip-broken to work around the problem
 You could try running: rpm -Va --nofiles --nodigest
Only reference I am finding is here: https://koji.xcp-ng.org/buildinfo?buildID=3044
My best guess is that I need to do two updates, the first one with --skip-broken. But I wanted to ask to be sure, so as not to put things in a weird state.
Thanks in advance!
@Jonathon Never use
@Jonathon What's the output of
lol glad I checked then
# yum repolist
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
Excluding mirror: updates.xcp-ng.org
 * xcp-ng-base: mirrors.xcp-ng.org
Excluding mirror: updates.xcp-ng.org
 * xcp-ng-linstor: mirrors.xcp-ng.org
Excluding mirror: updates.xcp-ng.org
 * xcp-ng-updates: mirrors.xcp-ng.org
repo id                        repo name                                            status
!xcp-ng-base                   XCP-ng Base Repository                               2,161
!xcp-ng-linstor                XCP-ng LINSTOR Repository                              142
!xcp-ng-updates                XCP-ng Updates Repository                            1,408
!zabbix/x86_64                 Zabbix Official Repository - x86_64                     79
!zabbix-non-supported/x86_64   Zabbix Official Repository non-supported - x86_64        6
repolist: 3,796
Are there any rough estimates for a timeline on paid support being available? I'm looking at ditching VMware and my company requires professional support availability. For virtualization I see that support is available, but I also need storage that is at least mostly on par with the vSAN I have. Thanks to you all! Love these projects!
We are working at full speed to get it available ASAP. There are still some bugs to fix and LINBIT is working on them.
With the integration you are doing, is there a provision to designate racks/sites/datacenters/etc. so that, at some level, replicas can be kept off hosts in the same physical risk space(s)?
XOSTOR works at the pool level. You can have all your hosts in the pool, or only some of them participating in the HCI (e.g. 4 hosts with disks used for HCI and others just consuming it). Obviously, it means some hosts without the disks will have to read and write "remotely" on the hosts with the disks. But it might be perfectly acceptable.
@olivierlambert I've understood that part... what I am wondering is: if I have 3 hosts in one data center and 3 hosts in another, and I have asked for redundancy of 3 copies, is there a way to ensure that all three copies are never in the same data center at the same time?