Thank you for the replies
Sorry for all the newb questions - I'm diving into this when time permits. Appreciate the help and understanding.
@olivierlambert said in XOSTOR hyperconvergence preview:
So I imagine a very low latency between the 2 DCs? One pool with 6 hosts total and 3 per DC right?
For now, there's no placement preference, we need to discuss with LINBIT about topology.
And if the 2x DCs are far from each other, I would advise getting 2x pools and using 2x XOSTOR total
This can be done using placement policies, as outlined in the LINSTOR user's guide. It will probably require a bit of extra work in XO to use those properties.
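For anyone curious, here's a rough sketch of what that could look like with the linstor CLI. This is just my reading of the LINSTOR user's guide, not something XOSTOR does today; the node names, property value, and resource-group name are made up:

# Tag each node with its datacenter (Aux/ properties are free-form)
linstor node set-property xcp1 Aux/site dc1
linstor node set-property xcp4 Aux/site dc2
# Ask the auto-placer to keep replicas in different sites
linstor resource-group modify my_rg --replicas-on-different Aux/site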
I ran those commands on xcp1 (the pool master) against the XOSTOR (linstor) SR, then powered off xcp2. At that point the pool disappeared.
Now I'm getting the following on the XCP-ng hosts' consoles:
Broadcast message from systemd-journald@xcp3 (Thu 2024-02-08 14:03:12 EST):
xapi-nbd[5580]: main: Failed to log in via xapi's Unix domain socket in 300.000000 seconds
Broadcast message from systemd-journald@xcp3 (Thu 2024-02-08 14:03:12 EST):
xapi-nbd[5580]: main: Caught unexpected exception: (Failure
Broadcast message from systemd-journald@xcp3 (Thu 2024-02-08 14:03:12 EST):
xapi-nbd[5580]: main: "Failed to log in via xapi's Unix domain socket in 300.000000 seconds")
After powering xcp2 back on, the pool never comes back in the XOA interface.
I'm seeing this on xcp1:
[14:04 xcp1 ~]# drbdadm status
xcp-persistent-database role:Secondary
disk:Diskless quorum:no
xcp2 connection:Connecting
xcp3 connection:Connecting
And on xcp2 and xcp3:
[14:10 xcp2 ~]# drbdadm status
# No currently configured DRBD found.
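In case anyone else lands in this state: as far as I understand it, the DRBD resource files are generated by the LINSTOR satellite, so "No currently configured DRBD found" suggests the satellite never applied them. My (unverified) first checks would be:

# Is the satellite service up on each host?
systemctl status linstor-satellite
# From a host that can reach the controller:
linstor node list
linstor resource list
# Re-apply whatever resource configs were generated
drbdadm adjust all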
Seems like I hosed this thing up really good. I assume this broke because XOSTOR isn't technically a shared disk.
[14:15 xcp1 /]# xe sr-list
The server could not join the liveset because the HA daemon could not access the heartbeat disk.
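For anyone hitting the same liveset message: my understanding (please correct me if this is wrong) is that the usual escape hatch is to force HA off on the stuck host, then disable it pool-wide once things are reachable again:

xe host-emergency-ha-disable force=true   # on the host that can't join the liveset
xe pool-ha-disable                        # at the pool level, once the pool is back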
Is HA + XOSTOR something that should work?
I'm not sure what the expected behavior is, but...
I have xcp1, xcp2, and xcp3 as hosts in my pool, using an XOSTOR storage repository. I had a VM running on xcp2, pulled the power from that host, and left it unplugged for about 5 minutes. The VM remained "running" according to XOA; however, it wasn't.
What is the expected behavior when this happens, and how do you go about recovering from a temporarily failed/powered-off node?
My expectation was that my VM would move to xcp1 (where there is a replica) and start, then outdate xcp2. I have "auto start" enabled under Advanced on the VM.
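From what I've read since, "auto start" in XO only powers the VM on when its host boots; automatic failover needs XAPI HA enabled on the pool plus a restart priority on the VM. If I understand correctly, that would be something like this (UUIDs are placeholders):

xe pool-ha-enable heartbeat-sr-uuids=<sr-uuid>
xe vm-param-set uuid=<vm-uuid> ha-restart-priority=restore

Whether that's supported on an XOSTOR SR is exactly my question above.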
Thanks for the replies. My issues are currently with the GUI, so I don't know if that applies here.
One issue: when creating a new XOSTOR SR, the packages are installed, but the SR creation fails because one of the packages, sm-rawhba, needs updating. You have to apply patches through the GUI and then reboot the node, or run "xe-toolstack-restart" on each node. You can then go back and create a new SR, but only after wiping the disks you originally tried to create the SR on with vgremove and pvremove (see the example below).
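For reference, the wipe I mean looks roughly like this on each affected node; the volume group and device names below are just examples from my setup:

vgremove -f linstor_group
pvremove /dev/sdb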
I'm planning on doing some more testing; please let me know if GUI issues are appropriate to post here.
This thread has grown quite large and has a lot of information in it. Is there an official documentation chapter on XOSTOR available anywhere?