XCP-ng
    Maelstrom96

    Posts

    • RE: XOSTOR hyperconvergence preview

      @gb-123 said in XOSTOR hyperconvergence preview:

      @ronan-a

      VMs would be using LUKS encryption.

      So if only the VDI is replicated and, hypothetically, I lose the master node or whichever node is actually running the VM, will I have to create the VM again from the replicated disk? Or would it be something like DRBD, where there are actually 2 VMs running in Active/Passive mode with automatic switchover? Or would it be that one VM is running and the second gets started automatically when the first goes down?

      Sorry for the noob questions. I just wanted to be sure of the implementation.

      The VM metadata is stored at the pool level, meaning you wouldn't have to re-create the VM if its current host fails. However, memory isn't replicated in the cluster, except temporarily during a live migration, when the VM's memory is copied to the new host so it can be moved.

      DRBD only replicates the VDI, in other words the disk data, across the active LINSTOR members. If the VM is stopped, or is killed by a host failure, you should be able to start it back up on another host in your pool, but by default this requires manual intervention, and you will have to enter your encryption password again since it will be a cold boot.

      If you want the VM to restart automatically in case of failure, you can use the HA feature of XCP-ng. This wouldn't remove the need to enter your encryption password since, as explained earlier, the memory isn't replicated and the VM would cold boot from the replicated VDI. Also, keep in mind that enabling HA adds maintenance complexity and might not be worth it.
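
      For reference, a rough sketch of both options with the xe CLI (the UUIDs and host name are placeholders to adapt to your pool; these are standard xe commands, nothing specific to XOSTOR):

      # Manual recovery: start the VM on another host after a failure
      xe vm-start vm=<vm-name-or-uuid> on=<host-name-label>

      # Automatic recovery: enable pool HA (heartbeat on the shared SR)
      # and mark the VM for automatic restart
      xe pool-ha-enable heartbeat-sr-uuids=<sr-uuid>
      xe vm-param-set uuid=<vm-uuid> ha-restart-priority=restart order=1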

      posted in XOSTOR
      Maelstrom96
    • RE: Three-node Networking for XOSTOR

      @T3CCH What you might be looking for: https://xcp-ng.org/docs/networking.html#full-mesh-network

      posted in XOSTOR
      Maelstrom96
    • RE: XOSTOR hyperconvergence preview

      @ronan-a said in XOSTOR hyperconvergence preview:

      @Maelstrom96 We must update our documentation for that. This will probably require executing commands manually during an upgrade.

      Any news on that? We're still pretty much blocked until that's figured out.

      Also, any news on when it will be officially released?

      posted in XOSTOR
      Maelstrom96
    • RE: XOSTOR hyperconvergence preview

      @ronan-a I've checked the commit history and saw that the breaking change seems to be related to the renaming of the KV store. I also just noticed that you renamed the volume namespace. Are there any other breaking changes that would require deleting the SR in order to update the sm package?

      I've written a Python script that copies all of the old KV data to the new KV store name and renames the volume data keys, and I was wondering if that would be sufficient.
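
      In case it's useful, here is roughly the idea, using python-linstor's dict-like KV interface. The store names and key prefixes below are placeholders, not the actual names used by the sm driver:

      import linstor

      # Placeholder names -- substitute the actual KV store names and key
      # prefixes used by the sm LinstorSR driver.
      OLD_KV_NAME = 'xcp-sr-old-name'   # hypothetical old KV store name
      NEW_KV_NAME = 'xcp-sr-new-name'   # hypothetical new KV store name
      OLD_PREFIX = 'volume/'            # hypothetical old volume key prefix
      NEW_PREFIX = 'xcp/volume/'        # hypothetical new volume key prefix

      # linstor.KV behaves like a dict backed by a LINSTOR key-value store
      # (connects to the controller on localhost by default).
      old_kv = linstor.KV(OLD_KV_NAME)
      new_kv = linstor.KV(NEW_KV_NAME)

      for key, value in old_kv.items():
          # Rename volume keys to the new prefix, copy everything else as-is.
          if key.startswith(OLD_PREFIX):
              key = NEW_PREFIX + key[len(OLD_PREFIX):]
          new_kv[key] = value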

      Thanks,

      posted in XOSTOR
      Maelstrom96
    • RE: XOSTOR hyperconvergence preview

      Hi @ronan-a ,

      So, like we said at some point, we're using a K8s cluster that connects to LINSTOR directly. It's actually going surprisingly well, and we've even deployed it in production with contingency plans in case of failure, but it's been rock solid so far.

      We're working on setting up Velero to automatically back up all of our K8s cluster metadata along with the PVs for easy Disaster Recovery, but we've hit an unfortunate blocker. Here is what we're getting from Velero when attempting to do the backup/snapshot:

      error:
          message: 'Failed to check and update snapshot content: failed to take snapshot
            of the volume pvc-3602bca1-5b92-4fc7-96af-ce77f35e802c: "rpc error: code = Internal
            desc = failed to create snapshot: error creating S3 backup: Message: ''LVM_THIN
            based backup shipping requires at least version 2.24 for setsid from util_linux''
            next error: Message: ''LVM_THIN based backup shipping requires support for thin_send_recv''
            next error: Message: ''Backup shipping of resource ''pvc-3602bca1-5b92-4fc7-96af-ce77f35e802c''
            cannot be started since there is no node available that supports backup shipping.''"'
      

      It looks like we can't actually run a backup when using thin volumes. We've checked, and the current version of setsid on XCP-ng is 2.23.2:

      [12:57 ovbh-pprod-xen12 ~]# setsid --v
      setsid from util-linux 2.23.2
      

      We know that updating a package directly is a pretty bad idea, so I'm wondering if you have an idea of what we could do to solve this, or if this will be addressed in a future XCP-ng update?

      Thanks in advance for your time!

      P.S.: We're working on a full post on how we went about deploying our full K8s LINSTOR CSI setup, for anyone who is interested.

      posted in XOSTOR
      Maelstrom96
    • RE: XOSTOR hyperconvergence preview

      @olivierlambert
      I just checked the sm repository, and it looks like it wouldn't be that complicated to add a new sm-config and pass it down to the volume creation. Do you accept PRs/contributions on that repository? We're really interested in this feature and I think I can take the time to write the code to handle it.
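
      To illustrate the kind of interface we have in mind (the sm-config key name is purely hypothetical, nothing like it exists today; only the surrounding xe syntax is standard):

      # Hypothetical: request a per-VDI replication count at creation time,
      # to be read by the LinstorSR driver and passed down to volume creation.
      xe vdi-create sr-uuid=<sr-uuid> name-label=db-disk type=user \
        virtual-size=50GiB sm-config:replication=1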

      posted in XOSTOR
      Maelstrom96
    • RE: XOSTOR hyperconvergence preview

      After reading the sm LinstorSR file, I figured out that the host names need to exactly match the host names in the XCP-ng pool. I thought I had tried that and that it failed the same way, but after re-trying with all valid host names, it set up the SR correctly.
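
      For anyone hitting the same thing, a quick way to compare both sides (standard xe and linstor CLI commands):

      # Host name-labels as seen by the XCP-ng pool
      xe host-list params=name-label --minimal

      # Node names as seen by LINSTOR -- they must match the above exactly
      linstor node list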

      Something I've also noticed in the code is that there doesn't seem to be a way to deploy a secondary SR connected to the same LINSTOR controller with a different replication factor. For VMs that have built-in software replication/HA, like DBs, it might be preferred to have replication=1 set for the VDI.

      posted in XOSTOR
      Maelstrom96