Shared Storage Redundancy Testing

mauzilla

@Forza I assume we should recreate the PBD, or is there another way to achieve the above? It seems like a real-world issue someone may face when replicating a NAS to a secondary as a failover.

olivierlambert

PBD remove and recreate will do the trick, no need to remove the SR.

mauzilla

@olivierlambert stupid question, but would it then be "business as usual"? In other words, if the storage has the replicated data on it (or some version / snapshot), would the VMs in theory automatically pick up their individual VHDs after the PBD recreation?

olivierlambert

You can't remove a PBD as long as you have one running disk on it. So you'll need to migrate or shut down any VM using a disk on it, then delete the PBD and recreate it. You can use the SR maintenance mode button in XO to make it easier.
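As a rough sketch of that sequence on the CLI, assuming an NFS SR (where the PBD's device-config uses the `server` and `serverpath` keys): the helper below only builds and prints the `xe` commands, it does not run them. The function name and the example UUIDs, IP and export path are hypothetical placeholders. A PBD exists per host, so in a pool the same steps would be repeated for each host.

```python
def pbd_recreate_commands(sr_uuid, host_uuid, old_pbd_uuid, server, serverpath):
    """Build the xe commands that repoint one host's PBD at another NFS server.

    Assumes an NFS SR; other SR types use different device-config keys.
    """
    return [
        # Detach the SR from this host. This fails while any disk on it is in use.
        ["xe", "pbd-unplug", f"uuid={old_pbd_uuid}"],
        # Remove the old host-to-SR connection record. The SR itself is kept.
        ["xe", "pbd-destroy", f"uuid={old_pbd_uuid}"],
        # Create a new connection record pointing at the failover NAS.
        # xe prints the UUID of the new PBD.
        ["xe", "pbd-create",
         f"host-uuid={host_uuid}",
         f"sr-uuid={sr_uuid}",
         f"device-config:server={server}",
         f"device-config:serverpath={serverpath}"],
        # Attach the SR again, using the UUID printed by pbd-create.
        ["xe", "pbd-plug", "uuid=<uuid printed by pbd-create>"],
    ]


if __name__ == "__main__":
    # Placeholder values for illustration only.
    for argv in pbd_recreate_commands(
        sr_uuid="<sr-uuid>",
        host_uuid="<host-uuid>",
        old_pbd_uuid="<old-pbd-uuid>",
        server="192.0.2.20",
        serverpath="/export/xcp",
    ):
        print(" ".join(argv))
```

The current PBD UUIDs and device-config for an SR can be looked up first with `xe pbd-list sr-uuid=<sr-uuid>`.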

Forza

@olivierlambert said in Shared Storage Redundancy Testing:

You can't remove a PBD as long as you have one running disk on it. So you'll need to migrate or shutdown any VM using a disk on it, then delete the PBD and recreate it. You can use the SR maintenance mode button in XO to make it easier.

Does this mean the mapping between the disk, snapshots and VM is preserved?

It would be great if this procedure was implemented as an easy-to-use tool in XO/XOA.

mauzilla

We will test this today and let you know. Ultimately the use case here is to make use of a failover NAS (which is replicated at NAS level), so that switching to the failover in the event of failure is a simpler process. Otherwise there is no practical point in replicating the VHDs between external storage servers if we cannot "switch" to another NAS.

I will let you know the outcome, but I agree with @Forza that if this does work, it would be a great addition to the GUI to allow for a "switch to failover" scenario.

nikade

@Forza said in Shared Storage Redundancy Testing:

@olivierlambert said in Shared Storage Redundancy Testing:

You can't remove a PBD as long as you have one running disk on it. So you'll need to migrate or shutdown any VM using a disk on it, then delete the PBD and recreate it. You can use the SR maintenance mode button in XO to make it easier.

Does this mean the mapping between the disk, snapshots and VM is preserved?

It would be great if this procedure was implemented as an easy-to-use tool in XO/XOA.

If the storage is 100% replicated and looks the same, the snapshots should map to the VMs correctly, since the paths would be identical.

nikade

@mauzilla said in Shared Storage Redundancy Testing:

We will test this today and let you know. Ultimately the use case here is to be able to make use of a failover NAS (which is replicated at NAS level) so that it's a simpler process to switch to a failover in the event of failure (else there is no practical point to replicate the VHD's between external storage servers if we cannot "switch" to another NAS.

I will let you know the outcome, but agree with @Forza that if this does work it would be a great addition to the GUI to allow for a "switch to failover" scenario

We're using NFS and failover on a Dell Powerstore 1000T. It works pretty well, and the NAS presents just 1 IP, so we don't have to reconfigure or unplug/plug any VBDs.
When a node fails, the VMs just continue running and the secondary node takes over within seconds, so there is really nothing happening except a really short hiccup.

Forza

@nikade said in Shared Storage Redundancy Testing:

@Forza said in Shared Storage Redundancy Testing:

@olivierlambert said in Shared Storage Redundancy Testing:

You can't remove a PBD as long as you have one running disk on it. So you'll need to migrate or shutdown any VM using a disk on it, then delete the PBD and recreate it. You can use the SR maintenance mode button in XO to make it easier.

Does this mean the mapping between the disk, snapshots and VM is preserved?

It would be great if this procedure was implemented as an easy-to-use tool in XO/XOA.

If the storage is 100% replicated and looks the same, the snapshots should map to the VM's correctly since the paths would be identical.

I meant with the procedure to remove and recreate the PBD, as @olivierlambert mentioned.

mauzilla

@olivierlambert, we're simulating the PBD disconnect to see what would happen in production. The NAS was shut down (while the VMs were still running), and we then force shut down the VMs.

Running xe pbd-unplug gets stuck (I assume this is likely due to dom0 being unable to umount the now stale NFS mount point). This could normally be resolved with a lazy unmount if one has access to dom0, but obviously we only interact with it through XAPI, so I'm not sure if there is an option to achieve this?

What we're trying to do is avoid a reboot if a NAS fails (as the failure may affect the entire pool and not just 1 host). Any ideas?

olivierlambert

You can lazy umount a failed network share, then PBD unplug will work.
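A minimal sketch of those two steps, to be run in dom0 on each affected host. It assumes the NFS SR is mounted under `/run/sr-mount/<sr-uuid>`, which is worth confirming with `mount` before running anything. The helper name and the placeholder UUIDs are hypothetical, and the code only prints the commands rather than executing them.

```python
def stale_nfs_recovery_commands(sr_uuid, pbd_uuid, mount_root="/run/sr-mount"):
    """Build the commands to detach a stale NFS SR mount and unplug its PBD.

    mount_root is an assumption about where dom0 mounts NFS SRs.
    """
    return [
        # Lazy unmount: detach the mount point now, clean up references later.
        ["umount", "-l", f"{mount_root}/{sr_uuid}"],
        # With the stale mount gone, the unplug should no longer hang.
        ["xe", "pbd-unplug", f"uuid={pbd_uuid}"],
    ]


if __name__ == "__main__":
    # Placeholder values for illustration only.
    for argv in stale_nfs_recovery_commands("<sr-uuid>", "<pbd-uuid>"):
        print(" ".join(argv))
```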