Disaster Recovery Backup - how to restore?

olivierlambert

As soon as you boot the replicated VMs, the original ones become obsolete. You can then live migrate the new (replicated) VMs back to the original site.

Zevgeny

@olivierlambert Hmm, I haven't fully thought through all the implications, but doesn't this mean that the VMs we copy back to the original site are now "new" VMs, with unique UUIDs, etc? And things like log history, etc are all reset?

Can we just export the VDI data from the backup and import it into the original VM? If the backup VM was run with a fast clone, will this work properly?

olivierlambert

That's correct. In Xen Orchestra, restoring a VM (regardless of the backup mode) always creates a new VM.

This handles the case where you lose the original VM entirely: you can't apply a delta to it because it's just gone.

However, re-applying blocks will work ONLY if the latest snapshot still exists on both the source VM and the destination. But in that case, why not just revert the original VM to that snapshot?

chitvan

@olivierlambert Sorry for reviving this old post. In the above scenario, imagine a situation in which:

The disaster recovery site is 500 miles away, and our primary site (holding 100 TB of data) suffers a power outage for the next 3 days or so.
In the meantime we activate our disaster recovery site for those 3 days, and it accumulates 10 GB of additional data.
Now our primary site has 100 TB of data and the DR site has (100 TB + 10 GB).
As I understand it, when my primary site comes back after 3 days, I have to transfer the whole (100 TB + 10 GB) again, and over a low-bandwidth link at that.
This implies it would take several weeks to bring my primary site back up.
Please suggest a more practical way to achieve "reverse DR", otherwise one would be stuck at the DR site practically forever.
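
To put rough numbers on this concern, here is a back-of-the-envelope sketch. The 100 Mbit/s link speed is purely an assumption for illustration (the thread does not state the actual bandwidth), and the helper name `transfer_days` is made up for this example.

```python
TB = 10**12  # decimal terabyte, in bytes
GB = 10**9   # decimal gigabyte, in bytes


def transfer_days(size_bytes, link_mbit_per_s):
    """Days needed to push size_bytes over a link, ignoring protocol overhead."""
    seconds = (size_bytes * 8) / (link_mbit_per_s * 1_000_000)
    return seconds / 86_400


# Assumed link speed: 100 Mbit/s (not given in the thread).
full_copy = transfer_days(100 * TB + 10 * GB, 100)
delta_only = transfer_days(10 * GB, 100)

print(f"full copy : {full_copy:.1f} days")
print(f"delta only: {delta_only * 24 * 60:.1f} minutes")
```

Under that assumption the full copy takes roughly three months, while the 10 GB of new data alone would take minutes, which is why being able to send only a delta matters so much here.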

olivierlambert

Well, create a new CR job in the other direction and let it run while the CR VMs are running. This "new CR" will take time, but without impacting production.

After the initial sync, you can shut down the current DR VMs, run one more delta sync, and start the VMs on the initial site again.
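
The two phases can be pictured with a toy block-level model. This is only a sketch of the idea, not how Xen Orchestra implements CR: disks are plain dictionaries of block index to content, and `changed_blocks` is a hypothetical helper.

```python
def changed_blocks(snapshot, current):
    """Blocks of `current` that differ from `snapshot` (same-sized disks)."""
    return {i: blk for i, blk in current.items() if snapshot[i] != blk}


# Disk of the VM currently running on the DR site: block index -> content.
dr_disk = {0: "a", 1: "b", 2: "c", 3: "d"}

# Phase 1: snapshot the running DR VM and copy it in full to the original
# site. This is the slow part, but the DR VM stays up the whole time.
sync_snapshot = dict(dr_disk)
original_site = dict(sync_snapshot)

# Production keeps writing on the DR site while the full copy is in flight.
dr_disk[1] = "B"

# Phase 2: shut down the DR VM, then send only what changed since the
# snapshot used for the initial sync.
final_delta = changed_blocks(sync_snapshot, dr_disk)
original_site.update(final_delta)

print(final_delta)               # only the blocks written during phase 1
print(original_site == dr_disk)  # the original site now matches the DR VM
```

The downtime is limited to the final delta, because the bulk of the data moved while the DR VMs were still serving production.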

TheNorthernLight

@olivierlambert How would this work for something like an MSSQL-based server using 2 drives: the OS drive on NFS and the MSSQL data drive on thick-provisioned iSCSI (for best read/write speeds for SQL)? What would be the best way to recover a VM in this scenario (without losing data, of course)?

olivierlambert

I'm not sure I understand the question. The goal of virtual machines plus backup is to abstract away completely what's underneath in terms of OS or storage. It will work the same way; it should be transparent.

chitvan

@olivierlambert Many thanks for your reply. As I understand from this reply and your earlier replies in this thread, it is necessary to transfer the full 100 TB plus 100 GB back from the DR site to the DC site, and there is currently no workaround for this. Please confirm.

olivierlambert

The whole problem is making sure the delta you apply actually makes sense.

For example, you can't apply new blocks to something different from what you had when you sent those blocks. It's not trivial to explain, sorry.

In short, you should be able to reverse ONLY if you could roll back the original VM to the latest sync snapshot of your CR. In that case, it's indeed possible to reverse (not automatically, just talking from a blocks perspective).

But this requires comparing against (or knowing) the exact original/latest snapshot you took on the original site, and sending ONLY the diff since then. On the destination, you then need to export a delta between the content of this snapshot and the current state.

However, it's possible you didn't keep the same "last" snapshot on the destination (due to retention). In that case there's no way to send the right delta.

The other alternative would be to compute a delta between 2 completely different storages. I believe that will be possible with SMAPIv3.
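
A toy example may make the "delta must match its base" point more concrete. It is a simplified model (a disk as a list of equally sized blocks, with hypothetical helpers `make_delta` and `apply_delta`), not the actual VHD or Xen Orchestra logic.

```python
def make_delta(base, current):
    """Return only the blocks that differ between two same-sized disks."""
    return {i: new for i, (old, new) in enumerate(zip(base, current)) if old != new}


def apply_delta(target, delta):
    """Overwrite the listed blocks on a copy of `target`."""
    result = list(target)
    for index, block in delta.items():
        result[index] = block
    return result


sync_snapshot = ["a0", "b0", "c0", "d0"]  # last state replicated by CR
dr_disk = ["a0", "b1", "c0", "d1"]        # DR VM after running for a while
delta = make_delta(sync_snapshot, dr_disk)

# Case 1: the original VM was rolled back to the last sync snapshot.
restored = apply_delta(list(sync_snapshot), delta)
print(restored == dr_disk)   # True: the delta lands on the base it was made for

# Case 2: the original VM wrote block 2 after the last sync and was not
# rolled back, so the delta is applied on a different base.
diverged = ["a0", "b0", "c9", "d0"]
corrupted = apply_delta(diverged, delta)
print(corrupted == dr_disk)  # False: a mix of two histories
```

In the second case nothing fails loudly: the result is simply a disk that never existed on either site, which is why the base on both sides has to be known to be identical before any delta is sent.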

chitvan

@olivierlambert Thanks for the detailed explanation. Please find below a design for possibly achieving 2-way disaster recovery using functionality that already exists in the system:

1) Consider cluster1 as the DC and cluster2 as the DR site. The distance between them is 500 miles.
2) When the customer selects CR, offer an option/checkbox "2 WAY ENABLE".
3) Once the "2 WAY ENABLE" checkbox is checked, one last XO backup snapshot shall always exist, regardless of the retention chosen by the customer.
4) The system starts sending deltas to the DR site.

Now a disaster happens (say a power failure for a few days, and the customer chooses to run the VM at the DR site).

Now CR breaks (existing functionality), but first a new snapshot is automatically created at the DR site before the VM starts running (this snapshot would be permanent unless the customer chooses to delete it, which would break 2-way DR completely).
The VM runs for a few days; the customer now chooses to go back to the DC.
Now the clusters at both sites check that the last mandatory snapshots exist, and compare their timestamps. The system finds that the snapshot time difference is within the schedule interval or less (i.e. 10 mins). Good to go then.
Now the DC site snapshot is applied at the DC only and, after successful application, is considered the "seed" by the DR site.
On the basis of the permanent initial snapshot at the DR site (the one taken in the first step of this sequence), the DR site calculates the delta (the VM may be powered off at the DR site first) and sends it to the DC site for application, and DONE! We are back in the DC using only existing functionality and deltas. Only the flows need to be revised.

Also, since the customer has chosen DR, they understand that they would lose 10 mins of data.

olivierlambert

Yes, it means you must keep the latest viable snapshot on the DR site. When we want to revert the DR in XO, we will need to be sure that the snapshots on both source and destination are the expected ones (otherwise it won't work and you'll get a corrupted result).

So we might need to add something to the snapshot to check that it's the right one.
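
One way such a check could look, purely as a sketch: every CR run stamps the snapshots it leaves on both sides with the same identifier, and a reverse sync is only allowed if both sides still hold a snapshot carrying the same stamp. The `Snapshot` class, the `sync_id` field and `common_sync_point` are hypothetical names, not existing XO or XAPI features.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Snapshot:
    name: str
    sync_id: str   # hypothetical marker written on both sides by one CR run
    taken_at: int  # seconds since epoch


def common_sync_point(source: List[Snapshot], dest: List[Snapshot]) -> Optional[str]:
    """Most recent sync_id present on both sides, or None if there is none."""
    dest_ids = {snap.sync_id for snap in dest}
    shared = [snap for snap in source if snap.sync_id in dest_ids]
    if not shared:
        return None
    return max(shared, key=lambda snap: snap.taken_at).sync_id


dc_snaps = [Snapshot("dc-1", "run-41", 1000), Snapshot("dc-2", "run-42", 1600)]
dr_snaps = [Snapshot("dr-2", "run-42", 1605)]
print(common_sync_point(dc_snaps, dr_snaps))  # run-42: reverse delta is possible

# Retention removed the matching snapshot on the DR side: nothing in common.
dr_after_retention = [Snapshot("dr-0", "run-40", 400)]
print(common_sync_point(dc_snaps, dr_after_retention))  # None: full copy needed
```

Matching on an explicit marker rather than on timestamps alone avoids treating two snapshots as equivalent just because they were taken close together.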

Anyway, this is not trivial. Let's see what @julien-f thinks when he's around. After that, we'll have to decide whether to put it in the backlog (which is already stuffed like a Thanksgiving turkey).

chitvan

@olivierlambert @julien-f I hope this gets included in the backlog. Please confirm.

olivierlambert

This needs to be groomed first. But we are pretty tight this month. Ping @marcungeschikts