Disaster Recovery Backup - how to restore?
-
I have 2 hosts at different sites, a primary and a backup.
I am using the Disaster Recovery Backup to mirror my VMs from the primary host to the backup.
Let's say one day I have a power outage at my primary site. So I fast clone and start up the VMs on my backup host to continue the workload. These VMs will accumulate new data and changes, etc.
Now power is restored to the primary site, and I want to capture the state of the VMs at the backup site and restore it to the primary site.
What's the best way to do this? Can I do it without needing to destroy the VMs at the primary site? Can I copy the backups "into" the existing VMs at the primary site?
-
As soon as you boot the replicated VMs, the original ones become obsolete. You can live migrate the new/replicated VMs back to the original site.
-
@olivierlambert Hmm, I haven't fully thought through all the implications, but doesn't this mean that the VMs we copy back to the original site are now "new" VMs, with unique UUIDs, etc.? And things like log history are all reset?
Can we just export the VDI data from the backup and import it into the original VM? If the backup VM was run from a fast clone, will this work properly?
-
That's correct. In Xen Orchestra, restoring a VM (regardless of the backup mode) always creates a new VM.
This helps when you lose the original VM entirely: you can't apply any delta to it, it's just gone.
However, re-applying blocks will work ONLY if we still have the latest snapshot on both the source VM and the destination. But in that case, why not just revert the snapshot on the original VM?
-
@olivierlambert Sorry for reviving this old post. In the above scenario, imagine a situation in which:
- the disaster recovery site is 500 miles away, and our primary site (holding 100 TB of data) suffers a power outage for the next 3 days or so, and
- in the meantime we activate our disaster recovery site for those 3 days, and it accumulates 10 GB of additional data
- now our primary site has 100 TB of data and the DR site has 100 TB + 10 GB
- as per my understanding above, after those 3 days, when my primary site goes live again, I have to transfer the full 100 TB + 10 GB back, and over low bandwidth at that
- this implies it would take several weeks to bring my primary site back up

Please suggest a more practical way to achieve "reverse DR"; otherwise one would be stuck at the DR site more or less forever.
-
Well, create a new CR job in the other direction and let it run while the CR VMs are running. This "new CR" initial sync will take time, but without impacting production.
After the initial sync, you can shut down the current DR VMs, do one final delta sync, and start the VMs on the initial site again.
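To get a feel for the numbers involved in this procedure, here is a rough back-of-the-envelope sketch. The 100 TB / 10 GB figures come from the thread; the 100 Mbit/s link speed is purely an assumption for illustration (and the final delta would really be whatever changed during the initial sync, not exactly 10 GB):

```python
# Rough model of why the reverse-CR plan keeps downtime small: the huge
# initial sync runs while the DR VMs stay up, and only the final delta
# sync happens with the VMs shut down.

TB = 1024**4
GB = 1024**3

def transfer_seconds(size_bytes, bandwidth_bps):
    """Time to push size_bytes over a link of bandwidth_bps bytes/second."""
    return size_bytes / bandwidth_bps

bandwidth = 100e6 / 8  # assumed 100 Mbit/s link, expressed in bytes/second

initial_sync = transfer_seconds(100 * TB, bandwidth)  # DR VMs keep running
final_delta = transfer_seconds(10 * GB, bandwidth)    # only this is downtime

print(f"initial sync: {initial_sync / 86400:.0f} days (no production impact)")
print(f"final delta:  {final_delta / 60:.0f} minutes of downtime")
```

The point of the split: the multi-week transfer happens in the background, so the outage window shrinks to the size of the last delta.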
-
@olivierlambert How would this work for something like an MSSQL-based server using 2 drives: the OS on NFS, and the MSSQL data drive on thick-provisioned iSCSI (for the best read/write speeds for SQL)? What would be the best way to recover a VM in this scenario (without losing data, of course)?
-
I'm not sure I understand the question. The goal of virtual machines + backup is to completely abstract away what's underneath in terms of OS or storage. It will work the same way; it should be transparent.
-
@olivierlambert Many thanks for your reply. As I understand from it, and from your earlier replies in this post, it is strictly necessary to transfer the 100 TB plus 10 GB back from the DR site to the DC site, and there is currently no workaround. Can you please confirm?
-
The whole problem is being sure that the delta you apply actually makes sense.
For example, you can't apply new blocks on top of something different from what you had when you sent those blocks. It's not trivial to explain, sorry.
In short, you could reverse the sync ONLY if you could roll back the original VM to the latest sync snapshot of your CR. In that case, it's indeed possible to reverse (not automatically, just speaking from a block perspective).
But this requires comparing (or knowing) the exact diff against the latest snapshot you took on the original site, and sending ONLY the changes since then. On the destination, you then need to export a delta between the content of that snapshot and the current state.
However, it's possible you didn't keep the same "last" snapshot on the destination (due to retention). In that case, there's no way to send the right delta.
The other alternative would be to compute a delta between 2 completely different storages. That should become possible with SMAPIv3, I believe.
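The "same base on both sides" constraint described above can be sketched in a few lines. This is purely illustrative Python, not Xen Orchestra code: disks are modelled as dicts of block address to data, and a delta records which snapshot it was computed against.

```python
# Why a reverse delta only applies cleanly when both sites still share
# the same base snapshot. Toy model: a "disk" is a dict of block -> bytes.

def take_snapshot(disk, snap_id):
    """Freeze the current block contents under an identifier."""
    return {"id": snap_id, "blocks": dict(disk)}

def compute_delta(base_snapshot, current_disk):
    """Collect the blocks that changed since the base snapshot was taken."""
    changed = {
        addr: data
        for addr, data in current_disk.items()
        if base_snapshot["blocks"].get(addr) != data
    }
    return {"base_id": base_snapshot["id"], "blocks": changed}

def apply_delta(disk, base_snapshot, delta):
    """Refuse to apply a delta unless the target rests on the same base."""
    if delta["base_id"] != base_snapshot["id"]:
        raise ValueError("base snapshot mismatch: would corrupt the disk")
    disk.update(delta["blocks"])

# DC and DR both still hold the last sync snapshot "sync-42".
dc_disk = {0: b"boot", 1: b"data"}
shared_snap = take_snapshot(dc_disk, "sync-42")
dr_disk = dict(shared_snap["blocks"])

dr_disk[1] = b"data-v2"                    # the DR VM runs and changes block 1
delta = compute_delta(shared_snap, dr_disk)
apply_delta(dc_disk, shared_snap, delta)   # ok: same base on both sides
```

If retention had expired "sync-42" on either side, `apply_delta` would have no valid base to check against, which is exactly the failure mode described in the post.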
-
@olivierlambert Thanks for the detailed explanation. Please find below a design for possibly achieving 2-way disaster recovery using pre-existing functionality in the system:
1) Consider cluster1 as the DC and cluster2 as the DR, with 500 miles between them.
2) When the customer selects CR, offer an option/checkbox "2 WAY ENABLE".
3) Once the 2-way-enable checkbox is ticked, regardless of the retention chosen by the customer, one last XO backup snapshot shall always exist.
4) The system starts sending deltas to the DR site. Now a disaster happens (say a power failure for a few days, and the customer chooses to run the VMs at DR):
- Now CR breaks (existing functionality), but a new snapshot is automatically created first at the DR site before the VMs start running (and it is permanent unless the customer chooses to delete it, which would break 2-way DR completely).
- The VMs run for a few days; the customer now chooses to go back to the DC.
- The clusters at both sites check that the last mandatory snapshots exist, and check their timestamps as well. The system finds the snapshot time difference is as per the schedule or less (i.e. 10 minutes). Good to go.
- The DC-site snapshot is applied at the DC only, and after successful application it is considered the "seed" by the DR site.
- On the basis of the permanent snapshot taken at the DR site at failover, the system calculates the delta (the VMs may be powered off at the DR site first) and sends it to the DC site for application, and DONE! We are back at the DC using only existing functionality and deltas; only the flows need to be revised.
Also, since the customer has chosen DR, they understand that they may lose 10 minutes of data.
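The "good to go" check in the flow above could look something like this. This is purely illustrative Python with invented field names (`sync_id`, `timestamp`), not an actual XO API:

```python
# Sketch of the proposed pre-flight check before failing back to the DC.

SYNC_INTERVAL = 600  # the 10-minute schedule mentioned above, in seconds

def can_fail_back(dc_snap, dr_snap, interval=SYNC_INTERVAL):
    """True only if both mandatory snapshots exist, refer to the same
    sync point, and their timestamps differ by at most one interval."""
    if dc_snap is None or dr_snap is None:
        return False  # a site lost its mandatory snapshot: 2-way DR is broken
    if dc_snap["sync_id"] != dr_snap["sync_id"]:
        return False  # the snapshots don't describe the same sync point
    return abs(dc_snap["timestamp"] - dr_snap["timestamp"]) <= interval
```

The key design point is that the check must fail closed: if either mandatory snapshot is missing or they disagree, the only safe path is a full reverse transfer.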
-
Yes, it means you must keep the latest viable snapshot on the DR site. It will also require being sure, when we want to reverse the DR in XO, that the snapshots on both source and destination are the expected ones (otherwise this won't work and you'd get a corrupted result).
So we might need to add something to the snapshot to check that it's the right one.
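One simple form that "something in the snapshot" could take: stamp both snapshots of a sync pair with the same random token at sync time, and refuse to reverse unless the tokens match. A hypothetical sketch only; the metadata key name is invented and this is not current XO behaviour:

```python
# Pairing snapshots with a shared token so a reverse sync can verify it
# is operating on the snapshots it expects.

import uuid

def stamp_pair(source_snap, dest_snap):
    """Write one shared token into both snapshots' metadata."""
    token = uuid.uuid4().hex
    source_snap["other_config"] = {"xo:sync-token": token}
    dest_snap["other_config"] = {"xo:sync-token": token}
    return token

def is_matching_pair(source_snap, dest_snap):
    """True only when both snapshots carry the same sync token."""
    a = source_snap.get("other_config", {}).get("xo:sync-token")
    b = dest_snap.get("other_config", {}).get("xo:sync-token")
    return a is not None and a == b
```

A token comparison like this is cheap and catches the retention problem described earlier: if either side garbage-collected its copy of the pair, the tokens can no longer match.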
Anyway, this is not trivial. Let's see what @julien-f thinks when he's around. After that, we'll have to decide whether to put it in the backlog (which is already stuffed like a Thanksgiving turkey).
-
@olivierlambert @julien-f I hope it gets included in the backlog. Can you please confirm?
-
This needs to be groomed first. But we are pretty tight this month. Ping @marcungeschikts