VM replication in XCP-ng

You can replicate a VM to any other storage: a different storage in the same pool, or even a remote pool. We'll see how it works, how to operate it, and some "non-standard" use cases where VM replication can help you pull off a complicated migration.

Context

This feature is a very convenient way to greatly reduce your RTO (Recovery Time Objective): instead of taking a backup and then restoring it, you boot your replicated VM directly. All of this relies on the delta capabilities, meaning that after the initial sync, we only send the differential.
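To make the "full copy first, then only the differential" idea concrete, here is a minimal Python sketch. It is a toy model, not how XCP-ng or Xen Orchestra implement it: a disk is represented as a list of blocks, and the helper names `compute_delta` and `apply_delta` are made up for this illustration.

```python
# Toy model of delta replication: a "disk" is just a list of byte blocks.
# compute_delta and apply_delta are hypothetical helpers for illustration only.

def compute_delta(previous: list[bytes], current: list[bytes]) -> dict[int, bytes]:
    """Return {block_index: block} for every block that is new or changed."""
    return {
        index: block
        for index, block in enumerate(current)
        if index >= len(previous) or previous[index] != block
    }

def apply_delta(replica: list[bytes], delta: dict[int, bytes]) -> list[bytes]:
    """Return a copy of the replica with the changed blocks applied."""
    result = list(replica)
    for index, block in sorted(delta.items()):
        if index < len(result):
            result[index] = block   # overwrite a changed block
        else:
            result.append(block)    # the disk grew since the last run
    return result

# Initial sync: the whole disk is copied once.
source_v1 = [b"boot", b"data", b"logs"]
replica = list(source_v1)

# Later run: only the blocks that differ are sent.
source_v2 = [b"boot", b"DATA", b"logs", b"new!"]
delta = compute_delta(source_v1, source_v2)
replica = apply_delta(replica, delta)
```

In this example only two of the four blocks travel over the wire on the second run, which is why later runs are much faster than the initial one.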

Some backup developers are calling it "instant restore" (since it's ready to boot without restore time), but functionally, the end result is exactly the same.

💡
In some cases, you might have very low bandwidth between the two sites involved in the replication. In short, there is not enough time to do the initial sync given the data size and the available bandwidth. That's why you can also perform the first synchronization by physically moving the data (e.g. on a hard drive) to your destination. See the dedicated section of the documentation on how to achieve it.
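A quick back-of-the-envelope calculation helps decide whether shipping a drive is worth it. The sketch below is only an estimate under stated assumptions: the function name `initial_sync_hours`, the 80% link efficiency, and the example figures (2 TiB over a 50 Mbit/s link) are all illustrative, not measured values.

```python
# Rough estimate of how long an initial full sync takes over a given link.
# All figures are assumptions chosen for illustration.

def initial_sync_hours(data_gib: float, bandwidth_mbps: float,
                       efficiency: float = 0.8) -> float:
    """Hours needed to send data_gib over a bandwidth_mbps (megabit/s) link.

    efficiency is the assumed fraction of the link usable for payload.
    """
    data_bits = data_gib * 1024**3 * 8
    usable_bits_per_second = bandwidth_mbps * 1_000_000 * efficiency
    return data_bits / usable_bits_per_second / 3600

# Example: 2 TiB (2048 GiB) of VM disks over a 50 Mbit/s link.
hours = initial_sync_hours(2048, 50)
days = hours / 24
print(f"Initial sync: about {hours:.0f} hours ({days:.1f} days)")
```

With these assumed numbers the initial sync would take around five days, a case where seeding the destination from a physical drive is likely the better choice.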

Set up your replication job

It's very easy. In your Xen Orchestra interface, you just have to create a new backup job: select "Continuous Replication" and then choose the destination storage. And yes, you can select more than one if you want to replicate to multiple destinations (but the speed will be limited by the slowest one).

You can also use smart mode to automatically replicate all VMs with a specific tag or power state:

Configure your schedule (every night at 2AM for example):

And that's it! You are ready to make your first replication to your distant pool/storage! After the initial "full" copy, the next run will only send what has changed since then, which explains the visible difference in the amount of data transferred and in the time the job takes.

Replication with memory

For those who also want to bring their replicated target back up without even "booting" the VM, you can replicate your VM with its RAM, as if it were "hibernated". This might be useful for critical systems where you absolutely need to avoid any reboot.

Warm migration: the third way

Usually, there are two options to migrate a VM from one pool/host to another:

  • Live storage migration, meaning your VM is moved to the destination while it keeps running. It's great but suffers from some limitations: if you are writing faster than the data is migrated, it will fail. Also, live migration won't work between very different CPUs (AMD vs Intel, or migrating to an older CPU that lacks instructions available on the current one).
  • Cold migration: doing the migration while the VM is off. It "fixes" the issues of live migration BUT your VM is off during the whole process. And if the disk is relatively big (e.g. hundreds of GiB or more), this means one word: downtime.

And this is precisely where VM replication can do the trick:

  1. Configure the replication job for your VM
  2. Start the initial replication while the VM is live. This will take a bit, but no downtime during the initial sync.
  3. When the first sync is done, you can then shut down the "source" VM and re-trigger the replication job. This time, it will only send the delta between now and the previous replication. Downtime will be minimal.
  4. Boot the replicated VM!
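The steps above can be sketched as a small simulation that shows where the downtime actually sits. Everything here is hypothetical: `SimulatedVM`, `replicate` and `warm_migrate` are illustration-only names, and the sizes (a 500 GiB disk, 5 GiB written during the initial sync) are assumed figures.

```python
# Simulation of the warm migration sequence; no real hypervisor calls are made.
from dataclasses import dataclass

@dataclass
class SimulatedVM:
    name: str
    disk_gib: float
    changed_gib: float = 0.0   # data written since the last replication run
    running: bool = True

def replicate(vm: SimulatedVM, first_run: bool) -> float:
    """Return the GiB sent by one run: full disk first, delta afterwards."""
    sent = vm.disk_gib if first_run else vm.changed_gib
    vm.changed_gib = 0.0
    return sent

def warm_migrate(vm: SimulatedVM, writes_during_initial_sync_gib: float):
    """Run the warm migration steps and log how much data each one moves."""
    log = []
    # Steps 1 and 2: initial replication while the VM stays up.
    log.append(("initial sync, VM running", replicate(vm, first_run=True)))
    vm.changed_gib = writes_during_initial_sync_gib
    # Step 3: shut down the source, then send only the delta.
    vm.running = False
    log.append(("delta sync, VM halted", replicate(vm, first_run=False)))
    # Step 4: boot the replica on the destination.
    log.append(("boot replica", 0.0))
    return log

timeline = warm_migrate(SimulatedVM("db-01", disk_gib=500), 5)
for step, gib_sent in timeline:
    print(f"{step}: {gib_sent} GiB")
```

Only the second run happens while the VM is down, so in this example the downtime covers moving 5 GiB instead of 500 GiB.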

This is what we call warm migration. This simple process is really handy in various cases: we used it internally when we moved our entire production from a hosting company with Intel CPUs to our own rack running on EPYC CPUs. It let us get away with only a few minutes of downtime despite massively changing the infrastructure under the VMs!