Best strategy for Continuous Replication
-
I had a server dedicated to CR that was part of my pool.
I recently lost the pool master and, in turn, lost access to the CR host too.
The official docs state that CR can be used if the main pool fails, which suggests that having the CR host as part of that pool is not a good idea.
Is it best practice to not have the CR host as part of the main pool?
Alternatively, would a better setup not be to have multiple XCP-ng hosts with central shared storage for both production VMs and CR VMs? This way, if a single XCP-ng host fails, the CR VMs can easily be started on the other host. A variation of this would be to have two shared storage repositories, one for production VMs and one for CR VMs.
I am keen to hear others' thoughts on this.
-
Yes, it's better to replicate to another pool (even if it's a single host). This means that whatever happens to the original pool (even if you lose all hosts there), you can quickly start your replicated VMs. By doing so, you remove anything in common between prod and CR, so whatever happens to your prod pool won't affect the CR one. Note that it could even be different hardware, a different CPU brand, etc.
-
Makes perfect sense.
I expect having separate storage for the production VMs and CR VMs makes sense too.
I am now thinking a good robust model would be:
- One or more production hosts in a single pool (allows host migration for updates)
- One TrueNAS Scale for production shared storage
- One CR host with local storage
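As a rough way to reason about that model, here is a small Python sketch that lists anything the CR host has in common with production. It is only an illustration: the `Host` class, the `shared_failure_domains` helper and labels such as `prod-pool` or `truenas-nfs` are made up for this example and are not part of XCP-ng or Xen Orchestra. The storage of the original in-pool CR host is assumed to be local.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    name: str
    pool: str     # hypothetical label for the pool the host belongs to
    storage: str  # hypothetical label for the storage backing its VMs


def shared_failure_domains(prod_hosts, cr_host):
    """Return the sorted names of anything the CR host shares with production."""
    shared = set()
    for host in prod_hosts:
        if host.pool == cr_host.pool:
            shared.add("pool")
        if host.storage == cr_host.storage:
            shared.add("storage")
    return sorted(shared)


prod = [
    Host("prod-1", "prod-pool", "truenas-nfs"),
    Host("prod-2", "prod-pool", "truenas-nfs"),
]

# Setup that failed: CR host is a member of the production pool
# (its storage is assumed to be local, the thread does not say).
cr_in_pool = Host("cr-1", "prod-pool", "cr-local")

# Proposed setup: standalone CR host with its own local storage.
cr_standalone = Host("cr-1", "cr-pool", "cr-local")

print(shared_failure_domains(prod, cr_in_pool))     # ['pool']
print(shared_failure_domains(prod, cr_standalone))  # []
```

An empty list for the standalone host is the goal: nothing that takes down the production pool or its storage should also take down the CR copy.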
-
That's what we do in our prod:
3 hosts + TrueNAS over NFS for prod, with CR every night to a single host with a local SR at another site.
-
@olivierlambert @McHenry I have a similar setup: a 5-host main pool (N+2) with TrueNAS NFS and a single on-site CR host (standalone, with local RAID storage) that is updated hourly from the main pool. There is also an on-site backup (for quick restores) and a nightly off-site backup to Wasabi S3.
While CR is not a traditional backup, it does allow for a quick restart of a damaged or lost VM. As my CR host is on-site, it can share the same network, so I can start a VM immediately and then copy it back to the main pool. The CR host also acts as a secondary pool where some important redundant VMs run (e.g. DNS, another XO, etc.). These secondary VMs are in turn backed up via CR to the main pool. If the main pool were to fail completely, VMs could be started on the CR host (with restricted resources).
S3 backups allow for long-term, incremental historical storage, but it takes longer to restore a large VM (although you can pick a point in time to restore from).
Off-site CR is a great DR option, but remember that CR is not a true backup; it's just a recent copy...
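
To put the hourly and nightly schedules mentioned in this thread side by side, here is a rough Python sketch of the worst-case age of the newest complete copy when a failure hits. The `worst_case_data_loss` helper and the 10-minute job duration are assumptions made up for illustration; real job durations depend on VM size, change rate and link speed.

```python
from datetime import timedelta


def worst_case_data_loss(interval, job_duration=timedelta(0)):
    """Upper bound on the age of the newest complete copy at failure time.

    Assumes each run copies a snapshot taken when the job starts, so a
    failure just before the next run finishes leaves a copy that is up
    to one interval plus one job duration old.
    """
    return interval + job_duration


# Schedules taken from the posts above; the labels are only descriptive.
schedules = {
    "CR, hourly (on-site host)": timedelta(hours=1),
    "CR, nightly (off-site host)": timedelta(days=1),
    "S3 backup, nightly": timedelta(days=1),
}

# Hypothetical figure, not taken from the thread.
assumed_job_duration = timedelta(minutes=10)

for name, interval in schedules.items():
    loss = worst_case_data_loss(interval, assumed_job_duration)
    print(f"{name}: up to {loss} of changes could be lost")
```

This only compares how recent each copy is. It says nothing about restore time or history depth, which is where the S3 backups and the "CR is just a recent copy" caveat come in.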