Backup and the replication - Functioning/Scale
-
Hello everyone,
I am trying to better understand the load and possible bottlenecks implied by the backup and replication processes.
Is there any documentation explaining in detail how they work? (The docs I could find are quite high-level on this subject.)
For example, regarding incremental backups, is the compression done by the source host or by XOA?
Is the algorithm zstd or brotli?
And how do you scale up if you have more than 1000 VMs? Do you distribute the backup jobs between multiple XOAs, or upgrade XOA resources, dom0 resources?
Do XO proxies have fewer backup and replication capabilities than XOA, and need to "lean" on XOA for specific functions?
I know this is a lot of questions.

Keep up the great work.
Thank you from Paris
-
@fcgo hello there,
From my experience: XO proxies are just stripped-down XOAs, with no limitation on backup functionality.
I agree with you, in-depth information on how the backups are done is lacking... the only place I found compression information is in Disaster Recovery jobs.
We upgraded the CPU and RAM of our XOA, but offloaded all backup tasks to proxies.
If you have 1000+ VMs, I think you already have at least 16 GB of RAM for dom0 (or you have 500 hosts with 2 VMs each and stick with the default...).
Assigning jobs to proxies is kind of manual (for example, Veeam can manage a pool of proxies and pick the least busy or the chosen one(s) in the pool).
If you implement proxies, you will have to assign them to a job AND subsequently assign them to remotes too! You need to plan carefully what you want done (a remote locked to a proxy is not seen by other proxies... sometimes you need to create the SAME remote twice to attach it to two proxies... and make sure jobs do not run in parallel on these two...).
Scheduling backups for 1000 VMs is quite something.
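
Since assigning jobs to proxies is manual, the plan can be drafted outside XO first. Below is a minimal sketch of a "least busy proxy" assignment done by hand, in the spirit of the Veeam behaviour mentioned above. It does not call any XO API: the job names, proxy names and size estimates are hypothetical, and the result is only a list to copy into the job and remote settings.

```python
import heapq

def assign_jobs(job_sizes, proxies):
    """Greedy least-loaded assignment: biggest jobs first, each one goes
    to the proxy that has the least work assigned so far.

    job_sizes: dict of job name -> estimated size (e.g. GB to transfer)
    proxies:   list of proxy names
    Returns a dict of proxy name -> list of job names.
    """
    heap = [(0, name) for name in proxies]  # (current load, proxy name)
    heapq.heapify(heap)
    plan = {name: [] for name in proxies}
    biggest_first = sorted(job_sizes.items(), key=lambda kv: kv[1], reverse=True)
    for job, size in biggest_first:
        load, proxy = heapq.heappop(heap)   # least loaded proxy right now
        plan[proxy].append(job)
        heapq.heappush(heap, (load + size, proxy))
    return plan

# Hypothetical jobs with estimated sizes in GB
jobs = {"job-a": 800, "job-b": 300, "job-c": 500, "job-d": 200}
plan = assign_jobs(jobs, ["proxy-1", "proxy-2"])
for proxy, names in plan.items():
    print(proxy, names)
```

Because a remote is tied to one proxy, the same plan also tells you which proxy each remote has to be attached to, and which remotes would need to be created twice.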
-
Adding @florent and @bastien-nollet to the loop so they can read these ideas on better reports and automated proxy assignment.
-
- compression:
  - on incremental: only if you use the "block" mode in the remote settings. It is done by XOA (or the proxy) using brotli with the BROTLI_MIN_QUALITY setting. The goal is to compress the empty parts of the data blocks, which are exported as zeroes
  - on full: done by the host, with gzip or zstd
- generally the limiting factor is the individual export speed of the XAPI; you scale by increasing concurrency. Increasing dom0 resources helps here
- latency from host to backup runner kills performance: proxies are great for placing the runner as near as possible to your source hosts
- as Pilow said: proxies are stripped-down XO (same backup code), but lack the scheduling and configuration part. XO is the one launching the
- each backup job uses its own process (so CPU and memory) on XO and proxies, so it's another way to scale
The biggest backup job I have seen in the wild (during a support session) was a few hundred VMs. Maybe there are bigger ones working without issue.
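
To illustrate why a fast, low-effort compression setting is enough when the goal is only to squeeze out zeroed regions, here is a small sketch. It is not XO's code: Python's standard library has no brotli, so zlib at its fastest level stands in for brotli at BROTLI_MIN_QUALITY, and the 2 MiB block size is an assumption made for the example.

```python
import os
import zlib

BLOCK_SIZE = 2 * 1024 * 1024  # assumed block size, for illustration only

def compress_fast(block: bytes) -> bytes:
    # Level 1 is zlib's fastest setting; used here as a stand-in for
    # brotli at its minimum quality (not the algorithm XO actually uses).
    return zlib.compress(block, 1)

empty_block = bytes(BLOCK_SIZE)                    # all zeroes
random_block = os.urandom(BLOCK_SIZE)              # incompressible data
half_empty_block = os.urandom(BLOCK_SIZE // 2) + bytes(BLOCK_SIZE // 2)

samples = [
    ("empty", empty_block),
    ("half empty", half_empty_block),
    ("random", random_block),
]
for label, block in samples:
    ratio = len(compress_fast(block)) / len(block)
    print(f"{label}: compressed to {ratio:.1%} of original size")
```

Even at the cheapest setting, the zeroed parts shrink to almost nothing while incompressible data stays about the same size, so CPU time is not wasted trying to compress real data harder.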
-
@fcgo when you add an XOPROXY to a job, data flows from the XOPROXY to the remote.
I think XOA is not involved: I checked the network bandwidth and XOA was idle.
The XOPROXY reads from the source remote and writes to the destination remote.
-
@Pilow I was asking about the case where the sites are only allowed to communicate via HTTPS, meaning the NFS remote at site 1 is accessible through proxy 1, and the NFS remote at site 2 is accessible through proxy 2 (XOA being on site 1).
-
@fcgo this is where you reach a limit...
In an XOA backup copy job configuration, you select the proxy, and it drives source AND destination.
You cannot have one proxy for SOURCE and another proxy for DESTINATION.
-
But each remote (SOURCE and DESTINATION) can be attached to the same proxy... so that proxy does both the reads and the writes,
and you have to manage your network accordingly so that this proxy can reach each remote (this is where static routing or VPNs come in...).
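
The constraint can be written down as a quick planning check. This is only a sketch under assumptions: the reachability map is filled in by hand from your own network knowledge, the proxy and remote names are hypothetical, and nothing here queries XO.

```python
def proxies_able_to_run_copy(source_remote, destination_remote, reachability):
    """Return the proxies that can reach BOTH remotes of a backup copy job.

    reachability: dict of proxy name -> set of remote names it can reach
    (hand-maintained, hypothetical data).
    """
    needed = {source_remote, destination_remote}
    return sorted(p for p, remotes in reachability.items() if needed <= remotes)

# Two sites that only talk over HTTPS: each proxy sees only its local NFS remote
reachability = {
    "proxy-1": {"nfs-site1"},
    "proxy-2": {"nfs-site2"},
}
before = proxies_able_to_run_copy("nfs-site1", "nfs-site2", reachability)
print("usable proxies before routing change:", before)

# After adding static routing or a VPN so proxy-1 can also reach site 2
reachability["proxy-1"].add("nfs-site2")
after = proxies_able_to_run_copy("nfs-site1", "nfs-site2", reachability)
print("usable proxies after routing change:", after)
```

With the two-site HTTPS-only layout, no single proxy qualifies until the network lets one of them reach both remotes.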
-
@florent can you confirm this for the replication job?
Thank you