XOA/XO from Sources S3 backup feature usage/status
-
@JamesG I have been running S3 delta backups to Wasabi as an off-site backup for about 18 months. I keep about 2 weeks of backups and only do a true full backup once a quarter. I also keep an hourly CR (Continuous Replication) and have other normal in-band OS backups too. This mostly meets the usual 3-2-1 backup rule.
The last "big" S3 backup update for XO was about 6 months ago. It helped resolve a lot of merge issues and orphaned backup data that was no longer needed, and it added better verification and cleanup of S3 data. That update triggered a large cleanup of dead data, and I had to delete some things manually to get back on track. Since then, S3 backups have been working consistently for me.
-
@Andrew Thanks for that added detail.
Your success with Wasabi is encouraging. Perhaps @planedrop's performance issues with Backblaze B2 are related to the specific combination of Backblaze's S3 implementation and XO's.
Things to test:
XO to AWS
XO to Wasabi
XO to Backblaze
In theory, performance should be the same for all S3 endpoints.
-
@JamesG I'll do some more testing with B2 as well to see if I can improve things at all, and I'll try to do a better job of logging what exactly happened and how long each step took.
I also still need to test the writeBlockConcurrency setting that @florent has mentioned in the past, though I'm still having trouble finding the right place in the config to put it.
-
Indeed, as everyone here has said, performance can vary depending on your provider, and even on the provider's own internal load. Merging mostly consists of file renames and deletions, so if your provider is slow at those operations, that might explain the poor merge speed.
-
I'll try to answer the questions.
@JamesG the 32 characters are the encryption key. You need to keep this key; without it you won't be able to restore anything. If the key is wrong, you'll get an error message when connecting to the remote.
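If you need to create such a key, one possibility is to let Python's standard `secrets` module produce 32 random hexadecimal characters. This is a sketch under assumptions: the thread only says the key is 32 characters long, so the hex alphabet and the helper names (`generate_remote_key`, `looks_like_valid_key`) are hypothetical choices, and a length check cannot tell you whether a key matches the one a remote was encrypted with.

```python
import secrets


def generate_remote_key() -> str:
    """Hypothetical helper: return a random 32-character key.

    secrets.token_hex(16) draws 16 random bytes from the OS CSPRNG and
    encodes each byte as two hex characters, giving 32 characters total.
    """
    return secrets.token_hex(16)


def looks_like_valid_key(key: str) -> bool:
    """Hypothetical sanity check: only verifies the 32-character length.

    It cannot tell whether the key matches the one a remote was encrypted
    with; per the thread, XO reports that as an error when connecting.
    """
    return len(key) == 32


if __name__ == "__main__":
    # Print a fresh key; keep a copy somewhere safe outside the XO host.
    print(generate_remote_key())
```

Whatever method you use, the important part is the one stated above: losing the key means losing the ability to restore.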
I'm glad to see that S3 is now stable enough for all of you. To improve it further, we are planning to add retries wherever possible. Retries are already implemented when reading the backup over NBD; please give it a try.
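The retry idea can be illustrated with a generic exponential-backoff wrapper. This is a minimal sketch, not XO's code (XO is written in JavaScript and its retry logic is not shown in this thread): `with_retries` and its defaults are hypothetical, and treating only `ConnectionError` and `TimeoutError` as transient is an assumption you would adapt to your S3 client's error types.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    operation: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `operation`, retrying transient failures with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            # Delay doubles on each attempt (0.5 s, 1 s, 2 s, ...) up to max_delay;
            # random jitter keeps parallel workers from retrying in lockstep.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay * random.uniform(0.5, 1.0))
    raise AssertionError("unreachable")
```

Errors outside `retry_on` propagate immediately. Wrapping individual block reads or uploads, rather than a whole job, keeps one flaky request from failing a multi-hour backup.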
There is also a test branch (https://github.com/vatesfr/xen-orchestra/pull/6840) that should improve concurrency handling when reading via NBD and writing to S3.
For now, you should test it on a separate job with a big VM.
The merge is mostly a copy and delete of all the blocks (1-2 MB each) that are no longer used. This step could probably be sped up further, but we prioritized reliability here, since a problem at this step can break the whole backup chain (and thanks @andrew for your time on this feature). You can set mergeBlockConcurrency to a higher value (like 8) in the backups section of your config file.
You can also increase writeBlockConcurrency in the backups section of the config file to speed up transfers, especially when coupled with the test branch.
The easiest way to ensure your config file is not reset on each update is to create it at ~/.config/xo-server/config.toml (in the home directory of the user running XO).
If you have an XOA, use /etc/xo-server/config.toml. For example:
[backups]
writeBlockConcurrency = 32
mergeBlockConcurrency = 8
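To show why a block-level concurrency setting matters for merge speed, here is a simplified model of the "copy and delete each block" merge described above, using `asyncio.Semaphore` to cap how many blocks are in flight. Everything here is hypothetical: `FakeObjectStore` is an in-memory stand-in for a bucket, `merge_delta` is not XO's merge routine, and the model omits the bookkeeping that makes a real merge safe to resume after a failure.

```python
import asyncio
from typing import Dict, List


class FakeObjectStore:
    """In-memory stand-in for an S3 bucket; each call sleeps to mimic request latency."""

    def __init__(self, latency: float = 0.0) -> None:
        self.objects: Dict[str, bytes] = {}
        self.latency = latency

    async def copy(self, src: str, dst: str) -> None:
        await asyncio.sleep(self.latency)
        self.objects[dst] = self.objects[src]

    async def delete(self, key: str) -> None:
        await asyncio.sleep(self.latency)
        del self.objects[key]


async def merge_delta(store: FakeObjectStore, child: str, parent: str,
                      block_ids: List[int], concurrency: int) -> None:
    """Fold a child delta into its parent, block by block.

    Each child block overwrites the parent's block at the same index (a copy),
    then the child's copy is removed (a delete). At most `concurrency` blocks
    are processed at once, which is the kind of limit a setting such as
    mergeBlockConcurrency would control.
    """
    limit = asyncio.Semaphore(concurrency)

    async def merge_block(block_id: int) -> None:
        async with limit:
            src = f"{child}/blocks/{block_id}"
            await store.copy(src, f"{parent}/blocks/{block_id}")
            await store.delete(src)

    await asyncio.gather(*(merge_block(b) for b in block_ids))
```

With N blocks, two requests per block and a per-request latency L, this model takes roughly 2·N·L / concurrency, so raising the limit helps most on high-latency providers, up to whatever request rate the provider tolerates.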
-
@florent I did some testing with this and wanted to let you know how long it's taking. I haven't tested the test branch yet, but I am using NBD.
I had a VM with a 25 GiB delta that needed to be merged on Backblaze B2, over a 200 Mbps upload connection. The upload of the new snapshot only took a few seconds (it was about 100 KiB), but the merge of the previous 25 GiB delta took 2.5 hours to complete. Does this seem normal?
-
Just to make sure I'm understanding the backup side of XO...
Backup retention is how many backups will be kept on the remote, and any backed up data that's older than the retention number should be removed automatically by the backup process?
For example, a "full backup" schedule that runs daily with a retention of 2, should only ever have two backups on the remote?
If XO cleans up after itself, what exactly is it keying off of to determine which "old" files to delete?
I ask because it doesn't look like XOfS (XO from Sources) is doing any housekeeping on S3 storage (specifically Backblaze B2).
For example, I started a daily full backup schedule with a retention of 2 on 5-21-23. As of today, all backups were still in the bucket. Before the job ran, I manually removed everything from the bucket with file dates up to 20230526*. After the job completed, I still had backups from 20230527* through the expected 20230531* for today. I changed retention to 3, set "delete before backup", and ran the job again, but I just ended up with another 20230531* backup set. I did notice that the files themselves were named with the day of the backup, but the actual date shown on the files was within the past two days, even for older files.
Example:
20230527T040007Z.json (2) * 14.1 KB 05/29/2023 00:04
20230527T040007Z.json.checksum (hidden) 0 bytes 05/29/2023 00:04
20230527T040007Z.xva (2) * 995.5 MB 05/29/2023 00:04
20230527T040007Z.xva.checksum (2) * 36.0 bytes 05/29/2023 00:04
20230528T040008Z.json (2) * 14.1 KB 05/30/2023 00:06
20230528T040008Z.json.checksum (hidden) 0 bytes 05/30/2023 00:06
20230528T040008Z.xva (2) * 1.0 GB 05/30/2023 00:06
20230528T040008Z.xva.checksum (2) * 36.0 bytes 05/30/2023 00:06
This could just be a Backblaze-specific thing. As you can see, the file names indicate the date/time XO created them, but the Backblaze file (system?) date is two days later. If XO is looking at the remote filesystem date, that would explain why those older backups are still retained. However, if XO is looking at the file names it creates, I would expect it to have cleared off the older backups.
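The timestamps in these file names use the compact ISO 8601 "basic" format (`YYYYMMDDTHHMMSSZ`, in UTC). A minimal sketch for reading the backup time out of a name, so it can be compared with the date the bucket reports; `backup_time_from_name` is a hypothetical helper, and it assumes every name starts with that 16-character timestamp, as in the listing above.

```python
from datetime import datetime, timezone


def backup_time_from_name(filename: str) -> datetime:
    """Extract the UTC timestamp at the start of a backup file name.

    Assumes names like '20230527T040007Z.xva' or '20230527T040007Z.json.checksum':
    a 16-character ISO 8601 basic-format timestamp followed by extensions.
    """
    stamp = filename[:16]  # e.g. '20230527T040007Z'
    parsed = datetime.strptime(stamp, "%Y%m%dT%H%M%SZ")
    # The trailing 'Z' means UTC, so attach that timezone explicitly.
    return parsed.replace(tzinfo=timezone.utc)
```

Comparing this value with the object date shown by the provider makes any gap between "when XO created the backup" and "what the bucket reports" explicit.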
This also raises a question: when retention is set, is it the number of copies or the number of scheduled cycles? If it's copies, then presumably manually running a daily backup a couple of times in a row would clean up the previous two days of backups. If it's cycles, then presumably a retention of 2 for daily backups would keep all backups less than two days old, and a retention of 8 for an hourly backup would clear off any backup older than 8 hours.
The cycles method based on remote filesystem dates makes more sense to me and is what I would suspect XO is doing. In my case with Backblaze, it would just appear that something strange is happening on their file system that is throwing the dates off.
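The two readings of "retention" in this question can be made concrete with two small functions. Both are hypothetical illustrations (neither is XO code); they only show how the outcomes differ when a daily job is also run manually.

```python
from datetime import datetime, timedelta
from typing import List


def keep_by_count(backups: List[datetime], retention: int) -> List[datetime]:
    """'Copies' reading: keep the `retention` most recent backups, whatever their age."""
    # Guard retention <= 0: a [-0:] slice would otherwise return the whole list.
    return sorted(backups)[-retention:] if retention > 0 else []


def keep_by_age(backups: List[datetime], max_age: timedelta, now: datetime) -> List[datetime]:
    """'Cycles' reading: keep every backup younger than `max_age`."""
    return sorted(b for b in backups if now - b < max_age)
```

With a retention of 2, running a daily job twice more by hand on the same day leaves only those two manual runs under the count-based reading, while the age-based reading would keep everything from the last two days.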
-
@planedrop said in XOA/XO from Sources S3 backup feature usage/status:
... the merge of the previous 25 gibibyte one took 2.5 hours to complete, does this seem normal?
The merge duration depends on the size of the VHDs being merged, so it depends on the size of the two oldest backups in the chain, not the latest one.
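A back-of-the-envelope check on the reported numbers (25 GiB merged in 2.5 hours) helps show where the time goes. This sketch assumes, as a simplification, that the 25 GiB corresponds to the data actually merged and that blocks are 2 MiB (the upper end of the 1-2 MB mentioned earlier); `merge_rate` is a hypothetical helper.

```python
from typing import Dict

GIB = 1024 ** 3
MIB = 1024 ** 2


def merge_rate(merged_bytes: float, seconds: float, block_bytes: float) -> Dict[str, float]:
    """Rough merge figures derived from a reported size and duration."""
    blocks = merged_bytes / block_bytes
    return {
        "blocks": blocks,                          # number of block copy+delete pairs
        "mib_per_s": merged_bytes / MIB / seconds,  # effective merge throughput
        "seconds_per_block": seconds / blocks,      # average time spent per block
    }


# 25 GiB merged in 2.5 hours, assuming 2 MiB blocks.
stats = merge_rate(25 * GIB, 2.5 * 3600, 2 * MIB)
```

Under these assumptions that is 12,800 blocks at about 0.7 s each, or roughly 2.8 MiB/s (about 24 Mbit/s). That is far below the 200 Mbps link, which suggests the per-block operations rather than raw bandwidth are the limiting factor.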
Merging is quite expensive: it is the price we pay for not transferring all the data every time and for not letting storage usage grow indefinitely.
-
@JamesG said in XOA/XO from Sources S3 backup feature usage/status:
If XO cleans up behind itself...What exactly is it keying off of to determine what "old" files to delete?
...
This also begs a question...If the retention is set, is the retention the number of copies, or is the retention the number of scheduled cycles?
To clean up backups, it looks at the date only to sort which ones are older. Retention is the number of backups kept (in this case, full backups); it does not depend on their age. For example, if you disable the backup and re-enable it later, it will only clean up the older ones.
For each full backup, it creates a metadata file (mainly information about the backup job), an XVA file (which contains the VM data), and a checksum to ensure the file is not corrupted.
-
Attempting to confirm what's expected vs what's observed....
If retention is the number of backups kept, regardless of the date, then if I had a retention of 2, and ran 5 consecutive backups, only the last two backups should remain on the remote?
-
@JamesG yes, exactly
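The confirmed behaviour (retention counts backups, and age only decides which ones are oldest) can be sketched as count-based pruning over backup file sets. This is a hypothetical illustration, not how XO enumerates or deletes files on a remote: `files_to_delete` assumes that every file of a backup set starts with the same 16-character timestamp prefix, as in the listing earlier in the thread, and that a set's metadata, XVA and checksum files are kept or removed together.

```python
from collections import defaultdict
from typing import Dict, List


def files_to_delete(filenames: List[str], retention: int) -> List[str]:
    """Return the files of every backup set beyond the `retention` most recent ones."""
    # Group .json, .xva and .checksum files by their timestamp prefix,
    # e.g. '20230527T040007Z', so each backup set is handled as a unit.
    sets: Dict[str, List[str]] = defaultdict(list)
    for name in filenames:
        sets[name[:16]].append(name)
    # This fixed-width UTC format sorts chronologically as plain strings.
    ordered = sorted(sets)
    stale = ordered[:-retention] if retention > 0 else ordered
    return sorted(f for stamp in stale for f in sets[stamp])
```

With a retention of 2 and five consecutive backup sets, the three oldest sets are returned for deletion and only the last two remain, matching the answer above.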