Delta backups - maybe not rsync friendly?
-
Hi,
We have a system where we've set up delta backups to an NFS remote that's on the local LAN.
These work really well, e.g. for the first backup we can see 100Gbyte being backed up, and subsequent backups for our VM run to about 10-15Gbyte.
We've been trying to rsync that local 'Remote' to another box over a WAN - but we run into issues (it takes forever, a lot of data seems to have changed, etc.)
Looking at the raw files on the XO "Remote" - I can see the UUID directory for the VM - and, buried away under the VDI's subdirectory etc. - I can see the VHD files.
This seems to consist of the 100GByte 'base' VHD - and then a number of smaller differentials (as you'd expect).
But after a backup run - the remote NFS seems to show that the base .VHD has been changed (thus rsync will dutifully copy over the 100GByte file again).
Do the delta backups modify this base VHD? - Maybe to apply the oldest diff to bring it forward in time before deleting that older diff?
Just trying to get my head around how these work - and what the impact will be for rsync.
Is there any better mode we could use when the remote is to be rsync'd off site? - Unfortunately we can't write directly to the off-site NFS - we only really have rsync access.
Thanks!
-
Hi,
Yes, with the "legacy" way to store backup files, the full is modified (the oldest delta is merged into the full). So it's not really made for rsync.
You can use the other mode, which won't modify any file: you need to create a new remote with
"Store backup as multiple data blocks instead of a whole VHD file"
enabled. You need a filesystem able to deal with a lot of files!
Also, FYI, backup tiering will come before the end of the year, so you won't have to deal with it yourself.
-
Thanks for the details / options. The boxes we're using support rsync 'block difference' mode - so we're giving that a try (on paper it should help if the boxes have enough 'grunt' to do this).
Failing that we'll check out the 'store backup as multiple data blocks' option.
Looking forward to the backup tiering - backups have been (and continue to be) one of many things XO does really, really well.
Cheers!
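
For anyone wanting to gauge how much of the base VHD actually changes between runs before relying on rsync's block difference mode, here is a minimal Python sketch. It hashes the file in fixed-size blocks before and after a backup run and reports the fraction of block positions that differ. The function names and the 2 MB default block size are assumptions for illustration only; rsync picks its own chunk size and uses rolling checksums, so this is only a rough estimate.

```python
import hashlib


def block_digests(path, block_size=2 * 1024 * 1024):
    """Return one SHA-256 digest per fixed-size block of the file at `path`."""
    digests = []
    with open(path, "rb") as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:
                break
            digests.append(hashlib.sha256(chunk).digest())
    return digests


def changed_fraction(old_digests, new_digests):
    """Fraction of block positions that differ or exist in only one list."""
    total = max(len(old_digests), len(new_digests))
    if total == 0:
        return 0.0
    same = sum(1 for a, b in zip(old_digests, new_digests) if a == b)
    return (total - same) / total
```

Run `block_digests` on the base VHD before the backup job, keep the list, run it again afterwards and pass both lists to `changed_fraction`. A value close to 1.0 would suggest rsync has little to gain from block comparison on that file.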
-
Multiple data blocks is the future, so if you can use it, use it (with an FS that's not ext4 or NTFS, so XFS, ZFS or Btrfs will do the trick).
It also supports compression, encryption and "soon" deduplication too.
-
Hi,
Final question on this - rsync 'block' mode failed miserably to make a difference.
When XO "merges" diffs into the base image for delta backups - does it do this by just changing updated blocks -within the existing file- or does it do something like "copy whole base image to a new file, merging the diffs as it goes - then rename at the end"?
We looked at multiple-blocks - but "not ext4" kind of delays us actually trying it for now...
Thanks!
-
Adding @florent in the convo
-
@Tackyone
In block mode: merging is a multistep rename, where we try to rename as few blocks as possible while limiting the time with a broken VHD. We move the 2MB blocks of the newer VHD to the older one (overwriting some blocks in the process), then rename the alias (a sort of symbolic link) so that the link that pointed to the newer VHD points to the older VHD folder. The older alias and the dataless folder of the newer VHD are purged.
In file mode: we modify the content of the older VHD file with 2MB parts from the newer VHD file. Then we rename. Rsync should be able to transfer only part of the file, but depending on the number of changes and the "chunk" size used by rsync, it can transfer most of the file. We probably could implement an rsync-friendly mode, where new and modified parts are always written at the end of the file, but it would lead to growing VHDs and create its own set of problems.
One of our next steps will be to implement a transfer/replication inside XO, which will benefit from knowing the file structures and scheduling.
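
As a rough illustration of the file-mode merge described above, the Python sketch below overwrites blocks of an existing "base" file with blocks taken from a "delta". It is a toy model, not XO's code: `merge_in_place` and the `{block_index: bytes}` mapping are hypothetical, and a real VHD also has headers, a footer and a block allocation table that are ignored here.

```python
BLOCK_SIZE = 2 * 1024 * 1024  # 2 MB blocks, as in the description above


def merge_in_place(base_path, delta_blocks, block_size=BLOCK_SIZE):
    """Overwrite blocks of the base file with the delta's blocks.

    `delta_blocks` is a hypothetical mapping {block_index: bytes}.
    The base file is opened for update and only the listed blocks are
    written; everything else is left untouched.
    Returns the sorted list of block indexes that were written.
    """
    with open(base_path, "r+b") as f:
        for index in sorted(delta_blocks):
            data = delta_blocks[index]
            if len(data) > block_size:
                raise ValueError("block %d is larger than block_size" % index)
            f.seek(index * block_size)
            f.write(data)
    return sorted(delta_blocks)
```

In this model the file keeps its size and only the written offsets change, which is the case where a block-comparing transfer can help. If the delta touches blocks scattered across most of the file, most chunks on the rsync side will still differ.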
-
Hi - thanks for the info. That obviously covers the multi-file / block "new" system.
Do you know how the merge is handled on "single file" Delta backups?
e.g. Every day now with the existing delta backups we can see a new delta diff being created (20GByte) - and we can see the base image is "touched" (as the oldest diff gets rolled into it) - but what we can't see is if the old merge process does a "copy base + merge diffs" then a rename (which would suggest copy-on-write filesystems will see it all be touched)
Or if it merges the diff into the base "in place" (i.e. by seeking - and over-writing areas within the existing base 100G VHD).
I'm suspecting for safety it does a copy + merge to a new file, then delete old & rename or something...
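
One way to tell the two cases apart from the outside is to compare the file's device and inode numbers before and after a backup run. The Python sketch below does this with `os.stat`; the helper names are made up for illustration, and it assumes the filesystem (or NFS export) reports stable inode numbers, which may not hold on every setup.

```python
import os


def file_identity(path):
    """Return (device, inode) for the file, which identifies it on its filesystem."""
    st = os.stat(path)
    return (st.st_dev, st.st_ino)


def was_replaced(before, after):
    """True if the identity changed, i.e. the path now points at a different file."""
    return before != after
```

Record `file_identity` for the base VHD before the job and again afterwards. An unchanged identity points to an in-place merge, while a changed one points to "write a new file, then rename over the old one". A plain rename keeps the inode, so the same identity showing up under a new name would also be consistent with in-place modification followed by a rename.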
-
@Tackyone after rereading your posts, I added more detail on the legacy mode.
Is it clearer?
-
Yes, thanks!
We have very limited control over rsync at our end (as it's running within a Synology NAS) - I think we probably have to just live with large transfers for now - whilst looking at split files / VHDs as the way to go in the future.
Thanks again!