New and exciting backup errors
-
Running `c0d58` from source. I can't work out what suddenly happened, because I haven't changed anything that I can think of.
All of a sudden, some delta backups are failing. It seems to be to do with timestamp formats and the VHD cleaning process, with two of my seven VMs failing to back up to the S3 remote (Backblaze). In the GUI, the errors look like:
1. `Expected values to be strictly equal: + actual - expected + 4294960416 - 4294959235 ^`
2. `Clean VM directory, missing or broken alias target, some metadata VHDs are missing`
3. `Invalid RFC-7231 date-time value`
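For what it's worth, the first error has the shape of a Node.js `assert.strictEqual` failure. Here's a minimal sketch of how that message gets produced (the numbers are from my log; what XO is actually comparing there, VHD sizes or block counts, is just my guess):

```js
// Sketch only: Node's assert.strictEqual throws an AssertionError whose
// message begins "Expected values to be strictly equal"; the exact diff
// layout (+ actual / - expected) varies by Node version.
const assert = require('assert')

try {
  assert.strictEqual(4294960416, 4294959235) // numbers copied from my log
} catch (err) {
  console.log(err.code)    // 'ERR_ASSERTION'
  console.log(err.message) // 'Expected values to be strictly equal: ...'
}
```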
Here's the error log if it helps: https://gist.github.com/andrewreid/4a8e7ac8da8d7f381884d4732a03d94f
Any ideas what I've done, or what might be happening?
Cheers,
Andrew
-
Hi,
The first reflex should be to move to the latest commit on `master` and see what's going on from there
@olivierlambert Ta – you're right, I should have checked that before posting, but the issue persists on `667d0`. Slightly different wording but the same smell: https://gist.github.com/andrewreid/cf4f7299b2ae7e52c61e31471675740f
Is this the best spot to discuss this, or is a GitHub issue the better forum?
-
Let's start investigating here.
Invoking @florent
-
@andrewreid Hi Andrew, I think it's fine to discuss it here.
We added a lot of information to the clean phase, to make sure we don't miss whatever is causing full VHD backups when the deltas may have seemed OK.
- Error 1 means a merge was interrupted and could not be restarted safely
- Error 2 means an alias (a link) to a VHD is missing

These errors point to one (or some) interrupted backups; XO detects this and recovers by removing the broken files, probably followed by a full backup.
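If you want to check for broken aliases yourself, here is a rough sketch. It assumes, as a simplification, that an alias is a small text file whose content is the path of the real VHD; the suffix and directory layout are just examples, not necessarily what your remote uses:

```js
// Rough sketch: walk a backup directory and report alias files whose
// target is missing. The ".alias.vhd" suffix and "file content = target
// path" convention are assumptions for illustration.
const fs = require('fs')
const path = require('path')

function checkAliases (dir) {
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const full = path.join(dir, entry.name)
    if (entry.isDirectory()) {
      checkAliases(full) // recurse into VM / VDI subdirectories
    } else if (entry.name.endsWith('.alias.vhd')) {
      const target = fs.readFileSync(full, 'utf8').trim()
      if (!fs.existsSync(path.resolve(dir, target))) {
        console.log(`broken alias: ${full} -> ${target}`)
      }
    }
  }
}

checkAliases('/path/to/remote/xo-vm-backups') // hypothetical mount point
```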
Error 3 is interesting, since it's in the AWS SDK; I will try to find where it comes from. Also, you reproduced it from master. Do you use a specific configuration for your bucket (like object locking)?
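For reference, an RFC 7231 date-time is the classic HTTP date format (IMF-fixdate), e.g. `Sun, 06 Nov 1994 08:49:37 GMT`. If the remote returns a date header in another shape, parsing fails in the SDK. This little validator is my own sketch to show the format, not the SDK's actual code:

```js
// Sketch of an RFC 7231 IMF-fixdate check; the AWS SDK's real parser is
// stricter and lives inside the SDK, this only illustrates the format.
const IMF_FIXDATE =
  /^(Mon|Tue|Wed|Thu|Fri|Sat|Sun), \d{2} (Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) \d{4} \d{2}:\d{2}:\d{2} GMT$/

function isValidHttpDate (value) {
  return IMF_FIXDATE.test(value) && !Number.isNaN(Date.parse(value))
}

console.log(isValidHttpDate('Sun, 06 Nov 1994 08:49:37 GMT')) // true
console.log(isValidHttpDate('1994-11-06T08:49:37Z'))          // false: ISO 8601, not RFC 7231
```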
-
@florent Thank you for your reply!
No, bog-standard configuration with no object-locking changes. This bucket has been receiving backups for months without fault, and no configuration has changed.
The other remote is an NFS share and that's working perfectly well.
Is your hypothesis that the S3 backup has become corrupt? If so, would the solution be simply to abandon these backups and create new ones?
– Andrew
-
@andrewreid Not that this helps with your error, but my nightly S3 delta backups to Wasabi have been working just fine. I'm using XO source and keep mostly up to date with master.
S3 and NFS are different backup formats so NFS working does not mean S3 will work.
Sometimes Wasabi causes failures, but not recently, and never ones that affect more than a few VMs in one night. The next backup run (a manual restart or the next nightly) seems to work correctly.
I force a full backup every 3 months just to make sure I have a good checkpoint in case undetected corruption creeps into the delta data. Plus I have replication and other backups.
I did have a problem once (not exactly yours) with a VM and just deleted the S3 backup data which forced a full backup again. Sucks because you don't have the weeks of deltas anymore.
Error #3 (date-time) is strange. It may be a Backblaze issue.
You could create a new S3 remote on Backblaze (a new bucket or directory) and test one failing VM backup to the new remote, to see if it works (and keeps working for deltas). You could set the retention low to test merging too. I have a setup like this with a local MinIO server for S3 testing and local backups.
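If you want to sanity-check a remote outside XO first, a minimal round trip with the AWS SDK v3 works too. The endpoint, credentials, and bucket below are placeholders for a local MinIO (swap in your Backblaze endpoint and keys to test that instead); this is just a probe, not anything XO does internally:

```js
// Minimal S3 round trip to verify a remote independently of XO.
const {
  S3Client,
  PutObjectCommand,
  GetObjectCommand,
} = require('@aws-sdk/client-s3')

const client = new S3Client({
  endpoint: 'http://localhost:9000', // MinIO's default port; placeholder
  region: 'us-east-1',
  credentials: { accessKeyId: 'minioadmin', secretAccessKey: 'minioadmin' },
  forcePathStyle: true, // MinIO expects path-style addressing
})

async function roundTrip () {
  await client.send(
    new PutObjectCommand({ Bucket: 'xo-test', Key: 'probe', Body: 'hello' })
  )
  const res = await client.send(
    new GetObjectCommand({ Bucket: 'xo-test', Key: 'probe' })
  )
  console.log(await res.Body.transformToString()) // should print 'hello'
}

roundTrip().catch(console.error)
```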