New and exciting backup errors
-
Running `c0d58` from source. I can't work out what suddenly happened, because I haven't changed anything that I can think of.
All of a sudden, some delta backups are failing. It seems to be to do with timestamp formats and the VHD cleaning process, with two of my seven VMs failing to back up to the S3 remote (Backblaze). In the GUI, the errors look like:
1. `Expected values to be strictly equal: + actual - expected + 4294960416 - 4294959235 ^`
2. `Clean VM directory, missing or broken alias target, some metadata VHDs are missing`
3. `Invalid RFC-7231 date-time value`
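For what it's worth, the first error has the shape of a Node.js `assert.strictEqual` failure. Here's a minimal sketch of how that message gets produced (the numbers are from my log; what XO is actually comparing there, VHD sizes or block counts, is just my guess):

```js
// Sketch only: Node's assert.strictEqual throws an AssertionError whose
// message begins "Expected values to be strictly equal"; the exact diff
// layout (+ actual / - expected) varies by Node version.
const assert = require('assert')

try {
  assert.strictEqual(4294960416, 4294959235) // numbers copied from my log
} catch (err) {
  console.log(err.code)    // 'ERR_ASSERTION'
  console.log(err.message) // 'Expected values to be strictly equal: ...'
}
```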
Here's the error log if it helps: https://gist.github.com/andrewreid/4a8e7ac8da8d7f381884d4732a03d94f
Any ideas what I've done, or what might be happening?
Cheers,
Andrew
-
Hi,
The first reflex should be to move to the latest commit on `master` and see what's going on from there
@olivierlambert Ta – you're right, I should have checked that before posting, but the issue persists on `667d0`. Slightly different wording but the same smell: https://gist.github.com/andrewreid/cf4f7299b2ae7e52c61e31471675740f
Is this the best spot to discuss this, or is a GitHub issue the better forum?
-
Let's start investigating here.
Invoking @florent
-
@andrewreid Hi Andrew, I think it's fine to discuss it here.
We added a lot of information to the clean phase, to make sure we don't miss whatever is causing full VHD backups when the deltas may have seemed OK.
- Error 1 means a merge was interrupted and could not be restarted safely
- Error 2 means an alias (a link) to a VHD is missing

These errors point to one (or some) interrupted backups; XO detects this and recovers by removing the broken files, probably followed by a full backup.
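If you want to check for broken aliases yourself, here is a rough sketch. It assumes, as a simplification, that an alias is a small text file whose content is the path of the real VHD; the suffix and directory layout are just examples, not necessarily what your remote uses:

```js
// Rough sketch: walk a backup directory and report alias files whose
// target is missing. The ".alias.vhd" suffix and "file content = target
// path" convention are assumptions for illustration.
const fs = require('fs')
const path = require('path')

function checkAliases (dir) {
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const full = path.join(dir, entry.name)
    if (entry.isDirectory()) {
      checkAliases(full) // recurse into VM / VDI subdirectories
    } else if (entry.name.endsWith('.alias.vhd')) {
      const target = fs.readFileSync(full, 'utf8').trim()
      if (!fs.existsSync(path.resolve(dir, target))) {
        console.log(`broken alias: ${full} -> ${target}`)
      }
    }
  }
}

checkAliases('/path/to/remote/xo-vm-backups') // hypothetical mount point
```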
Error 3 is interesting, since it's in the AWS SDK; I will try to find where it comes from. Also, you reproduced it from master. Do you use a specific configuration for your bucket (like object locking)?
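For reference, an RFC 7231 date-time is the classic HTTP date format (IMF-fixdate), e.g. `Sun, 06 Nov 1994 08:49:37 GMT`. If the remote returns a date header in another shape, parsing fails in the SDK. This little validator is my own sketch to show the format, not the SDK's actual code:

```js
// Sketch of an RFC 7231 IMF-fixdate check; the AWS SDK's real parser is
// stricter and lives inside the SDK, this only illustrates the format.
const IMF_FIXDATE =
  /^(Mon|Tue|Wed|Thu|Fri|Sat|Sun), \d{2} (Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) \d{4} \d{2}:\d{2}:\d{2} GMT$/

function isValidHttpDate (value) {
  return IMF_FIXDATE.test(value) && !Number.isNaN(Date.parse(value))
}

console.log(isValidHttpDate('Sun, 06 Nov 1994 08:49:37 GMT')) // true
console.log(isValidHttpDate('1994-11-06T08:49:37Z'))          // false: ISO 8601, not RFC 7231
```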
-
@florent Thank you for your reply!
No, bog-standard configuration with no object-locking changes. This bucket has been receiving backups for months without fault, and no configuration has changed.
The other remote is an NFS share and that's working perfectly well.
Is your hypothesis that the S3 backup has become corrupt? If so, would the solution be simply to abandon these backups and create new ones?
– Andrew
-
@andrewreid Not that this helps with your error, but my nightly S3 delta backups to Wasabi have been working just fine. I'm using XO source and keep mostly up to date with master.
S3 and NFS are different backup formats so NFS working does not mean S3 will work.
Sometimes Wasabi causes failures, but not recently, and never ones that affect more than a few VMs in one night. The next backup run (a manual restart or the next nightly) seems to work correctly.
I force a full backup every 3 months just to make sure I have a good checkpoint in case undetected corruption creeps into the delta data. Plus I have replication and other backups.
I did have a problem once (not exactly yours) with a VM and just deleted the S3 backup data which forced a full backup again. Sucks because you don't have the weeks of deltas anymore.
Error #3 (date-time) is strange. It may be a Backblaze issue.
You could create a new S3 remote on Backblaze (a new bucket or directory) and test one failing VM backup to the new remote, to see if it works (and keeps working for deltas). You could set the retention low to test merging too. I have a setup like this with a local MinIO server for S3 testing and local backups.
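If you want to sanity-check a remote outside XO first, a minimal round trip with the AWS SDK v3 works too. The endpoint, credentials, and bucket below are placeholders for a local MinIO (swap in your Backblaze endpoint and keys to test that instead); this is just a probe, not anything XO does internally:

```js
// Minimal S3 round trip to verify a remote independently of XO.
const {
  S3Client,
  PutObjectCommand,
  GetObjectCommand,
} = require('@aws-sdk/client-s3')

const client = new S3Client({
  endpoint: 'http://localhost:9000', // MinIO's default port; placeholder
  region: 'us-east-1',
  credentials: { accessKeyId: 'minioadmin', secretAccessKey: 'minioadmin' },
  forcePathStyle: true, // MinIO expects path-style addressing
})

async function roundTrip () {
  await client.send(
    new PutObjectCommand({ Bucket: 'xo-test', Key: 'probe', Body: 'hello' })
  )
  const res = await client.send(
    new GetObjectCommand({ Bucket: 'xo-test', Key: 'probe' })
  )
  console.log(await res.Body.transformToString()) // should print 'hello'
}

roundTrip().catch(console.error)
```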