What happens when a delta backup fails?

mauzilla

During our setup we've had a couple of backups fail (some due to I/O overload and some because they were too big and got aborted; nothing XO-related). The backups are delta backups with CR.

For example, one of the VMs that started backing up last night never completed because of an I/O error. I have the option to restart it, but I noticed that the snapshot that was created for the VM is still intact (understandable, the error was "fatal"). When we restart the backup, does this automatically clear any previous snapshots and start a new backup, or does it attempt to continue? Obviously we don't want the previous snapshot left on the hypervisor, so should we manually delete that snapshot before starting the backup again? Also, what happens when this happens to a delta backup that has been running for a while? Does the restart start a brand new backup (delete the snapshot, do a new full initial seed again) or does it attempt to "recover"?

Lastly, I have noticed that there are 2 simultaneous exports / backups running at a time. Is there a way I can configure XO to back up only 1 VM at a time instead of 2? We're having a tough time getting the large VMs backed up and replicated within the 24-hour period, so we're looking at limiting the exports / backups to a single VM at a time to lessen the storage I/O impact and hopefully get the job done faster.

Darkbeldin

You can manage the number of simultaneous backups by setting the concurrency. The default is 2, but you can set it to 1 if needed.
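To illustrate what a concurrency limit does in general, here is a minimal Python sketch. It is not XO code: `run_backups` and the `export` callback are hypothetical names, and the sleep is only a stand-in for a real export. It runs one task per VM while never letting more than `concurrency` of them run at the same time, and reports the highest number that ran in parallel.

```python
from concurrent.futures import ThreadPoolExecutor
import threading
import time


def run_backups(vm_names, concurrency=2, export=None):
    """Run one export per VM, with at most `concurrency` running at once.

    `export` is a hypothetical callback taking a VM name; when it is None,
    a short sleep stands in for the real work.
    Returns (finished VM names in input order, peak number of parallel exports).
    """
    lock = threading.Lock()
    running = 0
    peak = 0

    def worker(vm):
        nonlocal running, peak
        with lock:
            running += 1
            peak = max(peak, running)
        try:
            if export is not None:
                export(vm)
            else:
                time.sleep(0.01)  # stand-in for a real export
        finally:
            with lock:
                running -= 1
        return vm

    # The pool size is the concurrency limit: extra VMs wait in the queue.
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        done = list(pool.map(worker, vm_names))
    return done, peak
```

With `concurrency=1` the VMs are processed strictly one after another, which is the behaviour asked about above: each export gets the storage to itself, at the cost of no overlap between VMs.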

For delta backups, if a job fails, the next backup will be a delta from the previous good delta. Normally, snapshots are cleaned up once they are no longer needed for backup; if for some reason they are not, you will see them in your health panel. The dev team is working on a deeper cleaner for remaining lost files, which should be included shortly.
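As a rough mental model of that behaviour (not XO's actual logic), the sketch below picks the base for the next delta from a job history. The `history` structure and the function name are made up for illustration, and the rule that a new full is needed when no good backup exists is an assumption.

```python
def next_delta_base(history):
    """Return the index of the backup the next delta would build on.

    `history` is a hypothetical list of (kind, succeeded) tuples, oldest
    first, where kind is "full" or "delta". Failed runs are skipped, so
    the base is the most recent successful backup.
    Returns None when there is no good backup (assumed: a full is needed).
    """
    for index in range(len(history) - 1, -1, -1):
        _kind, succeeded = history[index]
        if succeeded:
            return index
    return None


# Last night's delta failed, so the next delta builds on the one before it.
history = [("full", True), ("delta", True), ("delta", False)]
base = next_delta_base(history)
```

Here `base` ends up as `1`, the last successful delta, which is why a single failed run does not force a new initial seed.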

You should not manually remove snapshots created by the backups; as I said, they would normally be cleaned up by the backup process.

mauzilla

Thank you @Darkbeldin - I was looking for the concurrency setting and found it once I enabled Advanced. Thank you for the feedback!

If we see that a backup failed, would you advise that we rather "restart" that backup entirely (new chain) for safety? If so, I take it we should remove the VM from the backup set (remove the tag), then remove the snapshots from the health panel, and then re-add the tag so that it starts a new backup?

Darkbeldin

@mauzilla
Having one delta fail for some reason is not really critical as long as the backup chain remains healthy; you can just restart the failed backup to get a current delta.

What I recommend is having another full backup job on a longer interval and on another remote (weekly, for example), so that if anything happens you still don't lose everything. It all depends on whether you need a really up-to-date backup or can afford to lose some data, and also on the storage you have available for all of this data.

Forza

I second that. VDI chain corruption can happen and then all backups in that chain could fail to restore.

You can use Full Backup Interval in the advanced settings.

You can also use a separate backup job as @Darkbeldin mentioned. This has the advantage of being able to use zstd compression, which greatly reduces the bandwidth and storage needed for full backups (2-10x on my VMs)!

mauzilla

@s-pam said in What happens when a delta backup fails?:

I second that. VDI chain corruption can happen and then all backups in that chain could fail to restore.

Being new to delta backups (we've always just done a full export XVA and then manually imported them into a failover network), is there anything we need to look out for? If all backups were completed successfully, should we be worried that the delta backups may be corrupted?

For this particular backup, we do both a delta backup and replication to a failover server, but I take it that if there is corruption, it will be in both the delta NFS path and the failover server designated for this. What "tests" can we do, apart from restarting the chain, running another full backup, or starting the VM on the failover host, to test the integrity of the data we have?

Forza

@mauzilla said in What happens when a delta backup fails?:

@s-pam said in What happens when a delta backup fails?:

I second that. VDI chain corruption can happen and then all backups in that chain could fail to restore.

Being new to delta backups (we've always just done a full export XVA and then manually imported them into a failover network), is there anything we need to look out for? If all backups were completed successfully, should we be worried that the delta backups may be corrupted?

As long as backups are happening without errors they should be OK. I do not know what failure modes exist here. Bad blocks/corrupt data, a crash or reboot during VDI merging, etc.? Probably rather rare.
Perhaps someone here has some anecdotal evidence?

You could use several remotes in the same backup job. Then the VDI chains on each remote would be independent.

For this particular backup, we do both a delta backup and replication to a failover server, but I take it that if there is corruption, it will be in both the delta NFS path and the failover server designated for this. What "tests" can we do, apart from restarting the chain, running another full backup, or starting the VM on the failover host, to test the integrity of the data we have?

I do not know of any automated tests. But restoring a VM from a backup should at least test if the VDI chain is healthy.

Perhaps this could be a new feature in XO/XOA as a complement?

olivierlambert

IIRC, we do test the VHD structure in XO (but @julien-f can confirm this)

julien-f

@olivierlambert XO does not test the full VHD structure because that would take too long, but it does check both the beginning and the end of the file (and of all other VHDs in the chain) before each backup to make sure the files were not truncated.
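For anyone who wants to run a similar spot check by hand, here is a rough Python sketch of the idea. It is not XO's implementation: it only relies on the VHD format itself, where a dynamic or differencing VHD ends with a 512-byte footer starting with the cookie `conectix` and begins with a copy of that footer. The function name is hypothetical, and the check assumes dynamic/differencing files (a fixed VHD has no footer copy at the start).

```python
import os

VHD_COOKIE = b"conectix"  # first 8 bytes of a VHD footer
FOOTER_SIZE = 512         # footer length in bytes


def looks_untruncated(path):
    """Cheap truncation check for a dynamic or differencing VHD.

    Only the start and the end of the file are read: the footer copy at
    offset 0 and the footer in the last 512 bytes must both begin with
    the VHD cookie. This does not validate the rest of the structure.
    """
    size = os.path.getsize(path)
    if size < 2 * FOOTER_SIZE:
        return False
    with open(path, "rb") as f:
        head = f.read(len(VHD_COOKIE))
        f.seek(size - FOOTER_SIZE)
        tail = f.read(len(VHD_COOKIE))
    return head == VHD_COOKIE and tail == VHD_COOKIE
```

A file that was cut off mid-transfer loses its trailing footer, so the check fails even though the beginning still looks valid. Corruption in the middle of the file goes unnoticed, which is why a test restore remains the stronger check.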

For full backups we use a simple heuristic to attempt to validate XVA files, but it's very basic and unfortunately it does not apply to compressed files.