Incremental Backups Periodically Result In EBUSY File Lock Error
-
I'm trying to figure out the root cause of this. I think it is something I'm doing, but maybe something should be in place to prevent it.
In a production environment we run nightly incremental backups for some pretty large VMs (1-2 TiB). It works perfectly fine 99% of the time, but once in a while I get an error on just one VM (not always the same one): something like EBUSY, followed by a message about the file being locked. Along with this we usually get failed merges saying the parent VHD is missing. I will include more error details below; I didn't want to fill up the top of the post with them.
Anyway, the point is that I finally started to notice a pattern: it commonly happens on Patch Tuesday. That got me thinking, because I snapshot all our VMs on Patch Tuesday before doing the updates in case something breaks, and I start this around the start time of our backups.
Is there any way that taking a snapshot of a VM while its backup is running could result in a missing parent VHD?
The only fix I've found so far is to wipe the entire backup directory for that VM's UUID and then restart the backup (after that it always works until the problem happens again).
For clarity, the child is in the TrueNAS directory for both of these whereas the parent is the local VM VHD.
The only other thing I can think of is the cloud backup of our TrueNAS dataset. It starts about an hour before our VM backups to the TrueNAS and takes a snapshot first, which should avoid interfering with the data being written even if the cloud backup runs longer than the VM backups to the NAS.
-
Hi,
Can you switch to block backup and see if you have a similar problem?
-
@olivierlambert Apologies, do you mean NBD? Or is there another setting and I'm just forgetting where it is?
I'll also try to set this up in my lab and see if I can reproduce it. It's not a huge deal, since I could easily just avoid taking snapshots while a backup is running, but if that does create an issue, IMO the UI should prevent users from doing it.
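As a stopgap, a guard along these lines could gate the Patch Tuesday snapshots. This is only a sketch: the window times are made-up placeholders, the function name is hypothetical, and it does not query XO at all, it just checks the clock.

```python
from datetime import time

# Hypothetical nightly backup window; replace with the real schedule.
BACKUP_START = time(22, 0)
BACKUP_END = time(4, 0)


def in_backup_window(now, start=BACKUP_START, end=BACKUP_END):
    """Return True if `now` falls inside [start, end).

    Handles windows that cross midnight (start later than end).
    """
    if start <= end:
        return start <= now < end
    # Window wraps past midnight, e.g. 22:00 -> 04:00.
    return now >= start or now < end


if __name__ == "__main__":
    from datetime import datetime

    if in_backup_window(datetime.now().time()):
        print("Backup window is open: hold off on manual snapshots.")
    else:
        print("Outside the backup window: snapshots should not overlap.")
```

A fixed window only approximates "a backup is running", so a job that overruns its usual end time would not be caught.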
Thanks as always for the help and excellent work!
-
No, I mean block mode, which stores the VHD as 2 MiB block files instead of one big flat VHD. It is set up directly in the "remote" section.
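For a rough sense of scale, here is a back-of-the-envelope sketch of how many 2 MiB block files a VHD of a given virtual size could split into. It is plain arithmetic, not XO code: the function name is made up, and the result is an upper bound that assumes every block is allocated and ignores metadata files and compression.

```python
MIB = 1024 ** 2
TIB = 1024 ** 4
BLOCK_SIZE = 2 * MIB  # block size mentioned above


def max_block_files(virtual_size_bytes, block_size=BLOCK_SIZE):
    """Upper bound on block files for a disk of the given virtual size.

    Rounds up so a partially filled last block still counts as one file.
    """
    return -(-virtual_size_bytes // block_size)  # ceiling division


if __name__ == "__main__":
    for size_tib in (1, 2):
        count = max_block_files(size_tib * TIB)
        print(f"{size_tib} TiB -> at most {count} block files")
```

So VMs in the 1-2 TiB range would mean at most roughly half a million to a million small files per disk, instead of one large file that has to be locked and merged as a whole.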
-
@olivierlambert Ah gotcha, forgot that was an option in remotes haha! I'll give this a shot and see if things are any better.
Is it safe to switch an existing remote that isn't set up in block mode over to block mode, or should I create a new remote and redo the backups?
-
Good question, let me bring @florent in here.
-
@planedrop Yes, but the first backup afterwards will be a key (complete) backup for all the VMs in the job.
-
@florent OK, good to know, thank you! I will do what I can to replicate this issue in my lab and then see if changing to block backups fixes it. I just want to avoid changing things too much in the prod environment.
Appreciate the help!