the writer IncrementalRemoteWriter has failed the step writer.beforeBackup() with error Lock file is already being held. It won't be used anymore in this job execution.

manilx

@florent No other backup is running as I mentioned!

And "Merge backups synchronously" is on as mentioned.

This is not it.....

florent

@manilx are there multiple jobs running on the same VM?

manilx

@florent Only one single backup job every 2 hrs (multiple VMs). It has been running for more than 1 year without issues (not counting the initial CBT stuff).

ScreenShot 2025-02-14 at 13.43.59.png

ScreenShot 2025-02-14 at 13.45.27.png

manilx

@florent This makes no sense. A lock file must sometimes be left over.
ScreenShot 2025-02-15 at 09.43.16.png

SudoOracle

So I started getting this a few days ago as well. My backups were fine for over a year, but now they keep failing at random. They will usually succeed just fine if I restart the failed backup. I am currently on commit 494ea. I haven't had time to delve into this though. I just found this post when initially googling.

olivierlambert

That's interesting feedback, maybe a recently introduced bug. Could you try to go on an older commit to see if you can identify the culprit?

manilx

@SudoOracle Thx for chiming in!

Glad to know I'm not alone.

SudoOracle

@olivierlambert I went back to my prior version, which was commit 04dd9. And the very next backup finished successfully. Not sure if this was a fluke, but I have backups going every 2 hours and this was the first to finish successfully today without me clicking the button to retry the backup. I'll report tomorrow on whether they continue to work or not.

olivierlambert

Okay great, you can use git bisect to easily find the culprit. Now we also need to find the common situation between you and @manilx : type of backup, advanced options enabled, type of BR, and so on to also understand more about this.
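
For readers unfamiliar with it, git bisect (`git bisect start`, then `git bisect good <commit>` / `git bisect bad <commit>`, and `git bisect reset` when done) performs a binary search over the history between a known-good and a known-bad commit. Below is a minimal Python sketch of that search logic only. The function name, the intermediate commit names and the `is_bad` check are hypothetical placeholders; in practice "is_bad" means checking out the commit, rebuilding, and waiting to see whether a scheduled backup fails.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first bad commit.

    Assumes `commits` is ordered oldest to newest, commits[0] is known good
    and commits[-1] is known bad (the same premise git bisect relies on).
    """
    lo, hi = 0, len(commits) - 1  # lo: newest known-good, hi: oldest known-bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # the regression is at mid or earlier
        else:
            lo = mid  # the regression is after mid
    return commits[hi]


# Hypothetical history: only the two endpoints come from this thread,
# the "hypothetical-*" entries are made-up placeholders.
history = ["04dd9", "hypothetical-1", "hypothetical-2", "hypothetical-3", "494ea"]
# Pretend the regression was introduced in "hypothetical-2".
culprit = first_bad_commit(history, lambda c: history.index(c) >= 2)
print(culprit)
```

Each step halves the remaining range, so even a few dozen commits need only a handful of test runs. The catch with an intermittent failure is that a single successful backup does not prove a commit is good, so each step may need several runs before it is marked.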

manilx

@olivierlambert I have changed my backup strategy because of that:

Before:
ScreenShot 2025-02-17 at 10.41.02.png

Now:
ScreenShot 2025-02-17 at 10.41.58.png

After this the problem no longer occurred.

olivierlambert

Can you sum up what you did? I admit I don't have enough time to investigate differences between 2 screenshots

manilx

@olivierlambert Sure!

I had a main backup job running 2 schedules, one bihourly and a last one at the end of the day, the latter with a health check.

Then I had 2 Mirror incremental ones running after that, one to a local NAS and one to a remote NAS @office.

I now have separate backup jobs for the bihourly, the end of day and the 2 mirror ones (the latter two are now delta backups directly to the NASes).

I hope this explains it.

olivierlambert

Thanks. As soon as the commit generating the error is identified, this will be very helpful for @florent

manilx

@olivierlambert I can then restore the config and retest once it's identified/fixed.... But right now I need the backups working without issues

SudoOracle

So since I went back to an older version, I have not had a single backup fail. I have 3 main backups that run: 2 of them daily (my pool metadata/XO config and a delta of 18 VMs) and 1 every 2 hours (also a delta, but only of 3 VMs). ALL of them would fail in one way or another, even the metadata. Something I just noticed is that each backup appears to be starting twice, or at least there are failed logs indicating so. If they are actually running twice, that would explain the errors. It would also explain why clicking the retry backup option would always succeed.

Error from the metadata backup:

EEXIST: file already exists, open '/run/xo-server/mounts/c9199dfc-af05-4707-badb-8741e61daafb/xo-config-backups/f19069dd-f98b-4b41-9ca8-0e711fc75968/20250216T041500Z/data.json'

olivierlambert

@SudoOracle are you able to find the offending commit?

SudoOracle

@olivierlambert Still working on it. It's between a4986 and 494ea.

manilx

@olivierlambert Just to confirm that the errors have stopped since I changed the backup plans.

The change being to have only one schedule per job and not using Replicate jobs.

SudoOracle

OK, so I'm not sure what happened, but it's working fine now. I made it all the way back to commit 494ea and my backups continue to work. I have done nothing except move between commits. Maybe something went wrong during the earlier update? I am going to move to the latest commit and see if it continues to work.

florent

@SudoOracle could you post the full JSON log of the backup?
(you can get it by clicking on the download button at the top of a failed job execution)

if possible, one per type of failed backup job