Troubleshooting Backups (in general)
-
Hi, I want to learn how to troubleshoot backup job problems.
When an error occurs, in most cases the job status is set to failed and I get an error message that I can trace and resolve.
But occasionally this does not happen, as in the following example:
I have a job that runs a full backup once a week and a delta backup on the other days.
This job started a full backup on Dec 31, 2022 at 5:00 AM. It was still in the "started" state 24 hours later.
- there was no visible activity anymore (no tasks, no traffic)
- 3 VMs were backed up successfully
- the timeout of this job was set to 23 hours (so it should have been killed already?)
Because this job was stuck in "Started", the runs on the following days failed with "job already running".
On Jan 3 I restarted the xo-server service, and the job was then set to "interrupted" without an end time.
The delta backup on Jan 4 started as planned, but is stuck again.
I could probably reconfigure the job and it would be OK, but since this happens from time to time I would like to understand what is going on.
Where can I get additional information?
Why is the job not set to failed, at least after the configured timeout?
BTW: I am running from source with Server 5.107.5 and Web 5.109.0, but I have seen this happen in earlier versions too.
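To make the timeout reasoning in this example concrete, here is a minimal sketch of the arithmetic. It is not how xo-server tracks jobs internally; `is_overdue` is a hypothetical helper, and the year 2022 is assumed from the date given above.

```python
from datetime import datetime, timedelta


def is_overdue(started_at, timeout, now):
    """Return True when a job has been running longer than its timeout."""
    return now - started_at > timeout


# Values taken from the example above: start at 5:00 AM on Dec 31,
# a 23-hour timeout, and a status check 24 hours after the start.
started_at = datetime(2022, 12, 31, 5, 0)
timeout = timedelta(hours=23)
checked_at = started_at + timedelta(hours=24)

if is_overdue(started_at, timeout, checked_at):
    print("job exceeded its timeout but is still reported as started")
```

With these values the job is one hour past its timeout at the time of the check, which is why a "started" status at that point looks wrong.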
-
-
Hi,
As usual, XOA or XO from the sources?
-
@olivierlambert It's XO, as I wrote: with Server 5.107.5 and Web 5.109.0.
-
It's either XOA or XO from the sources.
If XO from the sources, please read this first: https://xen-orchestra.com/docs/community.html#current-version
The xo-web or xo-server version doesn't matter (they have all been in the same repo for years already). What matters is your commit number.
-
@olivierlambert OK, sorry. Xen Orchestra, commit 17027, installed on Debian 10.
Just to add: I don't think this is specific to this version. I have seen such things happen for a few years on different systems.
Until now, I have lived with it (quite well).
I'm just trying to find out where I can get additional information, to improve my understanding of what's happening under the hood.
-
- Just try to keep up with the latest commit from master before anything else
- Also check your NodeJS version, and rebuild if it's too old (it should be Node 18 now)
For the rest, it's hard to answer without digging more.
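As a rough illustration of the NodeJS check, the sketch below runs `node --version` and compares the major version against 18. The helper names (`parse_node_major`, `node_is_recent_enough`) are hypothetical, and the script assumes `node` is on the PATH of the user that runs xo-server.

```python
import re
import subprocess

MIN_NODE_MAJOR = 18  # taken from the advice above ("it should be Node 18 now")


def parse_node_major(version_output):
    """Extract the major version from output such as 'v18.12.1'."""
    match = re.match(r"v?(\d+)\.", version_output.strip())
    if match is None:
        raise ValueError(f"unrecognized Node version string: {version_output!r}")
    return int(match.group(1))


def node_is_recent_enough(version_output, minimum=MIN_NODE_MAJOR):
    """Return True when the reported major version is at least `minimum`."""
    return parse_node_major(version_output) >= minimum


if __name__ == "__main__":
    try:
        completed = subprocess.run(
            ["node", "--version"], capture_output=True, text=True, check=True
        )
    except (FileNotFoundError, subprocess.CalledProcessError) as exc:
        print(f"could not run node: {exc}")
    else:
        reported = completed.stdout.strip()
        if node_is_recent_enough(reported):
            print(f"{reported}: recent enough")
        else:
            print(f"{reported}: too old, upgrade Node and rebuild")
```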
-
@olivierlambert Yes, I will update and see whether it changes the behaviour.
"For the rest, it's hard to answer without digging more."
That's exactly what I was looking for: any information on where I can dig deeper?
I'm looking for logs, error traces, or the like.
It's not that I just want this particular problem solved.
In my limited understanding of the internal processes, I think a backup process somehow dies and xo-server does not report or recognize this.
But shouldn't it, somehow?
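One possible starting point for digging, sketched under assumptions: if xo-server runs as a systemd unit named `xo-server` (the thread only says it is a service on Debian 10, so both the unit name and the use of the journal are assumptions), its output for the time window of the stuck job can be pulled with `journalctl` and filtered for suspicious lines. The helper names and the keyword list are hypothetical, and the time window is taken from the example in the first post.

```python
import subprocess


def build_journalctl_cmd(unit, since, until):
    """Build a journalctl command for one unit and one time window."""
    return ["journalctl", "--no-pager", "-u", unit, "--since", since, "--until", until]


def filter_lines(lines, keywords=("error", "warn", "timeout")):
    """Keep lines that contain any keyword, ignoring case."""
    lowered = [keyword.lower() for keyword in keywords]
    return [line for line in lines if any(k in line.lower() for k in lowered)]


if __name__ == "__main__":
    # Assumed unit name and window: the 24 hours after the job started.
    cmd = build_journalctl_cmd("xo-server", "2022-12-31 05:00", "2023-01-01 05:00")
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    except FileNotFoundError:
        print("journalctl not found; this host may not use systemd")
    else:
        for line in filter_lines(result.stdout.splitlines()):
            print(line)
```

The keyword filter is deliberately crude. It only narrows down where to look and says nothing about what xo-server actually logs when a backup process dies.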