Host crash during backup
-
Hello,
It seems that during one of my backup tasks, my host is crashing and rebooting by itself.
Most of the time the backups complete fine, but a few times the host has crashed.
The backup log only says interrupted for the last VM transfer before the crash, nothing more:

{ "id": "1709842222597", "message": "snapshot", "start": 1709842222597, "status": "success", "end": 1709842223664, "result": "fab5e091-673d-7088-b09f-014eba74c00f" },
{ "data": { "id": "4ec2e44e-1fad-4a5b-b278-370252694c19", "isFull": true, "type": "remote" }, "id": "1709842223713", "message": "export", "start": 1709842223713, "status": "interrupted", "tasks": [ { "id": "1709842224822", "message": "transfer", "start": 1709842224822, "status": "interrupted"

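For a longer backup log, a small script can list every task that ended as interrupted, together with its start time. The following is only a sketch: it assumes the log is JSON with nested "tasks" arrays as in the excerpt above, and the sample object is the excerpt's "export" entry with the missing closing brackets added by hand. The function names are made up for illustration.

```python
import json
from datetime import datetime, timezone

def find_interrupted(task, path=()):
    """Yield (path, task) for every task whose status is 'interrupted'."""
    here = path + (task.get("message", "?"),)
    if task.get("status") == "interrupted":
        yield here, task
    # Sub-tasks (e.g. "transfer" under "export") live in a nested "tasks" list.
    for child in task.get("tasks", []):
        yield from find_interrupted(child, here)

def describe(task):
    """Format a task, converting its epoch-millisecond start time to UTC."""
    start = datetime.fromtimestamp(task["start"] / 1000, tz=timezone.utc)
    return f"{task.get('message')} started {start.isoformat()} status={task.get('status')}"

# Sample modeled on the excerpt; the closing brackets are an assumption,
# since the excerpt is cut off before the end of the object.
sample = json.loads("""
{
  "id": "1709842223713",
  "message": "export",
  "start": 1709842223713,
  "status": "interrupted",
  "tasks": [
    {"id": "1709842224822", "message": "transfer",
     "start": 1709842224822, "status": "interrupted"}
  ]
}
""")

for path, task in find_interrupted(sample):
    print(" > ".join(path), "|", describe(task))
```

The start time of the last interrupted transfer gives a lower bound for when the host went down, which helps when searching the host logs.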
Which other logs (and where are they located) should I investigate to find the cause of that unfortunate reboot?
The remote destination is an NFS share on a NAS.
XCP-ng 8.2.1 with the latest patches, XO from source (commit 55360).
Many thanks for your help.
-
Hello,
I can definitely link the crash to the backup job: it occurs a few minutes after I start the job, and it has happened several times.
That backup was a brand-new job, just a full backup of 8 VMs from a single host to an NFS remote, no delta involved.
I've checked xensource.log, daemon.log, kernel.log and SMlog. Nothing obvious stands out to me, but maybe I'm missing something.
If you want to have a look, the host crashed at about 09:20:40 and came back up at 09:24.
/var/crash is empty, nothing there.
The pool is for now a single host, nothing fancy on that side.
Should I check other log files?
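To compare several log files around the crash window, the lines stamped shortly before 09:20:40 can be pulled out with a short script. This is a minimal sketch, assuming each line starts with a classic syslog timestamp ("Mmm dd HH:MM:SS"). The crash date, the function name and the sample lines are placeholders for illustration, not real log output.

```python
from datetime import datetime, timedelta

def lines_near(lines, crash_time,
               before=timedelta(minutes=5), after=timedelta(minutes=1)):
    """Yield lines whose leading syslog timestamp falls around crash_time."""
    lo, hi = crash_time - before, crash_time + after
    for line in lines:
        try:
            # Syslog timestamps carry no year, so borrow it from crash_time.
            ts = datetime.strptime(f"{crash_time.year} {line[:15]}",
                                   "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # continuation line or a different timestamp format
        if lo <= ts <= hi:
            yield line.rstrip("\n")

# Placeholder date: the post only gives the time of day (09:20:40).
crash = datetime(2024, 3, 8, 9, 20, 40)

# Made-up sample lines, only to show the filtering.
sample = [
    "Mar  8 09:10:00 host demo: too early",
    "Mar  8 09:19:58 host demo: inside window",
    "    continuation line without timestamp",
    "Mar  8 09:24:05 host demo: after reboot",
]

for line in lines_near(sample, crash):
    print(line)
```

On the host, the same function could be fed an open file instead of the sample list, for example `open("/var/log/xensource.log", errors="replace")`, and repeated for each of the logs mentioned above.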
Many thanks,
-
What type of server hardware are you using? Have you performed a test on the RAM to check for issues?
-
@Danp
It's a barebone server from ASRock Rack 1U4LW-X570/2L2T RPSU
CPU : AMD Ryzen 9 5950X
RAM : 4 x 32 GB ECC DDR4
I'm not on site right now. I will run a memtest as soon as possible to see whether any errors show up.
Except for these crashes, the host had been stable with the VMs for two weeks.
-
So, after a night of memtest and a few hours of CPU stress testing, there were no crashes and no errors with the RAM modules.
In the meantime, I moved my XO VM to a different host (outside of the pool I'm backing up) and now everything seems OK, no more crashes.
I still don't have any explanation for the host crash, but at least I got the backup working.