Host crash during backup
-
What type of server hardware are you using? Have you performed a test on the RAM to check for issues?
-
@Danp
It's a barebone server from ASRock Rack 1U4LW-X570/2L2T RPSU
CPU : AMD Ryzen 9 5950X
RAM : 4 x 32 GB ECC DDR4
I'm not on site right now. I will run a memory test as soon as I can to see whether any errors show up.
Except for these crashes, the host had been stable with the VMs for two weeks.
-
So, after a night of memtest and a few hours of cpu stress test, no crash or error with the RAM modules.
In the meantime, I moved my XO VM to a different host (outside of the pool I'm backing up) and now everything seems OK, no more crashes.
I still don't have any explanation for the host crash, but at least I got the backup working.
-
@Mathieu did this ever happen to you again?
I recently started having this happen as well, on a host that has been running rock solid for 6-8 months now. Coincidentally also a Ryzen 9 (a 7900X). The crashes don't always happen, but they are pretty frequent, and in some cases it has caused data corruption.
-
@BrantleyHobbs
No more crashes with backup for a long time, now.
My pool has been upgraded with a second host, and XO is running on one of them without any hassle.
-
@BrantleyHobbs You may want to provide some additional details on your setup, e.g.:
- Version of XCP-ng
- Are the hosts fully patched?
- Current XO version or commit
- etc
Have you checked the /var/crash directory on the crashed host to see if kernel crash logs were captured? https://docs.xcp-ng.org/troubleshooting/log-files/
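For anyone following along, here is a minimal Python sketch for listing whatever sits in the crash directory, newest first, so you can tell whether a dump was written around the time of the backup. The function name `list_crash_entries` is made up for illustration, and the sketch assumes nothing about the layout inside /var/crash beyond it being a directory; it only uses the standard library.

```python
import os
import time


def list_crash_entries(crash_dir="/var/crash"):
    """Return (name, mtime) tuples for entries in crash_dir, newest first.

    Returns an empty list if the directory does not exist.
    """
    try:
        with os.scandir(crash_dir) as entries:
            found = [(entry.name, entry.stat().st_mtime) for entry in entries]
    except FileNotFoundError:
        return []
    # Sort by modification time, most recent entry first.
    return sorted(found, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, mtime in list_crash_entries():
        print(time.ctime(mtime), name)
```

If the newest entry lines up with a backup run, that is the one worth attaching or quoting here.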
-
@Danp fully patched 8.3 (through the most recent May patches). XO commit d810e (master commit e3a58). I usually update XO around the first of each month, so it's a little bit behind master.
There are crash logs in /var/crash. I can provide a log bundle if needed, or copy/paste some info here if I know what to provide.
-
Friends, I am once again here to make stupid mistakes so that you don't have to: it appears that the reason my backups were causing the machine to crash/hang is that:
A) I was making DR replicas to local storage (the same device the host is booting from; not the same partition)
B) I was filling it up.
Again, this is not a production environment, simply a home lab, and I'm a bit resource poor, and I was looking for something other than my main disk repository for quick recovery. Making use of unused disk space left over on the boot device seemed like a Good Idea at the time.
I added an additional physical disk specifically for DR replication and all my problems stopped.
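In case it saves someone the same trouble, here is a rough Python sketch of the kind of headroom check that would have caught this before a replication run. It is not anything XO or XCP-ng does for you; `enough_space`, the 10% reserve, and the idea of passing in an estimated replica size are all assumptions for illustration. It relies only on `shutil.disk_usage` from the standard library.

```python
import shutil


def enough_space(path, needed_bytes, reserve_fraction=0.10):
    """Return True if writing needed_bytes to the filesystem holding `path`
    would still leave at least reserve_fraction of its total size free.

    reserve_fraction is an arbitrary safety margin, not a recommended value.
    """
    usage = shutil.disk_usage(path)
    reserve = int(usage.total * reserve_fraction)
    return usage.free - needed_bytes >= reserve


if __name__ == "__main__":
    # Hypothetical example: check the root filesystem for a 50 GiB replica.
    print(enough_space("/", 50 * 1024**3))
```

Point `path` at the mount point that backs the storage receiving the replicas, and skip or alert on the job when it returns False.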
Hope that helps someone in the future.
-
Facepalm situation! Thanks for the feedback.

-
@Pilow pretty much
-
olivierlambert marked this topic as a question
-
olivierlambert has marked this topic as solved