XCP-NG server crashes/reboots unexpectedly
-
@nvs Machine crashed/restarted itself again this morning. I didn't even have all of the usual VMs running this time. Nothing was logged in kern.log when it crashed. In the hours before the crash I checked xl dmesg a few times, but nothing looked obviously wrong to me (same log as I posted above). Any suggestions are highly welcome, as I'm not sure how to proceed with troubleshooting this. My next step would be replacing the PSU to see if anything changes, but it's a long shot.
-
OK, so it doesn't seem to be caused by a driver bug corrupting something in Xen/Linux.
So something is causing Xen to crash, and it's not easy to find out what without using e.g. the serial console (so you can get the actual Xen crash message).
For that you need something connected to and monitoring the machine's serial console (or IPMI), and you need to boot XCP-ng in "(serial)" mode.
-
@TeddyAstie Thanks. Unfortunately my machine doesn't have IPMI. So can I just connect a serial cable between this machine and another machine and monitor the serial output on that other machine, say a Windows machine running PuTTY? Anything special to consider? I have never done this before, but I'm happy to read up if you have any pointers.
-
@nvs said in XCP-NG server crashes/reboots unexpectedly:
Thanks. Unfortunately my machine doesn't have IPMI. So can I just connect a serial cable between this machine and another machine
Yes, though you would still need to boot using the "XCP-ng (Serial)" GRUB entry.
(You can also enable the serial console by adding the relevant options to the Xen command line.)
-
Update: I replaced the PSU and the server has now been running stable for a few weeks. It appears this was a PSU-related issue in the end.
-
Ahh excellent news! Thanks for keeping us posted

-
Good morning,
Unfortunately I had to re-open this issue. I thought it was fixed with the new PSU, but the reboots showed up again some time later. In the meantime I upgraded XCP-ng to the latest stable release, 8.3.0, and applied all patches, curious whether it would still show this behaviour afterwards. I also tested running different VMs on that machine, and after some time I am now fairly confident that running one particular VM causes these sudden reboots of the host:
- It's a VM with 2 CPUs and 8 GiB of RAM, running Ubuntu 22.04 with MongoDB. It has a 20 GiB OS partition and, possibly the relevant clue, a 20 TB raw disk passed through into the VM (15 TiB partition).
Over the last few weeks I did a lot of testing without that VM (MongoDB) writing to the raw disk, and had no reboots. Yesterday I configured the MongoDB VM to continuously add data to the raw disk again, and since then I have had 2 reboots in a single day.
The good news is that I should now be able to reproduce the issue. Could someone give me pointers on what else I could try to figure out what exactly is going wrong and how to fix it?
Thanks!
-
To give an impression of what some of the log files look like: it seems the hard crash garbles the log writing with a lot of NUL bytes, and the next entry after that is from when the machine starts booting up again.
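One way to pin down each crash in such a log is to search for the NUL runs and print the last entry before and the first entry after each one. The following is only a minimal sketch: the function name `find_nul_gaps`, the 16-byte threshold, and the default path are assumptions made for illustration, not anything the logging stack defines.

```python
import re
import sys
from typing import List, Tuple

# Assumption: 16 or more consecutive NUL bytes indicate a torn write from a hard crash.
NUL_RUN = re.compile(rb"\x00{16,}")


def find_nul_gaps(data: bytes) -> List[Tuple[int, int, str, str]]:
    """Return (offset, length, last_line_before, first_line_after) per NUL run."""
    gaps = []
    for match in NUL_RUN.finditer(data):
        # Last (possibly partial) log line written before the crash.
        before = data[:match.start()].rstrip(b"\n").rsplit(b"\n", 1)[-1]
        # First log line written after the machine came back up.
        after = data[match.end():].lstrip(b"\n").split(b"\n", 1)[0]
        gaps.append((
            match.start(),
            match.end() - match.start(),
            before.decode("utf-8", errors="replace"),
            after.decode("utf-8", errors="replace"),
        ))
    return gaps


# Only runs when a log path is passed on the command line,
# e.g. `python3 nulgaps.py /var/log/kern.log` (hypothetical script name).
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as fh:
        for offset, length, before, after in find_nul_gaps(fh.read()):
            print(f"NUL run at byte {offset} ({length} bytes)")
            print(f"  last entry before: {before}")
            print(f"  first entry after: {after}")
```

The timestamps of the "last entry before" lines give the approximate crash times, which can then be compared against when the MongoDB VM was writing to the raw disk.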

-
@nvs Can you try with a serial console and something listening on it, so that when it crashes we get the crash reason?
-
@TeddyAstie I have to get my hands on a null modem cable first. I will try to get one next week and then test, and will get back when I have some results.
I have never tried this before, so just to be sure:
The idea is to connect the serial port of the XCP-ng server via a null modem cable to the serial port of another PC, right? And then run e.g. PuTTY on the other machine, and the XCP-ng server should send its console text out via serial.
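
For the listening side, here is a minimal sketch in Python of a logger that timestamps every line arriving on the serial port, so the crash message survives even if nobody is watching. It rests on several assumptions: the monitoring PC runs Linux, the port shows up as `/dev/ttyS0`, and the line runs at 115200 baud 8N1 (check this against the settings of the serial boot entry). The function names are made up for illustration. On Windows, PuTTY with session logging enabled serves the same purpose.

```python
import sys
import time
from typing import BinaryIO, TextIO


def log_serial_stream(src: BinaryIO, dst: TextIO) -> int:
    """Copy lines from src to dst with a local timestamp prefix; return the line count."""
    count = 0
    for raw in iter(src.readline, b""):
        text = raw.decode("utf-8", errors="replace").rstrip("\r\n")
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        dst.write(f"{stamp} {text}\n")
        dst.flush()  # flush every line so nothing is lost if the logger is interrupted
        count += 1
    return count


def open_serial(path: str) -> BinaryIO:
    """Open a serial device in raw 8-bit mode (Linux/Unix only; 115200 baud assumed)."""
    import termios
    import tty

    fh = open(path, "rb", buffering=0)
    tty.setraw(fh.fileno())  # no line editing or echo, 8 data bits, no parity
    attrs = termios.tcgetattr(fh.fileno())
    attrs[2] |= termios.CLOCAL | termios.CREAD  # ignore modem control lines, enable receiver
    attrs[4] = attrs[5] = termios.B115200       # input and output speed
    termios.tcsetattr(fh.fileno(), termios.TCSANOW, attrs)
    return fh


# Only runs when a device path is given, e.g.
# `python3 seriallog.py /dev/ttyS0 xen-serial.log` (hypothetical names).
if __name__ == "__main__" and len(sys.argv) > 1:
    log_path = sys.argv[2] if len(sys.argv) > 2 else "xen-serial.log"
    with open_serial(sys.argv[1]) as port, open(log_path, "a", encoding="utf-8") as log:
        log_serial_stream(port, log)
```

Appending to the log file rather than overwriting it keeps the output of several crash/reboot cycles in one place, and the timestamps come from the monitoring PC, so they stay reliable even when the server's own logs are cut off.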