Help: Clean shutdown of Host, now no network or VMs are detected
-
@CodeMercenary Have you restarted the toolstack now that it is doing what it needs to do?
That might make the console show what it needs.
Also try exiting out of the console and then starting it back up with xsconsole.
-
@guiltykeyboard Restarting the toolstack didn't fix it.
When I use ctrl-C to exit the console it says it's resetting the console and the new instance still says it can't reach the pool master.
Looks like the network reset on the troublesome host is the next step, then maybe a reboot. Then I can try your suggestions on the pool master. Just a little gun-shy about messing with the master because of this happening to a server that should have just worked.
-
@CodeMercenary Try the toolstack reset on that host.
Maybe do a toolstack reset on the pool master as well, which doesn't affect running VMs.
If that doesn't work, try an emergency network reset on the host having trouble.
-
@guiltykeyboard Restarting the toolstack on the pool master did not fix it, so I went ahead and did the emergency network reset on the host having the issue. It came up normally after the reboot. The emergency reset was super easy. Thank you for helping me with this. I'm still learning XCP-ng and trying to understand what's happening before trying things to fix it, so in the future I'll know more and hopefully be able to help other people.
-
@CodeMercenary Glad it worked out for you.
-
Another bit of strangeness. I just noticed that some older backup disaster recovery VMs were booted up on my pool master host. I looked at the stats and they all booted 4 hours ago, right around when I tried restarting the toolstack on the pool master. All of them were set to auto-start, an odd setting I think for disaster recovery VMs unless there is supposed to be something in place to stop them from auto-starting. Easy enough to shut down but kinda strange that they booted. Surely disaster recovery VMs aren't supposed to power up on restart, right?
-
No, they shouldn't. Were you using DR or CR?
-
@olivierlambert It was DR. I was testing DR a while ago and after running it once I disabled the backup job so these backups have just been sitting on the server. I don't think I've rebooted that server since running that backup.
-
Just want to document that this happened again, on the same host.
My XO (from source) that manages my backups ran the backups last night. I know it was active up until at least around 5:30am, but by the time I got into the office it was inaccessible by browser, SSH and ping. Other XO instances showed that it was running, but the Console tab didn't give me access to its console.
A few hours later I found that other VMs on that same host had become inaccessible in the same fashion and also had no console showing in XO.
An hour or two later I found that XO showed the host as being missing from the pool, which it had not been earlier in the day. When I checked the physical console for that host, I found it had red text "<hostname> login: root (automatic login)" and did not respond to the keyboard other than putting more red text on screen of whatever I typed. I hit ctrl-alt-del and it didn't seem to do anything, so I typed random things, then XCP-ng started rebooting. I'm guessing it was 60 to 120 seconds after my first ctrl-alt-del.
When it came back up it could not find the pool master and said it had no network interfaces. I was able to solve it by doing another emergency network reset.
Would be nice if this didn't happen; it makes me super nervous about stability. Thankful that the two times this has happened, it was on my least mission-critical server. However, it's the server that handles backups so it's still stressful to have it go down. Also makes me wonder if there might be something wrong with that server's hardware.
-
You need to check the logs because obviously it's not normal. Check if you have a /var/crash folder with some stuff in it. Otherwise, check the usual logs (on both the Xen and dom0 side).
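A minimal sketch of that first check, in Python, assuming the crash directory is /var/crash as mentioned above. The function name `list_crash_entries` is made up for illustration. It just lists whatever is in the directory, newest first, so it's quick to see whether anything was written around the time of the hang.

```python
import os
import time

def list_crash_entries(crash_dir="/var/crash"):
    """Return (name, mtime) pairs for everything in crash_dir, newest first."""
    if not os.path.isdir(crash_dir):
        return []
    entries = []
    for name in os.listdir(crash_dir):
        path = os.path.join(crash_dir, name)
        # lstat so a dangling symlink doesn't raise
        entries.append((name, os.lstat(path).st_mtime))
    entries.sort(key=lambda entry: entry[1], reverse=True)
    return entries

if __name__ == "__main__":
    for name, mtime in list_crash_entries():
        stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(mtime))
        print(stamp, name)
```

An empty result (or only old placeholder files) would suggest no crash dump was written, in which case the logs are the next place to look.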
-
@olivierlambert I do have a /var/crash folder but it has nothing in it except a file from a year ago named .sacrificial-space-for-logs.
The Xen logs are verbose. Any suggestions of text to grep for to find what I'm looking for, other than the obvious "error"?
Currently looking in xensource.log.* for error lines to see if I can figure anything out.
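For anyone following along, here is a rough sketch of the kind of search I mean, written in Python so it can also read rotated logs that have been gzip-compressed. The keyword list, the function names and the /var/log location in the example are my own assumptions, not an official list of things to look for.

```python
import glob
import gzip
import re

# Assumed keywords; extend or narrow as needed.
DEFAULT_PATTERN = re.compile(
    r"error|fatal|exception|panic|out of memory", re.IGNORECASE
)

def open_log(path):
    """Open a plain or gzip-compressed log file as text."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", errors="replace")
    return open(path, "r", errors="replace")

def find_matches(paths, pattern=DEFAULT_PATTERN):
    """Yield (path, line_number, line) for each line matching pattern."""
    for path in sorted(paths):
        with open_log(path) as handle:
            for number, line in enumerate(handle, start=1):
                if pattern.search(line):
                    yield path, number, line.rstrip("\n")

if __name__ == "__main__":
    # /var/log is an assumed location for xensource.log*
    for path, number, line in find_matches(glob.glob("/var/log/xensource.log*")):
        print(f"{path}:{number}: {line}")
```

Matching is case-insensitive on purpose, since I don't know how consistently the different components capitalise their messages. Narrowing the output to the hours around when the host went quiet would probably cut the noise a lot.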