Help: Clean shutdown of Host, now no network or VMs are detected
-
Good advice.
Occasionally our older servers crap themselves with this same issue. An emergency network reset on the pool master, followed by rebooting the other hosts once the master is back up, usually resolves it.
-
Thanks for the input. Good to know I might need to do the reset on the pool master, not just the host that's impacted. Fortunately, the impacted host is not too big a deal for me to reboot. The pool master will be more annoying since I don't use shared storage and my VMs are big enough that I'd rather not migrate them. Not a big deal to do it some evening or over a weekend.
-
@CodeMercenary Try an emergency network reset on the server that is directly having the issue. If that doesn't work, try restarting the toolstack on the pool master and then try a network reset there as well.
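For the emergency network reset, a rough sketch from the affected host's local shell (the available options differ between versions, so check the help output first; the reset reconfigures the management interface and the host reboots afterwards; the values below are placeholders for your environment):

    # review the options for your version (device, static vs. DHCP, pool master IP, etc.)
    xe-reset-networking --help

    # example for a pool member: point it at the master and name the management NIC
    # (DHCP is used unless you pass static settings)
    xe-reset-networking --master=<pool-master-IP> --device=eth0

On most versions the same reset is also reachable from the physical console, under xsconsole's Network and Management Interface menu.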
Recommend that you set up two hosts with shared storage between them so that you can live-migrate your VMs to the other host and elect it as master temporarily when you do maintenance.
-
@guiltykeyboard Thank you for the guidance. I'll try that. I still think it's super weird that the server did get back into the pool and seems to be working fine. It's just that the physical console still says it can't find the pool master and has no NICs installed.
The main reason I don't have shared storage is due to concerns over cost. Years ago, I was thinking of setting up a SAN for my VMware servers and they were crazy expensive, way over the budget of the small company I work for. I think I stopped researching when I ran across an article that was titled something like "How to build your own SAN for less than $10,000" and I realized I was way out of my league.
I do have a bigger budget now, though I still wouldn't be able to spend $10k to build shared storage. Any recommendations for reliable shared storage that isn't crazy expensive? One thing that helps now is that each of my servers has dual 10GbE Ethernet, something I didn't have the last time I looked into this.
I've been keeping my eye on XOSTOR lately since I have storage in all of my servers. Unfortunately, from what I've seen, it requires three servers in the cluster, and two of my servers have SSDs while the third has only HDDs, so I suspect that third one would slow down the other two since a write isn't considered complete until all servers have finished writing. XOSTOR feels safer than shared storage because shared storage would be a single point of failure. (Admittedly, right now I also have multiple single points of failure since I use local storage.)
-
@CodeMercenary Have you restarted the toolstack now that it is doing what it needs to do?
That might get the console to show the correct status.
Also try exiting out of the console and then starting it back up with xsconsole.
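From a shell on that host, those two steps are just:

    # restart the toolstack on this host; running VMs are not affected
    xe-toolstack-restart

    # relaunch the local management console
    xsconsole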
-
@guiltykeyboard Restarting the toolstack didn't fix it.
When I use Ctrl-C to exit the console, it says it's resetting the console, and the new instance still says it can't reach the pool master.
Looks like the network reset on the troublesome host is the next step, then maybe a reboot. After that I can try your suggestions on the pool master. I'm just a little gun-shy about messing with the master after this happened to a server that should have just worked.
-
@CodeMercenary Try the toolstack reset on that host.
Maybe do a toolstack reset on the pool master as well, which doesn't affect running VMs.
If that doesn't work, try an emergency network reset on the host having trouble.
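If you want to be sure which host is currently the master before running anything on it, you can check from any host in the pool (the UUID in the second command is whatever the first one returns):

    # UUID of the current pool master
    xe pool-list params=master --minimal

    # map that UUID to a host name
    xe host-list uuid=<master-uuid> params=name-label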
-
@guiltykeyboard Restarting the toolstack on the pool master did not fix it, so I went ahead and did the emergency network reset on the host having the issue. It came up normally after the reboot. The emergency reset was super easy. Thank you for helping me with this. I'm still learning XCP-ng and trying to understand what's happening before trying things to fix it, so in the future I'll know more and hopefully be able to help other people.
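For anyone following along later, a quick sanity check that a host has rejoined the pool can be run from the master (or any member):

    # every host in the pool should show up and report enabled=true
    xe host-list params=name-label,enabled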
-
@CodeMercenary Glad it worked out for you.
-
Another bit of strangeness: I just noticed that some older disaster recovery backup VMs were booted up on my pool master host. I looked at the stats and they all booted four hours ago, right around when I tried restarting the toolstack on the pool master. All of them were set to auto-start, which seems like an odd setting for disaster recovery VMs unless something is supposed to stop them from auto-starting. They're easy enough to shut down, but it's kind of strange that they booted. Surely disaster recovery VMs aren't supposed to power up on restart, right?
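In case anyone else runs into this, the per-VM auto-start flag appears to live in other-config, so it can be checked and cleared from the CLI (the UUID below is a placeholder; as I understand it, auto power-on also needs a matching flag on the pool object to actually fire):

    # look for other-config entries containing auto_poweron
    xe vm-list params=name-label,other-config

    # clear the flag on a single VM
    xe vm-param-remove uuid=<vm-uuid> param-name=other-config param-key=auto_poweron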
-
Correct, they shouldn't. Were you using DR or CR?
-
@olivierlambert It was DR. I was testing DR a while ago, and after running it once I disabled the backup job, so these backups have just been sitting on the server. I don't think I've rebooted that server since running that backup.
-