Hosts in a pool have gone offline after reboot
-
@Aeoran Yes, that sounds like a good plan.
-
@Danp Is there some documentation you would recommend on how to safely forget a host? I'm confronted with dire warnings on how this will permanently destroy the SRs used by the VMs that used to run on the dead host. So, I want to make really sure I won't be doing something wrong here.
Thanks!
-
Shared storage should belong to the pool; only local SRs should be affected when you forget the old master.
Just make sure all the slaves know about the new master before doing anything to the old one.
-
@nikade It looks like I cannot get the dead host to rejoin the pool using
xe pool-join
: You attempted an operation that was not allowed. reason: Host is already part of a pool
Will I have problems if I try to force it to join with
xe pool-join force
? A forum post seems to suggest that this may propagate data corruption errors from the dead host to the pool, which is obviously undesirable. So how would I avoid that?
-
Not really sure, I'd ask @olivierlambert to be sure.
-
What actions did you initially perform to remove the host from the pool?
-
@Danp I didn't do anything. The master host failed on its own and stopped responding to XO.
I've rebooted the host and the hardware all seems fine. The logs suggest that XAPI is not running because the database is missing a column (see above, first comment).
-
You probably need to forget the host using
xe host-forget uuid=UUID
where UUID belongs to the old pool master. See the prior discussion on this topic: https://xcp-ng.org/forum/topic/6164/remove-a-host-from-a-pool/14
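As a rough sketch of that sequence, and not an official procedure: the helper below lists the hosts, shows which host the pool currently treats as master, and then forgets the host you name. The function names `run` and `forget_old_master` and the `RUN` variable are made up for illustration; only the `xe` subcommands are real. By default it only prints what it would do, since forgetting a host is not something you want to trigger by accident.

```shell
#!/bin/sh
# Sketch only: prints the xe commands by default; set RUN=1 to execute them.
# run, forget_old_master and RUN are hypothetical names used for illustration.

run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}

forget_old_master() {
    old_uuid="$1"
    if [ -z "$old_uuid" ]; then
        echo "usage: forget_old_master <host UUID>" >&2
        return 1
    fi
    # List hosts so the UUID can be checked against the right name and address.
    run xe host-list params=uuid,name-label,address
    # Show which host the pool currently considers its master.
    run xe pool-list params=master
    # Remove the old master's record from the pool database.
    run xe host-forget uuid="$old_uuid"
}
```

Run it from the new master. Call `forget_old_master <UUID>` once to review the printed commands, then again with `RUN=1` set if the UUID really is the old master's.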
-
@Danp How can I preserve or recover the local SRs of the dead host?
-
@Aeoran AFAIK, the XAPI database gets wiped whenever you add a host to a pool or remove it from one. You may be able to restore metadata to the old master once it no longer belongs to the pool, but I can't guarantee that this will work or that it won't cause other issues.
If you don't have backups of the VMs, then you should be able to copy the VHD files to another location by accessing the directory
/run/sr-mount/<SR UUID>/
on the old master.
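A minimal sketch of that copy step, assuming the local SR is file-based so the VHDs show up as plain `.vhd` files in the mount directory (on an LVM-based SR they are stored as logical volumes, so this would not apply). The function name `copy_vhds` and the destination path in the usage line are hypothetical.

```shell
#!/bin/sh
# Sketch: copy every .vhd file from a mounted SR directory to a destination.
# copy_vhds is a hypothetical helper; both paths are supplied by the caller.

copy_vhds() {
    src="$1"
    dest="$2"
    if [ ! -d "$src" ] || [ -z "$dest" ]; then
        echo "usage: copy_vhds <SR mount dir> <destination dir>" >&2
        return 1
    fi
    mkdir -p "$dest" || return 1
    for vhd in "$src"/*.vhd; do
        # Skip the literal pattern when the directory holds no .vhd files.
        [ -e "$vhd" ] || continue
        # -p keeps timestamps and permissions on the copy.
        cp -p "$vhd" "$dest"/ || return 1
        echo "copied $(basename "$vhd")"
    done
}
```

For example, `copy_vhds "/run/sr-mount/<SR UUID>" /mnt/rescue`, where `/mnt/rescue` stands for whatever external storage you have mounted. Copying all of the files rather than picking individual ones keeps any snapshot parents together with the disks that depend on them.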