Problem: XOA becomes unusable when XCP-ng master host is rebooted

Cygace

Hi everyone,

I’m facing an issue in my XCP-ng environment and would really appreciate some advice or best practices.

I have a pool with 3 XCP-ng hosts, and I’m running Xen Orchestra (XOA) as a VM on the second host (not the master). Everything runs fine under normal conditions.

However, if I reboot the master host, I temporarily lose all control of the pool through XOA. Even though XOA is still up and running on the second host, it can’t see any hosts, VMs, or storage — it’s like the entire pool becomes unmanageable until the master comes back online.

My setup:
• 3-host XCP-ng pool
• XOA is a VM on host 2 (not the master)
• When the master is rebooted, XOA loses access to the infrastructure
• When the master is back online, everything works again

My questions:
1. Is this behavior expected in a multi-host pool?
2. Is there a safe way to promote another host to master automatically or manually during planned or unplanned downtime?
3. Would it be better to run XOA outside of the pool, on a dedicated machine (e.g., a NUC or a VM on another hypervisor)?
4. Are there best practices for ensuring high availability of the management layer (XAPI + XOA)?

Thanks a lot for your help!

olivierlambert

Hi,

This is perfectly normal. Only the master is the contact point for the entire pool.
I would advise manual promotion if you consider the master host dead. To do it, log in to the slave you want to promote (just one) and run xe pool-emergency-transition-to-master. If you use HA, this is done automatically, but HA can be tricky.
Wherever XOA runs, the result will be the same: if the master is down, you lose the view in XO of what's going on in the pool until a master is back.
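
For reference, the manual promotion described above could be scripted roughly as follows. This is only a sketch: it assumes you run it on the one slave you want to promote, and that the old master is really gone. The xe pool-recover-slaves step is an addition not mentioned in this reply; it asks the new master to repoint the remaining slaves. The DRY_RUN variable and the run and promote_this_host helpers are hypothetical names for this example, and by default the script only prints what it would do.

```shell
#!/bin/sh
# Sketch of a manual master promotion. Run on exactly ONE slave.
# DRY_RUN=1 (the default here, an assumption of this sketch) only prints commands.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Hypothetical helper wrapping the two xe calls.
promote_this_host() {
    # Turn this slave into the new pool master.
    run xe pool-emergency-transition-to-master
    # Ask the new master to reconnect the remaining slaves to it.
    run xe pool-recover-slaves
}

promote_this_host
```

Set DRY_RUN=0 only once you are sure the old master will not come back on its own, since the goal is to end up with a single master in the pool.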