Risks of running a shared pool?
-
Our XCP-ng hosts are all independent pools. This worked well as all storage was effectively local, but we've now set up TrueNAS and want to start hosting some non-critical VMs on a shared NAS.
I have only worked with a "shared pool" once before, in the XenCenter days, and recall rebooting the master and having a small heart attack on seeing that none of my hosts in the pool were displaying as expected (I can't recall whether these servers went into "offline / maintenance" mode, but I do recall immediately undoing the pooling afterwards).
So before I do this again, my understanding is that we have a pool master and then add hypervisors to that pool, and they can then share resources.
I have now selected 2 hypervisors to start my pooling experience, and I have renamed the pool of what will be my "master".
- What happens when I add a hypervisor to the pool? Will any currently running VMs be affected? Does it go into maintenance mode, or is the host simply added to the pool and everything works as expected?
- What is the correct way to perform maintenance on such a pool? Should I perform maintenance on the slaves first, then promote a new master, then perform maintenance on the "old master"?
- What happens if the master goes offline due to possible hardware issues? Are the other hypervisors affected at all? Does it affect running VMs on those hosts? In XO, will I still be able to view / manage the other hypervisors, or are there tasks that would be required to elect a new master?
- Lastly, if I have shared VMs on the NAS and a hypervisor goes down, will I be able to start the VM on one of the other hypervisors (not HA, manual start)? I am not sure where the VM metadata is located, so I am not sure whether the VMs can simply be started on another hypervisor in the pool. I assume there is a possibility that the VM might still be running (just not known to XO at the time), so how do we avoid accidentally starting the VM on another host if such an incident occurs?
-
Hi,
- Nothing will happen to your VMs when you add another host to your current pool.
- Depends on what you call maintenance. For updates, it's always master first (note: this is handled automatically by Xen Orchestra and its rolling pool update feature).
- If the master goes down (for a reboot, for example), then you can't control your pool until it's back. But this won't affect your running VMs. They'll still run without any problem. If the master is down for good, then you can elect a new master yourself. With HA, this is done automatically, but you always need to disable HA for updates anyway (since the master must be rebooted first). In short, it's OK to reboot your master: you just won't be able to run new actions on your pool during that time, and nothing already running is affected.
- If it's a slave going down, there's no problem starting the VM on another host. If it's the master, you'll have to elect a new master manually (with HA, this is done for you). Until a new master is elected (do this ONLY if you lost the master for good), you won't be able to boot any VM anyway.
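To make the two "new master" cases above more concrete, here is a minimal sketch that only builds the `xe` command lines and does not run anything. The helper name `master_change_commands` and its parameters are made up for illustration; the `xe` subcommands are the usual XAPI ones, but please check them against the documentation for your XCP-ng version before relying on them.

```python
from typing import List


def master_change_commands(master_reachable: bool,
                           new_master_uuid: str = "") -> List[List[str]]:
    """Return the xe command lines (as argv lists) for changing the pool master.

    Hypothetical helper for illustration: it only builds commands, it never runs them.
    """
    if master_reachable:
        # Planned handover while the current master is still up.
        if not new_master_uuid:
            raise ValueError("new_master_uuid is required for a planned handover")
        return [["xe", "pool-designate-new-master",
                 "host-uuid=" + new_master_uuid]]
    # Master lost for good: both commands are run on the slave that takes over.
    return [
        ["xe", "pool-emergency-transition-to-master"],
        ["xe", "pool-recover-slaves"],  # point the remaining slaves at the new master
    ]


if __name__ == "__main__":
    for argv in master_change_commands(master_reachable=False):
        print(" ".join(argv))
```

The emergency path is only meant for a master that is not coming back. For a plain reboot, waiting for the master to return is enough.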
-
@olivierlambert you are by far the most responsive product founder I have ever come across, thank you!
I don't intend to use HA (I have read your article about the additional potential pitfalls with HA and would rather be safe, as I had a split-brain incident using HALizard a couple of years ago that has scarred me for life). If HA is disabled, and I have a VM running on shared storage in a pool, and the host on which that VM is running goes offline, will I be able to start that VM / get it running on another host in the same pool? Is the VM metadata stored on the shared SR or only on the host on which the VM is running?
So the takeaway here is that if the master goes offline during a reboot, the pool will be inaccessible but the VMs will remain running. Once the master comes back online, access to the hosts returns. If the master remains offline, I will need to SSH into the master, open xsconsole, and then promote the new host as master under Resource Pool Configuration.
Interesting question: when rolling updates are done with XO, and the storage is all network / NAS, does XO handle the migration of the VMs to other hosts automatically?
-
@olivierlambert whilst searching for how to promote a new master, I came across this article where the person says he ran into a problem having a CR VM in the same shared storage. I have a failover host (where we do CR of a handful of critical VMs). Should I avoid adding this host to the shared pool, or is this guy simply confused?
https://forums.lawrencesystems.com/t/xcp-ng-continuous-replication-in-same-pool-mistake/14195
-
If HA is disabled and a secondary host dies: you can start any VM on any available host. It will take XAPI a bit of time to consider the host dead (a few minutes IIRC). If it's the master, just wait for the master to come back or elect a new one, and then start the VM.
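For the manual restart described above, here is a rough sketch that only builds `xe` command lines. Everything named here (`restart_elsewhere_commands`, the `host_confirmed_off` flag) is hypothetical. It also assumes you don't want to wait for XAPI to notice the dead host by itself and that the VM is still listed as running there. Forcing the power state is only safe once you are certain the failed host is really powered off, otherwise the same VM could end up running twice on the shared storage.

```python
from typing import List


def restart_elsewhere_commands(vm_uuid: str, target_host: str,
                               host_confirmed_off: bool) -> List[List[str]]:
    """Build xe commands to restart a VM from a dead host on another pool member.

    Hypothetical helper for illustration: it only builds commands, it never runs them.
    """
    if not host_confirmed_off:
        # Guard against starting the same VM twice on shared storage.
        raise RuntimeError("confirm the failed host is really powered off first")
    return [
        # Tell XAPI the VM is halted (it still believes it runs on the dead host).
        ["xe", "vm-reset-powerstate", "uuid=" + vm_uuid, "force=true"],
        # Start it on a surviving host of the same pool.
        ["xe", "vm-start", "uuid=" + vm_uuid, "on=" + target_host],
    ]


if __name__ == "__main__":
    for argv in restart_elsewhere_commands("<vm-uuid>", "<host-name>", True):
        print(" ".join(argv))
```

If you can afford to wait the few minutes until XAPI marks the host as dead, the forced reset step may not be needed at all.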
The VM metadata is stored in the XAPI database, which is a read/write DB on the pool master, replicated read-only to each host (it's not stored in any SR).
For rolling pool update: yes, it's fully automated. We disable HA (if any), migrate all VMs from the master to the slaves, update, reboot the master, wait for it to come back, and then move back the VMs that were on the master before the migration. And so on for each slave.
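The ordering described above can be written down as a small sketch. This is not how Xen Orchestra implements it internally, just the sequence from this post turned into a list of steps. The function name `rolling_update_plan` and the host names are made up for illustration.

```python
from typing import List


def rolling_update_plan(master: str, slaves: List[str],
                        ha_enabled: bool = False) -> List[str]:
    """Return the ordered steps of a rolling pool update, master first.

    Illustrative only: it mirrors the order described in the post, nothing more.
    """
    steps = []  # type: List[str]
    if ha_enabled:
        steps.append("disable HA")
    # The master is always handled first, then each slave in turn.
    for host in [master] + list(slaves):
        steps.append("migrate VMs off " + host)
        steps.append("update " + host)
        steps.append("reboot " + host)
        steps.append("wait for " + host + " to come back")
        steps.append("move VMs back to " + host)
    return steps


if __name__ == "__main__":
    for step in rolling_update_plan("host1", ["host2", "host3"], ha_enabled=True):
        print(step)
```

The point of the sketch is the order: the master is always emptied, updated and rebooted before any slave is touched.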
-
About CR: it's unrelated. The guy discovered he couldn't boot anything while the master was down, which is normal. If you want to perform any operation, you must have a master available (by promoting another host, for example).
-
@olivierlambert I tried adding a host to a new pool but got the following error (self-explanatory). It seems that before a host can be added to a different pool, all of its VMs must first be shut down. Just leaving this here in case someone else "googles" this topic.
JOINING_HOST_CANNOT_HAVE_RUNNING_VMS() This is a XenServer/XCP-ng error
-
You can't add a host into another pool until it's "blank". It's hard to "merge" two pools together.
So first, migrate VMs to the destination, clean the host and THEN add it to the pool.
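As a rough sketch of that order of operations: the helper below refuses to build the join command while the joining host still has running VMs, mirroring the `JOINING_HOST_CANNOT_HAVE_RUNNING_VMS` error quoted earlier. The function name `pool_join_command` and the example address are made up, and the password is left as a placeholder on purpose. `xe pool-join` itself is run on the host that is joining.

```python
from typing import List


def pool_join_command(running_vms: List[str], master_address: str,
                      username: str = "root") -> List[str]:
    """Build the xe pool-join command line, but only for a "blank" host.

    Hypothetical helper for illustration: it only builds the command, it never runs it.
    """
    if running_vms:
        # Same condition XAPI rejects with JOINING_HOST_CANNOT_HAVE_RUNNING_VMS.
        raise RuntimeError("JOINING_HOST_CANNOT_HAVE_RUNNING_VMS: "
                           + ", ".join(running_vms))
    return [
        "xe", "pool-join",
        "master-address=" + master_address,
        "master-username=" + username,
        "master-password=<password>",  # placeholder, fill in yourself
    ]


if __name__ == "__main__":
    print(" ".join(pool_join_command([], "192.0.2.10")))
```

In practice the same join can be done from Xen Orchestra. The sketch is only there to show the "empty the host first, then join" rule.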