Running VM shows as stopped in wrong pool
-
Strange one for you:
A running VM was (live) migrated from Host 1 (Pool 1) to Host 2 (Pool 2 - each pool only has a single host). The migration failed (too long ago now for me to have the error to hand, but for us it is not unusual for live migrations to fail when the disk is > 250 GB) and the VM continued to function, BUT it shows in XO as not running and present on Host 1 without a disk attached. In reality it is running on Host 2: when shutting down Host 1 the VM continues to operate just fine; when shutting down Host 2 it terminates.
Starting the allegedly stopped VM in XO fails (this starts it on Host 1 although, as above, we know it's running on Host 2): it shows as started for about 10 seconds, then goes back to stopped, but no error is given.
Migrating it results in the same outcome: it appears to migrate OK but nothing changes and there are no error messages. Remember, though, that XO has it on the wrong host and without an attached disk. Rebooting the VM from its own console via SSH works just fine, but the issue endures in Xen Orchestra, and rebooting both hosts also fails to resolve it.
Is there a way (without damaging the running VM!) to have it re-registered in XO? As an aside, backups using XO are working for the 'stopped' VM except that as it shows in XO as having no attached disk, they take 2s and are worthless.
[Xen Orchestra, commit 00a17 (although it will have been updated a few times since the issue presented itself a month or so ago), both hosts on 8.2.1.]
-
I think you might have a duplicated UUID issue. Since XO considers a UUID unique (by definition), if for some reason the VM was duplicated due to a bug, you'll only be able to see one copy at a time.
First, let's check whether you have a duplicated VM UUID: on each host, use `xl list` to list your VMs and see if you can spot your VM at both ends. If that's the case, you should probably destroy the bad one (probably paused or something) with `xl destroy ID`. This will "kill" the object, but will not remove it from XAPI. A toolstack restart on both hosts after that will also be helpful.
-
@olivierlambert Thanks for the reply.
Console (`xl list`) on Host1 does not show the VM at all (despite XO thinking it's there without any disks but halted), meanwhile the same on Host2 shows it as booted (although XO does not show it at all on that host). The toolstack has been restarted several times at both ends to no avail.
I'm a little confused!
-
I suppose you already disconnected/reconnected the pool in Settings/Servers of Xen Orchestra (or restarted `xo-server`)?
-
@olivierlambert Correct
-
This VM is obviously somewhere. Do you have any duplicate in the `xe vm-list` output? I find it really weird that it's not duplicated; you should see it via `xl list` at least. Is there any other host?
-
It's listed in XO as being halted on Host1 without any disks but (correctly) doesn't show on Host1 under `xe list`, meanwhile it's listed (correctly) in Host2 `xe list` as running on there.
Have tried from XO on different boxes (Host1, Host2 and also on a different host) but all report exactly as above - halted on Host1 with no disks.
Have restarted toolstack on Host1 and Host2, have rebooted Host1 and Host2, have rebooted VM from ssh (daren't do it from within XO in case it doesn't come up, similarly dare not stop it from ssh and then try to start it from XO), have tried migrating within XO but does not work, have tried booting from XO (despite it already running) but it does not work.
In summary - in reality it is running on Host2 just fine; `xe list` is reporting correctly but XO is not.
-
Wait, host1 and host2 are on 2 different pools?
-
Correct
-
So just to check, if you disconnect the host 1 pool in Xen Orchestra, do you see the VM appearing suddenly? If it's a duplicate, it should do the trick (if not, leave host 1 disconnected and restart xo-server to be sure, don't forget to force refresh your browser)
-
Ah, you genius. Yes, disconnecting Host1/Pool1 does indeed have the VM magically appear. So there must be a duplicate on Host1 that 'shields' XO from seeing it - only the dup doesn't show in XO. Will check via CLI - I have enough now to find the issue.
Thank you!!
-
That's exactly the issue. So there's something in Host1 still returning the UUID that's also on host2. Find it, remove it and this will solve your problem.
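One way to hunt for the shared UUID, as a rough sketch rather than an official procedure: dump the VM UUIDs that XAPI knows about on each pool master, copy both dumps to one machine, and look for values that appear in both. The file names and the `shared_uuids` helper below are made up for illustration; only the `xe vm-list` call in the comment is meant to be run on the hosts.

```shell
# On each pool master, dump the VM UUIDs known to XAPI (comma-separated),
# for example:
#   xe vm-list is-control-domain=false params=uuid --minimal > pool1.uuids
# Then copy both files to the same machine and compare them.

# shared_uuids FILE1 FILE2 (hypothetical helper):
# print every UUID that appears in both dumps, one per line.
shared_uuids() {
    {
        # Split on commas, drop empty lines, de-duplicate within each file
        tr ',' '\n' < "$1" | sed '/^$/d' | sort -u
        tr ',' '\n' < "$2" | sed '/^$/d' | sort -u
    } | sort | uniq -d   # anything seen twice now exists in both files
}
```

Usage would be something like `shared_uuids pool1.uuids pool2.uuids`. Any UUID it prints is registered in both pools, which is the situation XO cannot represent.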
-
Solved.
Thank you - superb deduction skills
-
You still had the XAPI object in host1? Feel free to provide more details so the community can also enjoy the solution
-
Correct - through the CLI one could see that the old VM that had been migrated from Host1 to Host2 was still showing under Host1, whilst the migrated copy was showing as running on Host2. Once the Host1 remnant of the migration was removed that cleared things and XO correctly reported the VM as running on Host2 with its disks attached.
TLDR - There were no other conflicts beyond what appeared through XO to be the only version sitting halted on Host1, but through the CLI one could see the halted copy on Host1 and the running copy on Host2. Somehow the running version did not show in XO until the remnant was removed.
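For anyone landing here later, this is a cautious sketch of how such a remnant could be removed from the CLI; it is an illustration under assumptions, not a record of the exact commands used. The `forget_halted_vm` helper is hypothetical. It is meant to be run on the pool holding the stale copy (Host1 here) and refuses to act unless XAPI on that pool reports the VM as halted, so that the running copy is never touched by mistake.

```shell
# forget_halted_vm UUID (hypothetical helper):
# remove a VM record from the local pool, but only if XAPI says it is halted.
forget_halted_vm() {
    uuid=$1
    # Ask the local XAPI for the power state of this VM record
    state=$(xe vm-param-get uuid="$uuid" param-name=power-state) || return 1
    if [ "$state" != "halted" ]; then
        echo "refusing: VM $uuid is '$state' on this pool" >&2
        return 1
    fi
    # vm-destroy removes the VM record only and leaves its disks (VDIs)
    # alone, unlike vm-uninstall
    xe vm-destroy uuid="$uuid"
}
```

Before running anything like this, it is worth checking with `xe vm-list uuid=<uuid> params=all` on both pools that the halted record really is the leftover one, and having a backup of the running VM.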
Thanks for your help @olivierlambert
-