XCP-ng

    Running VM shows as stopped in wrong pool

      shorian

      Strange one for you:

      A running VM was live-migrated from Host 1 (Pool 1) to Host 2 (Pool 2); each pool has only a single host. The migration failed. It was too long ago for me to have the error to hand, but for us it is not unusual for live migrations to fail when the disk is larger than 250 GB. The VM continued to function, BUT XO shows it as not running and present on Host 1 without a disk attached. In reality it is running on Host 2: when shutting down Host 1 the VM continues to operate just fine, and when shutting down Host 2 it terminates.

      Starting the allegedly stopped VM in XO fails. XO starts it on Host 1 (although, as above, we know it's running on Host 2); it shows as started for about 10 seconds, then goes back to stopped, and no error is given.

      Migrating it results in the same outcome: it appears to migrate OK, but nothing changes and there are no error messages. Remember that XO has it on the wrong host and without an attached disk. Rebooting the VM from its own console over SSH works just fine, but the issue persists in Xen Orchestra, and rebooting both hosts also fails to resolve it.

      Is there a way (without damaging the running VM!) to have it re-registered in XO? As an aside, backups using XO are working for the 'stopped' VM, except that, as it shows in XO as having no attached disk, they take 2 s and are worthless.

      [Xen Orchestra, commit 00a17 (although it will have been updated a few times since the issue presented itself a month or so ago), both hosts on 8.2.1.]

          olivierlambert Vates 🪐 Co-Founder CEO

          I think you might have a duplicated UUID issue. XO considers a UUID unique (by definition), so if for some reason the VM was duplicated due to a bug, you'll only be able to see one copy at a time.

          First, let's check whether you have a duplicated VM UUID: on each host, use xl list to list your VMs and see if you can spot your VM at both ends. If so, you should probably destroy the bad one (probably paused or something) with xl destroy ID. This will "kill" the object, but won't remove it from XAPI. A toolstack restart on both hosts after that will also be helpful.
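
As a rough sketch of the check described above, the helper below looks for one VM UUID in the output of xl vm-list, which prints UUID, domain ID and name columns, instead of matching by name in xl list. The function name domid_for_uuid and the <vm-uuid> placeholder are illustrative assumptions, not something taken from this thread.

```shell
#!/bin/sh
# domid_for_uuid UUID
# Reads "xl vm-list"-style text on stdin (assumed columns: UUID, ID, name)
# and prints the domain ID of the matching UUID; prints nothing if absent.
domid_for_uuid() {
    awk -v want="$1" '$1 == want { print $2 }'
}

# Hypothetical usage, run in dom0 on each host:
#   xl vm-list | domid_for_uuid "<vm-uuid>"
# A domain ID on both hosts would point to a duplicated live domain.
# Output on one host only means any duplicate is not a running domain.
```

If a stray domain does turn up, the printed ID is the value that xl destroy expects.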

            shorian @olivierlambert

            @olivierlambert Thanks for the reply.

            Console (xl list) on Host1 does not show the VM at all (despite XO thinking it's there, halted and without any disks), while the same command on Host2 shows it as booted (although XO does not show it on that host at all).

            Toolstack has been restarted several times at both ends to no avail.

            I'm a little confused!

              olivierlambert Vates 🪐 Co-Founder CEO

              I suppose you have already disconnected/reconnected the pool in Settings/Servers of Xen Orchestra (or restarted xo-server)?

                shorian @olivierlambert

                @olivierlambert Correct

                  olivierlambert Vates 🪐 Co-Founder CEO

                  This VM is obviously somewhere, do you have any duplicate in xe vm-list output?

                  I find it really weird it's not duplicated, you should see it via xl list at least 🤔 Is there any other host?
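
One way to look for such a duplicate across the two pools is to compare the UUID lists from each side. This is a minimal sketch, assuming the comma-separated output of xe vm-list params=uuid --minimal has been captured from each pool. The function name shared_uuids and the host names in the usage comment are placeholders.

```shell
#!/bin/sh
# shared_uuids LIST_A LIST_B
# Each argument is a comma-separated UUID list (the format printed by
# "xe vm-list params=uuid --minimal"). Prints the UUIDs present in both
# lists, one per line; prints nothing when the lists do not overlap.
shared_uuids() {
    {
        printf '%s\n' "$1" | tr ',' '\n' | sort -u
        printf '%s\n' "$2" | tr ',' '\n' | sort -u
    } | sed '/^$/d' | sort | uniq -d
}

# Hypothetical usage from a machine with SSH access to both pool masters:
#   pool1=$(ssh root@host1 xe vm-list params=uuid --minimal)
#   pool2=$(ssh root@host2 xe vm-list params=uuid --minimal)
#   shared_uuids "$pool1" "$pool2"
```

Any UUID it prints exists as a VM record in both pools, which is the situation XO cannot display correctly.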

                    shorian @olivierlambert

                    It's listed in XO as being halted on Host1 without any disks but (correctly) doesn't show on Host1 under xe list, meanwhile it's listed (correctly) in Host2 xe list as running on there.

                    Have tried from XO on different boxes (Host1, Host2 and also on a different host) but all report exactly as above - halted on Host1 with no disks.

                    I have restarted the toolstack on Host1 and Host2, rebooted Host1 and Host2, and rebooted the VM from SSH (I daren't do it from within XO in case it doesn't come up; similarly, I dare not stop it from SSH and then try to start it from XO). I have tried migrating it within XO, which does not work, and tried booting it from XO (despite it already running), which does not work either.

                    In summary - in reality it is running on Host2 just fine; xe list is reporting correctly but XO is not.

                      olivierlambert Vates 🪐 Co-Founder CEO

                      Wait, host1 and host2 are on 2 different pools?

                        shorian @olivierlambert

                        Correct

                          olivierlambert Vates 🪐 Co-Founder CEO

                          So just to check, if you disconnect the host 1 pool in Xen Orchestra, do you see the VM appearing suddenly? If it's a duplicate, it should do the trick (if not, leave host 1 disconnected and restart xo-server to be sure, don't forget to force refresh your browser)

                            shorian @olivierlambert

                            Ah, you genius. Yes, disconnecting Host1/Pool1 does indeed have the VM magically appear. So there must be a duplicate on Host1 that 'shields' XO from seeing it - only the dup doesn't show in XO. Will check via CLI - I have enough now to find the issue.

                            Thank you!!

                              olivierlambert Vates 🪐 Co-Founder CEO

                              That's exactly the issue. So there's something in Host1 still returning the UUID that's also on host2. Find it, remove it and this will solve your problem.

                                shorian @olivierlambert

                                Solved.

                                Thank you - superb deduction skills 🙂

                                  olivierlambert Vates 🪐 Co-Founder CEO

                                  You still had the XAPI object in host1? Feel free to provide more details so the community can also enjoy the solution 🙂

                                    shorian @olivierlambert

                                    Correct - through the CLI one could see that the old VM that had been migrated from Host1 to Host2 was still showing under Host1, whilst the migrated copy was showing as running on Host2. Once the Host1 remnant of the migration was removed that cleared things and XO correctly reported the VM as running on Host2 with its disks attached.

                                    TLDR - There were no other conflicts beyond what appeared through XO to be the only version sitting halted on Host1, but through the CLI one could see the halted copy on Host1 and the running copy on Host2. Somehow the running version did not show in XO until the remnant was removed.

                                    Thanks for your help @olivierlambert
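
For anyone landing here with the same symptom, a cautious sketch of the cleanup on Host1 might look like the following. The thread does not say which command was used to remove the remnant, so xe vm-destroy (which drops the VM record and leaves storage intact) is an assumption, as are the helper name is_stale_record and the <vm-uuid> placeholder.

```shell
#!/bin/sh
# is_stale_record POWER_STATE
# Succeeds only when the record reports "halted", i.e. it is not the
# live copy of the VM. Any other state should be left alone.
is_stale_record() {
    [ "$1" = "halted" ]
}

# Hypothetical usage on Host1 only, never on the host running the VM.
# Left commented out because the removal command is an assumption:
#   state=$(xe vm-param-get uuid="<vm-uuid>" param-name=power-state)
#   if is_stale_record "$state"; then
#       xe vm-destroy uuid="<vm-uuid>"   # removes the record, keeps storage
#   else
#       echo "record is '$state', not touching it" >&2
#   fi
```

The guard is there because both records share one UUID, so the only thing separating the remnant from the live VM is which pool the command runs against and the power state it reports.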
