XCP-ng

    Async.VM.pool_migrate stuck at 57%

    • wmazren

      Hi,

      I'm having an issue with live migration between XCP-ng hosts in a pool. The migration itself looks fine: the VM moves from Host #2 to Host #1, but the task stays stuck at 57% (Async.VM.pool_migrate stuck at 57%). I have to restart the toolstack to make the task go away. Any idea?

      I'm using XO from source and on the latest commit.
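
      For reference, a minimal sketch of how the stuck task can be inspected and cancelled from the pool master CLI before restarting the whole toolstack (the task UUID is a placeholder):

        # list tasks and find the stuck Async.VM.pool_migrate entry
        xe task-list params=uuid,name-label,progress,status

        # try cancelling just that task
        xe task-cancel uuid=<task-uuid>

        # last resort on the affected host (roughly what a toolstack restart does)
        xe-toolstack-restart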

      [screenshots of the stuck task]

      Thank you.

      Best regards,
      Azren

    • olivierlambert (Vates 🪐 Co-Founder & CEO)

        Hi,

        It's likely not an XO problem, but an issue with XCP-ng.

        1. Check that your OS has static RAM settings and enough RAM
        2. Do you have the guest tools installed in your OS?
        3. Is time synced between the hosts? (example checks for all three below)
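
        A minimal sketch of how all three can be verified from the CLI, assuming a standard XCP-ng 8.x host (the VM UUID is a placeholder; hosts still using ntpd would run ntpstat instead of chronyc):

          # 1. RAM: a fixed memory config is easiest for live migration,
          #    i.e. dynamic-min == dynamic-max == static-max
          xe vm-param-get uuid=<vm-uuid> param-name=memory-static-max
          xe vm-param-get uuid=<vm-uuid> param-name=memory-dynamic-min
          xe vm-param-get uuid=<vm-uuid> param-name=memory-dynamic-max

          # 2. Guest tools: check the PV driver fields reported by the guest
          xe vm-param-list uuid=<vm-uuid> | grep -i "PV-drivers"

          # 3. Time sync: run on each host and compare
          chronyc tracking
          date -u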
        • wmazren @olivierlambert

          @olivierlambert

          1. Check that your OS has static RAM settings and enough RAM

            Yes
            [screenshot]

          2. Do you have the guest tools installed in your OS?

            Yes
            [screenshot]

          3. Is time synced between the hosts?

            Yes

          Anything else I can check?

          [screenshot]

          Best regards,
          Azren

          • olivierlambert (Vates 🪐 Co-Founder & CEO)

            Do you have the issue with all guests or just this VM?

            • MajorP93 @wmazren

              @wmazren I had a similar issue that cost me many hours to troubleshoot.

              I'd advise you to check the "dmesg" output within the VM that cannot be live migrated.

              XCP-ng / Xen behaves differently than VMware regarding live migration.

              XCP-ng interacts with the Linux kernel upon live migration, and the kernel will try to freeze all processes before the migration is performed.

              In my case a "fuse" process blocked the graceful freezing of all processes, and my live migration task was also stuck in the task view, similar to your case.

              After solving the fuse process issue, and thereby making the system able to live migrate again, the problem was gone.

              All of this can be seen in dmesg, as the kernel will tell you what is being done during a live migration via XCP-ng.

              //EDIT: another thing you might want to try is toggling "migration compression" in the pool settings, as well as making sure you have a dedicated connection / VLAN configured for live migration. Those two things also made my live migrations faster and more robust.

              • sid @MajorP93

                I also went troubleshooting and found the same as @MajorP93. Specifically, I saw this in the kernel logs (viewable either with dmesg or with journalctl -k):

                Freezing of tasks failed after 20.005 seconds (1 task refusing to freeze, wq_busy=1)
                

                Quoting askubuntu.com:

                Before going into suspend (or hibernate for that matter), user space processes and (some) kernel threads get frozen. If the freezing fails, it will either be due to a user space process or a kernel thread failing to freeze.

                To freeze a user space process, the kernel sends it a signal that is handled automatically and, once received, cannot be ignored. If, however, the process is in the uninterruptible sleep state (e.g. waiting for I/O that cannot complete due to the device being unavailable), it will not receive the signal straight away. If this delay lasts longer than 20s (= default freeze timeout, see /sys/power/pm_freeze_timeout (in milliseconds)), the freezing will fail.

                NFS, CIFS and FUSE amongst others have been historically known for causing issues like that.

                Also from that post:

                You can grep for the problematic task like this: # dmesg | grep "task.*pid"

                In my case it was Prometheus Docker containers.
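
                If anyone hits the same thing, here is a minimal sketch of how the blocking task and the freeze timeout can be inspected inside the guest (run as root; the 60 s value is only an example):

                  # find which task refused to freeze
                  dmesg | grep -iE "freezing of tasks|refusing to freeze"

                  # current freeze timeout in milliseconds (default 20000)
                  cat /sys/power/pm_freeze_timeout

                  # temporarily give slow I/O more time to complete before the freeze gives up
                  echo 60000 > /sys/power/pm_freeze_timeout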

                • wmazren @olivierlambert

                  @olivierlambert

                  This happens to other VMs as well.

                  Best regards,
                  Azren

                  • olivierlambert (Vates 🪐 Co-Founder & CEO)

                    I would check the XCP-ng logs to see what's going on with the migration, and also make sure you are fully up to date on your 8.3.

                    What kind of hardware do you have?
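
                    A rough sketch of how to follow the migration in the host logs, assuming a stock XCP-ng install (run on both the source and destination hosts):

                      # follow xapi's log while re-trying the migration
                      tail -f /var/log/xensource.log

                      # or search for migration-related messages afterwards
                      grep -i "migrat" /var/log/xensource.log | tail -n 50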

                    • wmazren @sid

                      @sid

                      My dmesg...

                      [screenshot of dmesg output]

                      This is the XO VM that I'm trying to migrate, but the issue also happens to other VMs running MS Windows.

                      Best regards,
                      Azren
