XCP-ng

    Async.VM.pool_migrate stuck at 57%

      wmazren @olivierlambert

      @olivierlambert

      1. Check that your OS has static RAM settings and enough RAM

        Yes
        ad12278f-9935-46f2-9de6-600593ad6a53-image.png

      2. Do you have tools installed in your OS?

        Yes
        1eb37966-9726-46e1-ac17-e53fce68ac06-image.png

      3. Time sync between the hosts?

        Yes

      Anything else I can check?

      4acaf12f-c647-451f-bd12-58243faeb621-image.png

      Best regards,
      Azren

        olivierlambert Vates 🪐 Co-Founder CEO

        Do you have the issue with all guests or just this VM?

          MajorP93 @wmazren

          @wmazren I had a similar issue which cost me many hours to troubleshoot.

          I'd advise you to check the "dmesg" output within the VM that cannot be live migrated.

          XCP-ng / Xen behaves differently from VMware regarding live migration.

          XCP-ng interacts with the Linux kernel during live migration, and the kernel tries to freeze all processes before the migration is performed.

          In my case a "fuse" process blocked the graceful freezing of all processes, and my live migration task was also stuck in the task view, similar to your case.

          After I solved the fuse process issue, the system was able to live migrate and the problem was gone.

          You can see all of this in dmesg, since the kernel logs what it does during a live migration triggered by XCP-ng.

          //EDIT: another thing you might want to try is toggling "migration compression" in the pool settings, as well as making sure you have a dedicated connection / VLAN configured for live migration. Those two things also helped make my live migrations faster and more robust.

            sid @MajorP93

            I also went troubleshooting and found the same thing as @MajorP93. Specifically, I saw this in the kernel logs (viewable either in dmesg or using journalctl -k):

            Freezing of tasks failed after 20.005 seconds (1 task refusing to freeze, wq_busy=1)
            

            Quoting askubuntu.com:

            Before going into suspend (or hibernate for that matter), user space processes and (some) kernel threads get frozen. If the freezing fails, it will either be due to a user space process or a kernel thread failing to freeze.

            To freeze a user space process, the kernel sends it a signal that is handled automatically and, once received, cannot be ignored. If, however, the process is in the uninterruptible sleep state (e.g. waiting for I/O that cannot complete due to the device being unavailable), it will not receive the signal straight away. If this delay lasts longer than 20s (=default freeze timeout, see /sys/power/pm_freeze_timeout (in milliseconds)), the freezing will fail.

            NFS, CIFS and FUSE amongst others have been historically known for causing issues like that.
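To see which freeze timeout a guest is actually using, the file named in the quote can be read and converted to seconds. This is a minimal sketch, assuming a Linux guest that exposes /sys/power/pm_freeze_timeout; the function names are made up for illustration.

```python
from pathlib import Path

# Path taken from the quoted post; it may be absent on some kernels or guests.
FREEZE_TIMEOUT_PATH = Path("/sys/power/pm_freeze_timeout")


def parse_freeze_timeout(raw: str) -> float:
    """Convert the file content (milliseconds, as text) to seconds."""
    return int(raw.strip()) / 1000.0


def read_freeze_timeout(path: Path = FREEZE_TIMEOUT_PATH):
    """Return the freeze timeout in seconds, or None if it cannot be read."""
    try:
        return parse_freeze_timeout(path.read_text())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    print(read_freeze_timeout())
```

With the default mentioned in the quote, the file would contain 20000, which this sketch reports as 20.0 seconds.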

            Also from that post:

            You can grep for the problematic task like this:

            # dmesg | grep "task.*pid"

            In my case it was Prometheus Docker containers.
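The two checks described in this post (spotting the "Freezing of tasks failed" message and grepping for the offending task) could be combined in a small script. This is only a sketch: the summary pattern is built from the single log line quoted above, the task pattern is the same `task.*pid` expression as the grep, and the second sample line in the usage note is an invented placeholder, not real kernel output.

```python
import re

# Pattern derived from the one log line quoted in this thread.
SUMMARY_RE = re.compile(
    r"Freezing of tasks failed after (?P<seconds>[\d.]+) seconds "
    r"\((?P<count>\d+) tasks? refusing to freeze, wq_busy=(?P<wq_busy>\d+)\)"
)
# Same expression as the `dmesg | grep "task.*pid"` hint.
TASK_RE = re.compile(r"task.*pid")


def scan_kernel_log(lines):
    """Return (freeze failures, candidate task lines) found in kernel log lines."""
    failures = []
    suspects = []
    for line in lines:
        match = SUMMARY_RE.search(line)
        if match:
            failures.append(
                {
                    "seconds": float(match["seconds"]),
                    "tasks": int(match["count"]),
                    "wq_busy": int(match["wq_busy"]),
                }
            )
        elif TASK_RE.search(line):
            suspects.append(line.strip())
    return failures, suspects
```

For example, calling `scan_kernel_log` on the quoted message plus a hypothetical line such as `task:example-daemon state:D pid:1234` returns one failure entry and that line as a suspect. To run it against a real guest, the output of `dmesg` or `journalctl -k` can be passed in line by line.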

              wmazren @olivierlambert

              @olivierlambert

              This happens to other VMs as well.

              Best regards,
              Azren

                olivierlambert Vates 🪐 Co-Founder CEO

                I would check the XCP-ng logs to see what's going on during the migration, and also make sure your 8.3 is fully up to date.

                What kind of hardware do you have?
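One way to follow the log-checking suggestion is to filter the host's XAPI log for migration-related entries. The sketch below assumes the log lives at /var/log/xensource.log on the host and that a simple case-insensitive match on "migrate" is enough to surface relevant lines; both the keyword and the function names are illustrative assumptions, not an official procedure.

```python
def matching_lines(lines, keywords=("migrate",)):
    """Return lines that contain any keyword, compared case-insensitively."""
    lowered = tuple(keyword.lower() for keyword in keywords)
    return [
        line.rstrip("\n")
        for line in lines
        if any(keyword in line.lower() for keyword in lowered)
    ]


def scan_log_file(path="/var/log/xensource.log", keywords=("migrate",)):
    """Filter a log file on disk; the default path is an assumption."""
    with open(path, errors="replace") as handle:
        return matching_lines(handle, keywords)
```

Running `scan_log_file()` on both the source and the destination host around the time the task stalls should narrow down which side reports an error.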

                  wmazren @sid

                  @sid

                  My dmesg...

                  8b4484d3-094a-4303-a7a3-551cd423d993-image.png

                  This is the XO VM that I am trying to migrate, but the issue also happens to other VMs running MS Windows.

                  Best regards,
                  Azren

                    wmazren @olivierlambert

                    @olivierlambert

                    Both hosts are dual-processor Dell PowerEdge R760 servers with 512 GB of memory. They are missing this month's patches. I'm trying to live migrate the VMs to one host so that I can start installing patches and rebooting.

                    Host#1

                    5407bbff-e84c-4e62-b2cb-bc80ef5565a0-image.png

                    Host#2

                    b707b1cb-44e7-44f1-bf43-16c6776af0b9-image.png

                    Host#1: dmesg

                    6f9faa0c-e1be-4f4e-bf5b-1bed6a62211b-image.png

                    Host#2: dmesg

                    813113f7-f8c1-434f-a942-f7ea3818417c-image.png

                      wmazren @wmazren

                      It appears that the issue is related to Host #1. Any migration into or out of Host #1 tends to cause problems. Occasionally, virtual machines (VMs) lose network connectivity during migration and become unresponsive — they cannot be shut down, powered off (even forcefully), or restarted, often getting stuck in the process.

                      I’ve added Host #3 to the pool. Migration between Host #2 and Host #3 works smoothly in both directions.

                      Any idea how I can kill the stuck VM?

                      xe vm-reset-powerstate force=true vm=MYVM03
                      This operation cannot be completed because the server is still live.
                      host: cb8311e8-d0fd-4d53-be99-fe3fea2c9351 (HOST01)
                      

                      Best regards,
                      Azren

                        olivierlambert Vates 🪐 Co-Founder CEO

                        Is it the pool master?

                          wmazren @olivierlambert

                          @olivierlambert

                          I've already moved the pool master from host #1 to host #2.

                          Best regards,
                          Azren

                            olivierlambert Vates 🪐 Co-Founder CEO

                            Then reboot that broken host and, in the meantime, re-issue the power reset command from the master.
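Since the earlier attempt was rejected with "the server is still live", the reset presumably only goes through once the pool sees the broken host as down. A rough sketch of re-issuing the same `xe vm-reset-powerstate` command from the master until it is accepted is shown below. The retry count, the delay, and all function names are arbitrary assumptions; the VM name is the one used earlier in the thread.

```python
import subprocess
import time


def reset_powerstate_cmd(vm_name):
    """Build the same command that was tried earlier in the thread."""
    return ["xe", "vm-reset-powerstate", f"vm={vm_name}", "force=true"]


def run_xe(cmd):
    """Run a command and return (succeeded, combined output)."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.returncode == 0, (proc.stdout + proc.stderr).strip()


def reset_until_accepted(vm_name, attempts=10, delay=30.0,
                         runner=run_xe, sleep=time.sleep):
    """Retry the reset; return the attempt number that succeeded, or None."""
    cmd = reset_powerstate_cmd(vm_name)
    for attempt in range(1, attempts + 1):
        ok, _output = runner(cmd)
        if ok:
            return attempt
        if attempt < attempts:
            sleep(delay)  # wait while the broken host goes down
    return None
```

Run on the pool master, `reset_until_accepted("MYVM03")` would keep trying for roughly five minutes with these defaults. The `runner` and `sleep` parameters exist only so the loop can be exercised without a real host.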
