XCP-ng

    Live Migration and Warm Migration both fail

    anik

      Hi,

      I tried running a Rolling Pool Update on my servers today. The first host was updated and rebooted correctly, but the process got stuck on the second host.

      After investigating, I found that one of my VMs was not being migrated off the host, which is why the update failed.

      The initial Rolling Pool Update error was this:

      pool.rollingUpdate
      {
        "pool": "fe688bb2-b9ac-db7b-737a-cc457195f095"
      }
      {
        "code": "VM_SUSPEND_TIMEOUT",
        "params": [
          "OpaqueRef:5c1818ff-cb37-4103-993a-4b80fa8c8231",
          "1200."
        ],
        "task": {
          "uuid": "4547170a-ec23-4ad7-128c-69958985de34",
          "name_label": "Async.host.evacuate",
          "name_description": "",
          "allowed_operations": [],
          "current_operations": {},
          "created": "20250210T12:37:20Z",
          "finished": "20250210T12:58:42Z",
          "status": "failure",
          "resident_on": "OpaqueRef:010eebba-be27-489f-9f87-d06c8b675f19",
          "progress": 1,
          "type": "<none/>",
          "result": "",
          "error_info": [
            "VM_SUSPEND_TIMEOUT",
            "OpaqueRef:5c1818ff-cb37-4103-993a-4b80fa8c8231",
            "1200."
          ],
          "other_config": {},
          "subtask_of": "OpaqueRef:NULL",
          "subtasks": [],
          "backtrace": "(((process xapi)(filename ocaml/xapi-client/client.ml)(line 7))((process xapi)(filename ocaml/xapi-client/client.ml)(line 19))((process xapi)(filename ocaml/xapi-client/client.ml)(line 6172))((process xapi)(filename lib/xapi-stdext-pervasives/pervasiveext.ml)(line 24))((process xapi)(filename lib/xapi-stdext-pervasives/pervasiveext.ml)(line 35))((process xapi)(filename ocaml/xapi/xapi_host.ml)(line 612))((process xapi)(filename ocaml/xapi/xapi_host.ml)(line 621))((process xapi)(filename hashtbl.ml)(line 266))((process xapi)(filename hashtbl.ml)(line 272))((process xapi)(filename hashtbl.ml)(line 277))((process xapi)(filename ocaml/xapi/xapi_host.ml)(line 629))((process xapi)(filename lib/xapi-stdext-pervasives/pervasiveext.ml)(line 24))((process xapi)(filename ocaml/xapi/rbac.ml)(line 205))((process xapi)(filename ocaml/xapi/server_helpers.ml)(line 95)))"
        },
        "message": "VM_SUSPEND_TIMEOUT(OpaqueRef:5c1818ff-cb37-4103-993a-4b80fa8c8231, 1200.)",
        "name": "XapiError",
        "stack": "XapiError: VM_SUSPEND_TIMEOUT(OpaqueRef:5c1818ff-cb37-4103-993a-4b80fa8c8231, 1200.)
          at Function.wrap (file:///opt/xo/xo-builds/xen-orchestra-202410070635/packages/xen-api/_XapiError.mjs:16:12)
          at default (file:///opt/xo/xo-builds/xen-orchestra-202410070635/packages/xen-api/_getTaskResult.mjs:13:29)
          at Xapi._addRecordToCache (file:///opt/xo/xo-builds/xen-orchestra-202410070635/packages/xen-api/index.mjs:1041:24)
          at file:///opt/xo/xo-builds/xen-orchestra-202410070635/packages/xen-api/index.mjs:1075:14
          at Array.forEach (<anonymous>)
          at Xapi._processEvents (file:///opt/xo/xo-builds/xen-orchestra-202410070635/packages/xen-api/index.mjs:1065:12)
          at Xapi._watchEvents (file:///opt/xo/xo-builds/xen-orchestra-202410070635/packages/xen-api/index.mjs:1238:14)"
      }
      

      I then tried migrating it manually, which also failed:

      vm.migrate
      {
        "vm": "0c012493-da75-832d-3f3a-cadce8afb757",
        "migrationNetwork": "1f6f4495-1045-6fe2-3da6-4e43862e623d",
        "sr": "6b24cd1c-22ad-0994-5b6b-a75389a6ddba",
        "targetHost": "48bf1075-066f-4ed1-ba54-a350da4a426c"
      }
      {
        "code": "INTERNAL_ERROR",
        "params": [
          "Storage_error ([S(Internal_error);S(Xmlrpc_client.Connection_reset)])"
        ],
        "task": {
          "uuid": "92d36644-b919-fdf3-e879-475be5b4d8c9",
          "name_label": "Async.VM.migrate_send",
          "name_description": "",
          "allowed_operations": [],
          "current_operations": {},
          "created": "20250210T13:40:48Z",
          "finished": "20250210T13:40:54Z",
          "status": "failure",
          "resident_on": "OpaqueRef:010eebba-be27-489f-9f87-d06c8b675f19",
          "progress": 1,
          "type": "<none/>",
          "result": "",
          "error_info": [
            "INTERNAL_ERROR",
            "Storage_error ([S(Internal_error);S(Xmlrpc_client.Connection_reset)])"
          ],
          "other_config": {},
          "subtask_of": "OpaqueRef:NULL",
          "subtasks": [],
          "backtrace": "(((process xapi)(filename ocaml/xapi/helpers.ml)(line 1690))((process xapi)(filename lib/xapi-stdext-pervasives/pervasiveext.ml)(line 24))((process xapi)(filename lib/xapi-stdext-pervasives/pervasiveext.ml)(line 35))((process xapi)(filename lib/xapi-stdext-pervasives/pervasiveext.ml)(line 24))((process xapi)(filename lib/xapi-stdext-pervasives/pervasiveext.ml)(line 35))((process xapi)(filename ocaml/xapi/message_forwarding.ml)(line 134))((process xapi)(filename lib/xapi-stdext-pervasives/pervasiveext.ml)(line 24))((process xapi)(filename lib/xapi-stdext-pervasives/pervasiveext.ml)(line 35))((process xapi)(filename lib/xapi-stdext-pervasives/pervasiveext.ml)(line 24))((process xapi)(filename ocaml/xapi/rbac.ml)(line 205))((process xapi)(filename ocaml/xapi/server_helpers.ml)(line 95)))"
        },
        "message": "INTERNAL_ERROR(Storage_error ([S(Internal_error);S(Xmlrpc_client.Connection_reset)]))",
        "name": "XapiError",
        "stack": "XapiError: INTERNAL_ERROR(Storage_error ([S(Internal_error);S(Xmlrpc_client.Connection_reset)]))
          at Function.wrap (file:///opt/xo/xo-builds/xen-orchestra-202410070635/packages/xen-api/_XapiError.mjs:16:12)
          at default (file:///opt/xo/xo-builds/xen-orchestra-202410070635/packages/xen-api/_getTaskResult.mjs:13:29)
          at Xapi._addRecordToCache (file:///opt/xo/xo-builds/xen-orchestra-202410070635/packages/xen-api/index.mjs:1041:24)
          at file:///opt/xo/xo-builds/xen-orchestra-202410070635/packages/xen-api/index.mjs:1075:14
          at Array.forEach (<anonymous>)
          at Xapi._processEvents (file:///opt/xo/xo-builds/xen-orchestra-202410070635/packages/xen-api/index.mjs:1065:12)
          at Xapi._watchEvents (file:///opt/xo/xo-builds/xen-orchestra-202410070635/packages/xen-api/index.mjs:1238:14)"
      }
      

      I also tried a Warm Migration; that job just ended up "Interrupted", and I found no log for it.

      After a little digging, I suspect the issue might actually be in the VM being migrated. I found this in its kern.log:

      Feb 10 13:58:34 lapio kernel: [9681908.582545] Freezing user space processes ...
      Feb 10 13:58:34 lapio kernel: [9681928.590134] Freezing of tasks failed after 20.007 seconds (1 tasks refusing to freeze, wq_busy=0):
      Feb 10 13:58:34 lapio kernel: [9681928.590463] tar             D ffff880009a7b978     0 26564      1 0x00000006
      Feb 10 13:58:34 lapio kernel: [9681928.590469]  ffff880009a7b978 ffff880120d0a800 ffff88020444c740 ffff8802012f3900
      Feb 10 13:58:34 lapio kernel: [9681928.590473]  ffff880009a7c000 ffff880205f162c0 7fffffffffffffff ffffffff81867910
      Feb 10 13:58:34 lapio kernel: [9681928.590477]  ffff880009a7bab8 ffff880009a7b990 ffffffff81867065 0000000000000000
      Feb 10 13:58:34 lapio kernel: [9681928.590481] Call Trace:
      Feb 10 13:58:34 lapio kernel: [9681928.590494]  [<ffffffff81867910>] ? bit_wait+0x60/0x60
      Feb 10 13:58:34 lapio kernel: [9681928.590499]  [<ffffffff81867065>] schedule+0x35/0x80
      Feb 10 13:58:34 lapio kernel: [9681928.590503]  [<ffffffff8186a7d8>] schedule_timeout+0x208/0x280
      Feb 10 13:58:34 lapio kernel: [9681928.590528]  [<ffffffffc050ea49>] ? cifsFileInfo_put+0xa9/0x3f0 [cifs]
      Feb 10 13:58:34 lapio kernel: [9681928.590534]  [<ffffffff81023d95>] ? xen_clocksource_get_cycles+0x15/0x20
      Feb 10 13:58:34 lapio kernel: [9681928.590538]  [<ffffffff81867910>] ? bit_wait+0x60/0x60
      Feb 10 13:58:34 lapio kernel: [9681928.590542]  [<ffffffff818667b4>] io_schedule_timeout+0xa4/0x110
      Feb 10 13:58:34 lapio kernel: [9681928.590546]  [<ffffffff8186792b>] bit_wait_io+0x1b/0x70
      Feb 10 13:58:34 lapio kernel: [9681928.590550]  [<ffffffff818674bf>] __wait_on_bit+0x5f/0x90
      Feb 10 13:58:34 lapio kernel: [9681928.590555]  [<ffffffff8119857b>] wait_on_page_bit+0xcb/0xf0
      Feb 10 13:58:34 lapio kernel: [9681928.590561]  [<ffffffff810cad70>] ? autoremove_wake_function+0x40/0x40
      Feb 10 13:58:34 lapio kernel: [9681928.590565]  [<ffffffff81198693>] __filemap_fdatawait_range+0xf3/0x160
      Feb 10 13:58:34 lapio kernel: [9681928.590569]  [<ffffffff81198714>] filemap_fdatawait_range+0x14/0x30
      Feb 10 13:58:34 lapio kernel: [9681928.590573]  [<ffffffff8119a63a>] filemap_write_and_wait+0x6a/0x70
      Feb 10 13:58:34 lapio kernel: [9681928.590584]  [<ffffffffc0512b43>] cifs_flush+0x43/0x90 [cifs]
      Feb 10 13:58:34 lapio kernel: [9681928.590589]  [<ffffffff81219972>] filp_close+0x32/0x80
      Feb 10 13:58:34 lapio kernel: [9681928.590594]  [<ffffffff8123b3f5>] put_files_struct+0x75/0xd0
      Feb 10 13:58:34 lapio kernel: [9681928.590598]  [<ffffffff8123b4f7>] exit_files+0x47/0x50
      Feb 10 13:58:34 lapio kernel: [9681928.590603]  [<ffffffff810885ae>] do_exit+0x2ae/0xb90
      Feb 10 13:58:34 lapio kernel: [9681928.590607]  [<ffffffff811989fb>] ? __lock_page_killable+0xbb/0xe0
      Feb 10 13:58:34 lapio kernel: [9681928.590611]  [<ffffffff81088f17>] do_group_exit+0x47/0xb0
      Feb 10 13:58:34 lapio kernel: [9681928.590616]  [<ffffffff810957e1>] get_signal+0x171/0x950
      Feb 10 13:58:34 lapio kernel: [9681928.590621]  [<ffffffff8102e467>] do_signal+0x37/0x6f0
      Feb 10 13:58:34 lapio kernel: [9681928.590631]  [<ffffffffc0513577>] ? cifs_strict_readv+0xa7/0x100 [cifs]
      Feb 10 13:58:34 lapio kernel: [9681928.590636]  [<ffffffff810034fc>] exit_to_usermode_loop+0x8c/0xd0
      Feb 10 13:58:34 lapio kernel: [9681928.590640]  [<ffffffff81003c9a>] syscall_return_slowpath+0x5a/0x60
      Feb 10 13:58:34 lapio kernel: [9681928.590644]  [<ffffffff8186bd18>] int_ret_from_sys_call+0x25/0xa3
      Feb 10 13:58:34 lapio kernel: [9681928.590668]
      Feb 10 13:58:34 lapio kernel: [9681928.590679] Restarting tasks ... done.
      Feb 10 13:58:34 lapio kernel: [9681928.602642] xen:manage: do_suspend: freeze processes failed -16
      
      

      I see CIFS mentioned a couple of times in the VM's kern.log. Could CIFS mounts on the VM be preventing the freeze task? I haven't had this issue with this particular VM before, though.
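      If CIFS mounts are the blocker, one thing worth checking inside the guest before retrying the migration is which shares are actually mounted as CIFS, then syncing and unmounting them first. A minimal sketch; the `//fileserver/share` / `/mnt/backup` entries below are made-up sample data so the snippet is self-contained (on a live guest you would read /proc/mounts directly):

      ```shell
      # Mount points whose filesystem type is "cifs" are the ones the kernel
      # freezer has to flush before suspend. On a live guest you would run:
      #   awk '$3 == "cifs" { print $2 }' /proc/mounts
      # Here we parse a sample listing instead (hypothetical entries):
      mounts='//fileserver/share /mnt/backup cifs rw,relatime 0 0
      /dev/xvda1 / ext4 rw,relatime 0 0'

      printf '%s\n' "$mounts" | awk '$3 == "cifs" { print $2 }'
      # -> /mnt/backup

      # Before retrying the migration, you could flush and lazily unmount them:
      #   sync && umount -l /mnt/backup
      ```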

      olivierlambert (Vates 🪐 Co-Founder & CEO)

        The issue is indeed in the VM, not in XCP-ng nor in XO. The VM doesn't cooperate because there's a problem inside it. Sadly, there's little we can do, except checking that you aren't using dynamic memory. Beyond that, the guest OS is likely the issue.
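        For reference, one way to see whether dynamic memory is in play is to compare the VM's `memory-dynamic-min` and `memory-dynamic-max` parameters (readable with `xe vm-param-get` on a pool host); if they differ, ballooning is active. A sketch with hypothetical values, since the real numbers depend on your VM:

        ```shell
        # Hypothetical values, as would be returned on an XCP-ng host by e.g.:
        #   xe vm-param-get uuid=<vm-uuid> param-name=memory-dynamic-min
        #   xe vm-param-get uuid=<vm-uuid> param-name=memory-dynamic-max
        dyn_min=2147483648   # 2 GiB (sample value)
        dyn_max=4294967296   # 4 GiB (sample value)

        # If min != max, the balloon driver can resize guest memory, which can
        # interact badly with suspend/migration.
        if [ "$dyn_min" -ne "$dyn_max" ]; then
          echo "dynamic memory (ballooning) in use"
        else
          echo "static memory"
        fi
        # -> dynamic memory (ballooning) in use
        ```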
