Live Migration in XO Fails

omatsei

I'm having an issue with XO (built from source) where live migrations are failing. In XO, when I select a test VM and the target host, it fails after just a second or two. However, it works perfectly if I do it from the command line while SSH'd into the pool master. The command I'm using is:

[22:06 xcp01 ~]# xe vm-migrate uuid=406bc5e7-e814-dc16-780e-adfc2635dfbe host-uuid=2133772d-f69e-4930-980e-583e81e0afb8
[22:07 xcp01 ~]#

The full details of the error are:

vm.migrate
{
  "vm": "406bc5e7-e814-dc16-780e-adfc2635dfbe",
  "migrationNetwork": "76cfdb59-4a35-9d50-6d86-99d68317d61c",
  "targetHost": "2133772d-f69e-4930-980e-583e81e0afb8"
}
{
  "code": "SR_BACKEND_FAILURE_202",
  "params": [
    "",
    "General backend error [opterr=rc: 21, stdout: , stderr: iscsiadm: No records found
]",
    ""
  ],
  "task": {
    "uuid": "f3e2ae4b-890b-4d1b-ee11-36d151482a0a",
    "name_label": "Async.VM.migrate_send",
    "name_description": "",
    "allowed_operations": [],
    "current_operations": {},
    "created": "20240528T02:01:51Z",
    "finished": "20240528T02:01:55Z",
    "status": "failure",
    "resident_on": "OpaqueRef:cbbc463f-6d3d-4693-b5fe-333944df6766",
    "progress": 1,
    "type": "<none/>",
    "result": "",
    "error_info": [
      "SR_BACKEND_FAILURE_202",
      "",
      "General backend error [opterr=rc: 21, stdout: , stderr: iscsiadm: No records found
]",
      ""
    ],
    "other_config": {},
    "subtask_of": "OpaqueRef:NULL",
    "subtasks": [],
    "backtrace": "(((process xapi)(filename ocaml/xapi/helpers.ml)(line 1690))((process xapi)(filename lib/xapi-stdext-pervasives/pervasiveext.ml)(line 24))((process xapi)(filename lib/xapi-stdext-pervasives/pervasiveext.ml)(line 35))((process xapi)(filename lib/xapi-stdext-pervasives/pervasiveext.ml)(line 24))((process xapi)(filename lib/xapi-stdext-pervasives/pervasiveext.ml)(line 35))((process xapi)(filename ocaml/xapi/message_forwarding.ml)(line 134))((process xapi)(filename lib/xapi-stdext-pervasives/pervasiveext.ml)(line 24))((process xapi)(filename lib/xapi-stdext-pervasives/pervasiveext.ml)(line 35))((process xapi)(filename lib/xapi-stdext-pervasives/pervasiveext.ml)(line 24))((process xapi)(filename ocaml/xapi/rbac.ml)(line 205))((process xapi)(filename ocaml/xapi/server_helpers.ml)(line 95)))"
  },
  "message": "SR_BACKEND_FAILURE_202(, General backend error [opterr=rc: 21, stdout: , stderr: iscsiadm: No records found
], )",
  "name": "XapiError",
  "stack": "XapiError: SR_BACKEND_FAILURE_202(, General backend error [opterr=rc: 21, stdout: , stderr: iscsiadm: No records found
], )
    at Function.wrap (file:///data/xo/xo-builds/xen-orchestra-202405272127/packages/xen-api/_XapiError.mjs:16:12)
    at default (file:///data/xo/xo-builds/xen-orchestra-202405272127/packages/xen-api/_getTaskResult.mjs:11:29)
    at Xapi._addRecordToCache (file:///data/xo/xo-builds/xen-orchestra-202405272127/packages/xen-api/index.mjs:1035:24)
    at file:///data/xo/xo-builds/xen-orchestra-202405272127/packages/xen-api/index.mjs:1069:14
    at Array.forEach (<anonymous>)
    at Xapi._processEvents (file:///data/xo/xo-builds/xen-orchestra-202405272127/packages/xen-api/index.mjs:1059:12)
    at Xapi._watchEvents (file:///data/xo/xo-builds/xen-orchestra-202405272127/packages/xen-api/index.mjs:1232:14)
    at runNextTicks (node:internal/process/task_queues:60:5)
    at processImmediate (node:internal/timers:447:9)
    at process.callbackTrampoline (node:internal/async_hooks:128:17)"
}

The VM runs Ubuntu 22, has XenTools installed, and was rebooted recently (earlier this afternoon).

Any ideas?

olivierlambert

Hi,

I doubt the same operation would work with xe and fail with XO, since the error message comes directly from XCP-ng (and its storage stack). Restart iscsiadm; that should do the trick.

omatsei

@olivierlambert Do you mean restart the iscsid service on the XCP host?

olivierlambert

Yes
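
For reference, a minimal sketch of the restart-and-check step discussed above, to be run as root on the XCP-ng host. The function names are made up for this example, and it assumes the daemon runs as the systemd unit `iscsid`. The "rc: 21" in the XO error matches the exit status iscsiadm uses when it finds no records to operate on, so the check reports that case separately.

```shell
# Sketch only: function names are hypothetical, not part of XCP-ng.

# Report whether open-iscsi has any node records on this host.
# iscsiadm exits with status 21 when no records are found.
check_iscsi_records() {
    rc=0
    iscsiadm -m node >/dev/null 2>&1 || rc=$?
    case "$rc" in
        0)  echo "node records present" ;;
        21) echo "no node records (rc 21)" ;;
        *)  echo "iscsiadm failed (rc $rc)" ;;
    esac
    return "$rc"
}

# Restart the iSCSI daemon (assumed unit name: iscsid), then re-check.
restart_iscsid_and_check() {
    systemctl restart iscsid && check_iscsi_records
}
```

Calling `restart_iscsid_and_check` on both the source and the destination host shows whether either one is still missing its records after the restart.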

omatsei

@olivierlambert Sorry, same error. I made sure there were no VMs on two different hosts, restarted iscsid on both, then (via CLI) moved one VM back on. Then I tried migrating it from XO and got the same error. I also made sure XO was updated to the latest stable release.

Random question: does XO need to be on the same subnet (or broadcast network) as the XCP hosts?

omatsei

@omatsei I found the following error on the source host, if it helps. I rebooted it and restarted iscsid on both the source and destination hosts, just to make sure nothing was pending or hung.

May 28 10:15:32 xcp09 xapi: [error||2507 ||backtrace] SR.scan D:9f4f3c05cc88 failed with exception Storage_error ([S(Redirect);[S(192.168.1.201)]])
May 28 10:15:32 xcp09 xapi: [error||2507 ||backtrace] Raised Storage_error ([S(Redirect);[S(192.168.1.201)]])
May 28 10:15:32 xcp09 xapi: [error||2507 ||backtrace] 1/1 xapi Raised at file (Thread 2507 has no backtrace table. Was with_backtraces called?, line 0
May 28 10:15:32 xcp09 xapi: [error||2507 ||backtrace]
May 28 10:15:32 xcp09 xapi: [error||2507 ||storage_interface] Storage_error ([S(Redirect);[S(192.168.1.201)]]) (File "storage/storage_interface.ml", line 436, characters 51-58)
May 28 10:15:32 xcp09 xapi: [error||2506 HTTP 127.0.0.1->:::80|Querying services D:6b15aa4c5bcd|storage_interface] Storage_error ([S(Redirect);[S(192.168.1.201)]]) (File "storage/storage_interface.ml", line 431, characters 49-56)
May 28 10:15:32 xcp09 xapi: [error||2506 HTTP 127.0.0.1->:::80|Querying services D:6b15aa4c5bcd|storage_interface] Storage_error ([S(Redirect);[S(192.168.1.201)]]) (File "storage/storage_interface.ml", line 436, characters 51-58)

Note that 192.168.1.201 is the pool master. I ended up rebooting the pool master after manually migrating VMs off it, and that seems to have fixed the issue. No idea why, but whatever.
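
For anyone hitting the same thing, here is a rough sketch for pulling the redirect target out of log lines like the ones above, so it can be compared with the pool master's address. The function name `redirect_targets` is made up for this example, and the pattern is based only on the `Storage_error ([S(Redirect);[S(...)]])` format shown in this thread.

```shell
# Sketch only: print each distinct Storage_error Redirect target
# found in xapi log lines read from stdin.
redirect_targets() {
    grep -o 'S(Redirect);\[S([^)]*)' |   # keep only the "S(Redirect);[S(<address>)" part
        sed 's/.*\[S(//; s/)$//' |       # strip everything around the address
        sort -u                          # one line per distinct target
}
```

On a host this could be run as `redirect_targets < /var/log/xensource.log`, assuming the default xapi log location. The master's address can be looked up with `xe pool-list params=master --minimal` followed by `xe host-param-get uuid=<master-uuid> param-name=address`.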