Failed to start VM

kevdog

Hi, I'm running XCP-ng 8.2 and have been doing so for a long time. Yesterday I received a notification about missing pool patches (with XO) and chose the option to install them. In addition, I performed package upgrades on all the VMs (apt for the Ubuntu VMs, pacman for the Arch VMs). One of my Arch VMs now fails to start, and I'm receiving this error:

vm.start
{
  "id": "98a8dc5a-66f4-a475-ec2a-3de2df439629",
  "bypassMacAddressesCheck": false,
  "force": false
}
{
  "code": "FAILED_TO_START_EMULATOR",
  "params": [
    "OpaqueRef:d091d6c6-168a-4f6a-b365-c54cac642031",
    "domid 28",
    "QMP failure at File \"xc/device.ml\", line 3328, characters 71-78"
  ],
  "call": {
    "method": "VM.start",
    "params": [
      "OpaqueRef:d091d6c6-168a-4f6a-b365-c54cac642031",
      false,
      false
    ]
  },
  "message": "FAILED_TO_START_EMULATOR(OpaqueRef:d091d6c6-168a-4f6a-b365-c54cac642031, domid 28, QMP failure at File \"xc/device.ml\", line 3328, characters 71-78)",
  "name": "XapiError",
  "stack": "XapiError: FAILED_TO_START_EMULATOR(OpaqueRef:d091d6c6-168a-4f6a-b365-c54cac642031, domid 28, QMP failure at File \"xc/device.ml\", line 3328, characters 71-78)
    at Function.wrap (/opt/xen-orchestra/packages/xen-api/src/_XapiError.js:16:12)
    at /opt/xen-orchestra/packages/xen-api/src/transports/json-rpc.js:36:27
    at AsyncResource.runInAsyncScope (node:async_hooks:199:9)
    at cb (/opt/xen-orchestra/node_modules/bluebird/js/release/util.js:355:42)
    at tryCatcher (/opt/xen-orchestra/node_modules/bluebird/js/release/util.js:16:23)
    at Promise._settlePromiseFromHandler (/opt/xen-orchestra/node_modules/bluebird/js/release/promise.js:547:31)
    at Promise._settlePromise (/opt/xen-orchestra/node_modules/bluebird/js/release/promise.js:604:18)
    at Promise._settlePromise0 (/opt/xen-orchestra/node_modules/bluebird/js/release/promise.js:649:10)
    at Promise._settlePromises (/opt/xen-orchestra/node_modules/bluebird/js/release/promise.js:729:18)
    at _drainQueueStep (/opt/xen-orchestra/node_modules/bluebird/js/release/async.js:93:12)
    at _drainQueue (/opt/xen-orchestra/node_modules/bluebird/js/release/async.js:86:9)
    at Async._drainQueues (/opt/xen-orchestra/node_modules/bluebird/js/release/async.js:102:5)
    at Immediate.Async.drainQueues [as _onImmediate] (/opt/xen-orchestra/node_modules/bluebird/js/release/async.js:15:14)
    at processImmediate (node:internal/timers:464:21)
    at process.topLevelDomainCallback (node:domain:152:15)
    at process.callbackTrampoline (node:internal/async_hooks:128:24)"
}

I actually went through the process of restoring a delta backup of this VM, and the restored VM fails with the exact same error, so I tend to think this is a problem with the hypervisor and not the VM itself. I've seen similar errors in other posts here on the forum, and invariably someone replies that this error message really needs to be more specific about the underlying problem. I've tried ejecting the virtual CD drive with xe vm-cd-eject --multiple, but that didn't help. I'm booting via BIOS.

olivierlambert

Hi,

Please restart the toolstack on all your hosts, then the VM will boot. After each update, you shouldn't forget to restart the toolstack (at least) OR reboot the host.
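For anyone landing here later, a minimal sketch of that step, assuming you are on the host console and that the standard xe-toolstack-restart helper is available there. The DRY_RUN variable and the run wrapper are hypothetical additions so the command can be previewed before it is executed.

```shell
#!/bin/sh
# Sketch: restart the toolstack on the current host after installing updates.
# DRY_RUN and run() are illustrative helpers, not part of XCP-ng.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        # Preview only: print the command instead of executing it.
        echo "would run: $*"
    else
        "$@"
    fi
}

restart_toolstack() {
    run xe-toolstack-restart
}

restart_toolstack
```

With the default DRY_RUN=1 the script only prints the command. Set DRY_RUN=0 to actually run it, and repeat on each host in the pool.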

kevdog

@olivierlambert
Hey thanks a lot -- that did the trick.

Just to be clear on my end -- when should I restart the toolstack?

olivierlambert

It's clearly explained in our official doc: https://xcp-ng.org/docs/updates.html#how-to-apply-the-updates

So I don't want to answer with a simple "RTFM", but, well, sometimes it's really the answer.

kevdog

@olivierlambert

I'm ok with RTFM.

olivierlambert

In your defense, if you had used the Rolling pool update mechanism, you would have had the same issue. We are fixing our algorithm to make sure we update and reboot hosts one by one, to avoid this in the future. So even the XO code missed the XCP-ng doc's advice to restart first (host or toolstack) before moving VMs around.

Danp

@olivierlambert said in Failed to start VM:

RTFM

One of my favorite sayings!

olivierlambert

I don't like to use it often, because it might also be "my fault" (bad UI, bad docs and so on). I always try to understand why you have a problem first, and how we can solve it.

So it's often not a lack of reading the doc, but stuff we can improve on our side. And this is the spirit with XCP-ng/XO: always listen to users and try to improve every day!

reflex

Hello everyone,

XCP-NG 8.2.1 here. Recently, I encountered issues with SR network connectivity (10Gb iSCSI), which caused a few virtual machines (VMs) to stop and subsequently fail to start, displaying the well-known error message: FAILED_TO_START_EMULATOR. While restarting the toolstack or the host are proven solutions, I would like to suggest two additional approaches:

If you have at least two hosts in the same pool, attempt to start the VM on another host.
Create a snapshot of the VM and immediately revert to it. Then try to start the VM again. This should resolve the issue.
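The two approaches above could be sketched with the xe CLI roughly as follows. This is only an illustration: the function names, the DRY_RUN variable, the run wrapper and the snapshot label pre-start-fix are my own placeholders, and the VM and host UUIDs have to come from your own pool.

```shell
#!/bin/sh
# Sketch of the two workarounds using the xe CLI.
# DRY_RUN, run(), the function names and the snapshot label are illustrative.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        # Preview only: print the command instead of executing it.
        echo "would run: $*"
    else
        "$@"
    fi
}

# Approach 1: start the VM on another host in the same pool.
start_on_other_host() {
    vm_uuid="$1"
    host_uuid="$2"
    run xe vm-start uuid="$vm_uuid" on="$host_uuid"
}

# Approach 2: take a snapshot, revert to it right away, then start the VM.
snapshot_revert_start() {
    vm_uuid="$1"
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: xe vm-snapshot vm=$vm_uuid new-name-label=pre-start-fix"
        snap_uuid="<snapshot-uuid>"
    else
        # xe vm-snapshot prints the UUID of the new snapshot.
        snap_uuid="$(xe vm-snapshot vm="$vm_uuid" new-name-label=pre-start-fix)"
    fi
    run xe snapshot-revert snapshot-uuid="$snap_uuid"
    run xe vm-start uuid="$vm_uuid"
}
```

Call start_on_other_host with the VM UUID and the target host UUID, or snapshot_revert_start with the VM UUID. With the default DRY_RUN=1 the commands are only printed; set DRY_RUN=0 once they look right.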