VDI is not available
-
JUST ran into this today -- I was FINALLY able to get the dead node back online (a RAM module had failed; re-seating it fixed it) -- but OMG, this is a terrible situation -- we were able to get the VMs to an "offline" status but were wholly unable to get them to start on any other node!
Surely there must be some way from the command line to unstick this?
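For context, the kind of command-line escape hatch I was hoping for would be something along these lines (a rough sketch only -- I haven't verified these against this exact failure, and host-declare-dead in particular is dangerous if the host might still be running anything):

# Force xapi to mark the stuck VM as halted in its database
# (only safe if the VM is definitely not running anywhere)
xe vm-reset-powerstate uuid=<vm-uuid> --force

# Declare the dead host dead so the pool releases its VM/VDI locks
# (only if the host is confirmed down and not coming back on its own)
xe host-declare-dead uuid=<host-uuid>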
-
@TheLittleDuke Please show us the full error message that you are encountering if you want help with this.
-
Here you go:
vm.start
{
  "id": "838dfee6-4b1c-0d12-7291-c128e663cb62",
  "bypassMacAddressesCheck": false,
  "force": false
}
{
  "code": "SR_BACKEND_FAILURE_46",
  "params": [
    "",
    "The VDI is not available [opterr=['HOST_OFFLINE', 'OpaqueRef:769bbac2-3fc3-4761-91d1-51fa8468fc72']]",
    ""
  ],
  "call": {
    "duration": 5670,
    "method": "VM.start",
    "params": [
      "* session id *",
      "OpaqueRef:2a8b259d-5a3b-44cc-9e09-c938b1fb4b33",
      false,
      false
    ]
  },
  "message": "SR_BACKEND_FAILURE_46(, The VDI is not available [opterr=['HOST_OFFLINE', 'OpaqueRef:769bbac2-3fc3-4761-91d1-51fa8468fc72']], )",
  "name": "XapiError",
  "stack": "XapiError: SR_BACKEND_FAILURE_46(, The VDI is not available [opterr=['HOST_OFFLINE', 'OpaqueRef:769bbac2-3fc3-4761-91d1-51fa8468fc72']], )
    at Function.wrap (file:///usr/local/lib/node_modules/xo-server/node_modules/xen-api/_XapiError.mjs:16:12)
    at file:///usr/local/lib/node_modules/xo-server/node_modules/xen-api/transports/json-rpc.mjs:38:21
    at runNextTicks (node:internal/process/task_queues:60:5)
    at processImmediate (node:internal/timers:454:9)
    at process.callbackTrampoline (node:internal/async_hooks:130:17)"
}
-
FYI I was unable to disconnect the SR for the offline host as well:
pbd.disconnect
{
  "id": "9355e1d9-164c-76a8-6ab7-aba7a9793b31"
}
{
  "code": "HOST_OFFLINE",
  "params": [
    "OpaqueRef:769bbac2-3fc3-4761-91d1-51fa8468fc72"
  ],
  "task": {
    "uuid": "3cb1c5e9-e7cb-83b5-e448-ccddc3c4fddf",
    "name_label": "Async.PBD.unplug",
    "name_description": "",
    "allowed_operations": [],
    "current_operations": {},
    "created": "20250919T14:58:59Z",
    "finished": "20250919T14:58:59Z",
    "status": "failure",
    "resident_on": "OpaqueRef:32345a51-c14d-4b3c-ae6e-e411c3c8808c",
    "progress": 1,
    "type": "<none/>",
    "result": "",
    "error_info": [
      "HOST_OFFLINE",
      "OpaqueRef:769bbac2-3fc3-4761-91d1-51fa8468fc72"
    ],
    "other_config": {},
    "subtask_of": "OpaqueRef:NULL",
    "subtasks": [],
    "backtrace": "(((process xapi)(filename ocaml/xapi/message_forwarding.ml)(line 124))((process xapi)(filename ocaml/xapi/message_forwarding.ml)(line 160))((process xapi)(filename lib/xapi-stdext-pervasives/pervasiveext.ml)(line 24))((process xapi)(filename ocaml/xapi/rbac.ml)(line 205))((process xapi)(filename ocaml/xapi/server_helpers.ml)(line 95)))"
  },
  "message": "HOST_OFFLINE(OpaqueRef:769bbac2-3fc3-4761-91d1-51fa8468fc72)",
  "name": "XapiError",
  "stack": "XapiError: HOST_OFFLINE(OpaqueRef:769bbac2-3fc3-4761-91d1-51fa8468fc72)
    at Function.wrap (file:///usr/local/lib/node_modules/xo-server/node_modules/xen-api/_XapiError.mjs:16:12)
    at default (file:///usr/local/lib/node_modules/xo-server/node_modules/xen-api/_getTaskResult.mjs:13:29)
    at Xapi._addRecordToCache (file:///usr/local/lib/node_modules/xo-server/node_modules/xen-api/index.mjs:1073:24)
    at file:///usr/local/lib/node_modules/xo-server/node_modules/xen-api/index.mjs:1107:14
    at Array.forEach (<anonymous>)
    at Xapi._processEvents (file:///usr/local/lib/node_modules/xo-server/node_modules/xen-api/index.mjs:1097:12)
    at Xapi._watchEvents (file:///usr/local/lib/node_modules/xo-server/node_modules/xen-api/index.mjs:1270:14)"
}
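From the backtrace it looks like the unplug is forwarded to the host that owns the PBD, which is presumably why it just bounces back with HOST_OFFLINE while that host is down. If it helps anyone else, a rough way to see which hosts still claim the SR is plugged (a sketch run from the pool master's CLI; substitute your SR's uuid) would be:

# List the SR's PBDs and which hosts still report it as attached
xe pbd-list sr-uuid=<sr-uuid> params=uuid,host-uuid,currently-attached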
-
@TheLittleDuke You mentioned that the host is now back online. Is the issue resolved, or are you still having trouble starting VMs?
- Did you try using the Start on... option on the VM's Advanced tab?
- Was the offline host the pool master?
- Are the VM's VDIs on shared or local storage?
-
@Danp the issue is "resolved" only because we got the host back online, which released the lock.
- Yes, I did try Start on...; that is the method that produced the "The VDI is not available" error
- No, the offline host was not the pool master
- Yes, the VDIs are on shared iSCSI storage (TrueNAS)
And I want to note that OTHER VMs that were not on the failed host were able to restart on any other node without an issue.
I was even able to delete old snapshots without an issue, so clearly the storage was online and available.
It is deeply concerning that we were unable to get the VMs running again, and it is only coincidental that we were able to re-seat the RAM module and get the host back online -- once it rejoined the pool, the locked VM started without any issues on another host, so something was clearly locking it down even though it was fully stopped and visible.
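If this ever happens again, one thing I plan to check -- and this is my assumption about how xapi tracks VDI attachment on LVM-over-iSCSI SRs, not something I've confirmed -- is whether the VDI still carries a stale attach record for the dead host in its sm-config:

# Dump the VDI's parameters; a leftover host_OpaqueRef:* key under sm-config
# would suggest the VDI is still recorded as attached on the dead host
xe vdi-param-list uuid=<vdi-uuid>

# If a stale key is confirmed (and the host is really gone), it can supposedly
# be cleared like this -- use with extreme care
xe vdi-param-remove uuid=<vdi-uuid> param-name=sm-config param-key=host_OpaqueRef:<ref>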