TrueNAS VM failing to start
-
Hi,
I had to shut down my XCP-ng system to add a replacement for a previously failed NVMe that was attached to my TrueNAS SCALE VM, which also involved moving around a couple of the PCIe cards. I thought this would be a good opportunity to catch up with the upgrades, so I also applied those during the shutdown/reboot.
After the upgrade and the NVMe replacement, I can no longer boot my TrueNAS SCALE VM. It fails with:
vm.start { "id": "81e6cde8-baba-5f2e-0a08-a4d9f3e0a41e", "bypassMacAddressesCheck": false, "force": false }
{ "code": "INTERNAL_ERROR", "params": [ "xenopsd internal error: Cannot_add(0000:af:00.0, Device_common.QMP_Error(2, \"{\\\"error\\\":{\\\"class\\\":\\\"GenericError\\\",\\\"desc\\\":\\\"Failed to initialize 11/15, type = 0x1, rc: -1\\\",\\\"data\\\":{}},\\\"id\\\":\\\"qmp-000012-2\\\"}\"))" ], "call": { "duration": 8798, "method": "VM.start", "params": [ "* session id *", "OpaqueRef:63502630-5729-b5d4-4ef2-49d6c14e07bd", false, false ] }, "message": "INTERNAL_ERROR(xenopsd internal error: Cannot_add(0000:af:00.0, Device_common.QMP_Error(2, \"{\\\"error\\\":{\\\"class\\\":\\\"GenericError\\\",\\\"desc\\\":\\\"Failed to initialize 11/15, type = 0x1, rc: -1\\\",\\\"data\\\":{}},\\\"id\\\":\\\"qmp-000012-2\\\"}\")))", "name": "XapiError", "stack": "XapiError: INTERNAL_ERROR(xenopsd internal error: Cannot_add(0000:af:00.0, Device_common.QMP_Error(2, \"{\\\"error\\\":{\\\"class\\\":\\\"GenericError\\\",\\\"desc\\\":\\\"Failed to initialize 11/15, type = 0x1, rc: -1\\\",\\\"data\\\":{}},\\\"id\\\":\\\"qmp-000012-2\\\"}\"))) at Function.wrap (file:///usr/local/lib/node_modules/xo-server/node_modules/xen-api/_XapiError.mjs:16:12) at file:///usr/local/lib/node_modules/xo-server/node_modules/xen-api/transports/json-rpc.mjs:38:21 at runNextTicks (node:internal/process/task_queues:60:5) at processImmediate (node:internal/timers:454:9) at process.callbackTrampoline (node:internal/async_hooks:130:17)" }
As far as I can see, all the passthrough devices are correctly specified.
Is the device referenced in the error this:
af:00.0 PCI bridge: Intel Corporation Device 4fa1 (rev 01)
Could this be caused by my switching a couple of the PCIe cards around? One of them is an NVMe expansion card that is passed through to TrueNAS, although it does show up correctly under its own ID.
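For anyone trying to read the error above: the device address is the first argument of Cannot_add(...), and the QMP error is a JSON document wrapped in a quoted string. Below is a minimal Python sketch that pulls both out of the "message" field. The function name and the regular expressions are my own illustration and not part of XO or XAPI. The address it returns, 0000:af:00.0, is the af:00.0 form with the PCI domain prefix 0000 added.

```python
import json
import re

# The "message" field from the error above, after the outer JSON layer
# has been decoded once (so the inner quotes are escaped a single time).
MESSAGE = (
    r'INTERNAL_ERROR(xenopsd internal error: Cannot_add(0000:af:00.0, '
    r'Device_common.QMP_Error(2, "{\"error\":{\"class\":\"GenericError\",'
    r'\"desc\":\"Failed to initialize 11/15, type = 0x1, rc: -1\",'
    r'\"data\":{}},\"id\":\"qmp-000012-2\"}")))'
)

# PCI address in domain:bus:device.function form.
BDF_PATTERN = re.compile(
    r"Cannot_add\(([0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7])"
)
# The quoted QMP payload that follows the numeric QMP_Error code.
QMP_PATTERN = re.compile(r'QMP_Error\(\d+,\s*(".*")\)')


def parse_xenopsd_error(message):
    """Return (pci_address, qmp_description) from a Cannot_add error.

    Either value is None when the corresponding part is not found.
    """
    bdf_match = BDF_PATTERN.search(message)
    bdf = bdf_match.group(1) if bdf_match else None

    desc = None
    qmp_match = QMP_PATTERN.search(message)
    if qmp_match:
        # First decode the quoted string, then the JSON document inside it.
        inner = json.loads(qmp_match.group(1))
        desc = json.loads(inner).get("error", {}).get("desc")
    return bdf, desc


address, description = parse_xenopsd_error(MESSAGE)
print(address)
print(description)
```

This only identifies which address xenopsd was trying to add and what QEMU reported back; it does not explain why the initialization failed.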
This is XCP-ng 8.3 running on a Supermicro X11DPH-T.
Cheers.
-
Thinking this could be down to the PCIe card moves, as that did change the IDs for some of the passthrough devices, I removed all the passthroughs, via the command line, and then reinstated them.
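For readers following along: as far as I know from the XCP-ng documentation, the command-line assignment is a comma-separated other-config:pci value on the VM, where each entry has the form 0/<domain:bus:device.function>. Here is a minimal Python sketch that builds that value and the matching xe command string. The two addresses are placeholders rather than the devices in this system, and the helper names are hypothetical.

```python
import re

# Full PCI address: domain:bus:device.function, lowercase hex.
BDF_RE = re.compile(r"^[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]$")


def normalize_bdf(addr):
    """Lowercase the address and add the 0000 domain if it is missing."""
    addr = addr.strip().lower()
    if addr.count(":") == 1:  # short form such as "af:00.0"
        addr = "0000:" + addr
    if not BDF_RE.match(addr):
        raise ValueError("not a PCI address: {!r}".format(addr))
    return addr


def pci_other_config(addrs):
    """Build the other-config:pci value, e.g. 0/0000:04:01.0,0/0000:00:19.0."""
    return ",".join("0/" + normalize_bdf(a) for a in addrs)


def xe_command(vm_uuid, addrs):
    """Return the xe command as a string; nothing is executed here."""
    return "xe vm-param-set other-config:pci={} uuid={}".format(
        pci_other_config(addrs), vm_uuid
    )


# Placeholder addresses; the VM UUID is the one from the error above.
print(xe_command("81e6cde8-baba-5f2e-0a08-a4d9f3e0a41e",
                 ["04:01.0", "0000:00:19.0"]))
```

After cards have been moved, it is worth regenerating the list from a fresh lspci rather than editing the old value, since a stale address can end up pointing at a different device.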
Now when I try to start TrueNAS, the whole system locks up. I get no response via PuTTY, XOA, or the Supermicro IPMI.
I have no idea where to go from here.
Cheers.
-
Are you sure your OS (TrueNAS) isn't waiting for the PCI device that's not passed through anymore?
-
@olivierlambert The same devices are passed through, just with different IDs.
But that shouldn't "kill" XCP-ng to the point where XOA, PuTTY, etc. no longer respond.
Cheers.
-
No, it shouldn't. Have you removed the passthrough from the VM too? Without logs it's hard to tell; take a look through them and see if you can spot something.
-
@olivierlambert
Yes, I removed the passthroughs from the VM before I removed them at the dom0 level.
At the moment, the system is booted directly into TrueNAS and is resilvering the replaced NVMe. Once this finishes, I can reboot XCP-ng and take a look. Is there any particular log you think will give the most clues?
Cheers.
-
I think the usual stuff: https://docs.xcp-ng.org/troubleshooting/log-files/
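A minimal Python sketch for searching those logs for the PCI address from the error. The file list is an assumption based on the usual dom0 log locations, so adjust it to whatever the linked page lists for your version. The function name is hypothetical.

```python
# Assumed dom0 log locations; check the linked page for the full list.
DEFAULT_LOGS = [
    "/var/log/xensource.log",
    "/var/log/daemon.log",
    "/var/log/kern.log",
    "/var/log/xen/hypervisor.log",
]


def search_logs(paths, needles):
    """Yield (path, line_number, line) for lines containing any needle.

    Matching is case-insensitive. Files that are missing or unreadable
    are skipped silently.
    """
    lowered = [n.lower() for n in needles]
    for path in paths:
        try:
            with open(path, "r", errors="replace") as handle:
                for lineno, line in enumerate(handle, start=1):
                    text = line.lower()
                    if any(n in text for n in lowered):
                        yield path, lineno, line.rstrip("\n")
        except OSError:
            continue


if __name__ == "__main__":
    # Address taken from the error earlier in the thread.
    for path, lineno, line in search_logs(DEFAULT_LOGS, ["0000:af:00.0"]):
        print("{}:{}: {}".format(path, lineno, line))
```

After a hard lockup the interesting lines are from the boot before the current one, so they may sit in rotated copies of these files rather than the live ones.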