XCP-ng
    TrueNAS VM failing to start

    Compute
    • EddieA

      Hi,

      I had to shut down my XCP-ng system to add a replacement for a previously failed NVMe that was attached to my TrueNAS SCALE VM, which also involved moving around a couple of the PCIe cards. I thought this would be a good opportunity to catch up with the upgrades, so I also applied those during the shutdown/reboot.

      Following the upgrade and installing the replacement NVMe, I can no longer boot my TrueNAS SCALE VM. It fails with:

      vm.start
      {
        "id": "81e6cde8-baba-5f2e-0a08-a4d9f3e0a41e",
        "bypassMacAddressesCheck": false,
        "force": false
      }
      {
        "code": "INTERNAL_ERROR",
        "params": [
          "xenopsd internal error: Cannot_add(0000:af:00.0, Device_common.QMP_Error(2, \"{\\\"error\\\":{\\\"class\\\":\\\"GenericError\\\",\\\"desc\\\":\\\"Failed to initialize 11/15, type = 0x1, rc: -1\\\",\\\"data\\\":{}},\\\"id\\\":\\\"qmp-000012-2\\\"}\"))"
        ],
        "call": {
          "duration": 8798,
          "method": "VM.start",
          "params": [
            "* session id *",
            "OpaqueRef:63502630-5729-b5d4-4ef2-49d6c14e07bd",
            false,
            false
          ]
        },
        "message": "INTERNAL_ERROR(xenopsd internal error: Cannot_add(0000:af:00.0, Device_common.QMP_Error(2, \"{\\\"error\\\":{\\\"class\\\":\\\"GenericError\\\",\\\"desc\\\":\\\"Failed to initialize 11/15, type = 0x1, rc: -1\\\",\\\"data\\\":{}},\\\"id\\\":\\\"qmp-000012-2\\\"}\")))",
        "name": "XapiError",
        "stack": "XapiError: INTERNAL_ERROR(xenopsd internal error: Cannot_add(0000:af:00.0, Device_common.QMP_Error(2, \"{\\\"error\\\":{\\\"class\\\":\\\"GenericError\\\",\\\"desc\\\":\\\"Failed to initialize 11/15, type = 0x1, rc: -1\\\",\\\"data\\\":{}},\\\"id\\\":\\\"qmp-000012-2\\\"}\")))
          at Function.wrap (file:///usr/local/lib/node_modules/xo-server/node_modules/xen-api/_XapiError.mjs:16:12)
          at file:///usr/local/lib/node_modules/xo-server/node_modules/xen-api/transports/json-rpc.mjs:38:21
          at runNextTicks (node:internal/process/task_queues:60:5)
          at processImmediate (node:internal/timers:454:9)
          at process.callbackTrampoline (node:internal/async_hooks:130:17)"
      }
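
The useful part of that error is buried under two layers of escaping. As a rough sketch (not anything xo-server or xenopsd provides), the `message` field can be unpacked in Python to pull out the PCI address and the QMP error text. The regular expression, the `parse_cannot_add` helper, and the assumption that the inner payload uses JSON-compatible escapes are all illustrative choices, checked only against this one message.

```python
import json
import re

# The "message" field from the failed vm.start call, as it reads after the
# outer JSON document has been decoded once.
MESSAGE = (
    r'INTERNAL_ERROR(xenopsd internal error: Cannot_add(0000:af:00.0, '
    r'Device_common.QMP_Error(2, "{\"error\":{\"class\":\"GenericError\",'
    r'\"desc\":\"Failed to initialize 11/15, type = 0x1, rc: -1\",'
    r'\"data\":{}},\"id\":\"qmp-000012-2\"}")))'
)

# Hypothetical pattern: matches the Cannot_add(<pci>, QMP_Error(<code>, "<payload>")) shape.
PATTERN = re.compile(
    r'Cannot_add\((?P<pci>[0-9a-fA-F:.]+), '
    r'Device_common\.QMP_Error\((?P<code>\d+), "(?P<payload>.*)"\)\)'
)


def parse_cannot_add(message):
    """Return the PCI address and QMP error details, or None if the shape differs."""
    match = PATTERN.search(message)
    if match is None:
        return None
    # Assumption: the escaped payload is valid as a JSON string literal.
    # First decode the string literal, then decode the JSON it contains.
    inner_text = json.loads('"' + match.group("payload") + '"')
    qmp = json.loads(inner_text)
    return {
        "pci_address": match.group("pci"),
        "qmp_code": int(match.group("code")),
        "qmp_class": qmp["error"]["class"],
        "qmp_desc": qmp["error"]["desc"],
    }


if __name__ == "__main__":
    print(parse_cannot_add(MESSAGE))
```

Read this way, the failure is QEMU reporting a generic error while xenopsd was adding device `0000:af:00.0` to the VM. The message itself does not say why the initialization failed.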
      

      As far as I can see, all the passthrough devices are correctly specified.

      Is the device referenced in the error this:

      af:00.0 PCI bridge: Intel Corporation Device 4fa1 (rev 01)
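
On whether the two addresses refer to the same device: the error uses the full `domain:bus:device.function` form, while `lspci` by default usually prints only `bus:device.function` and leaves the domain off. A small sketch, assuming a missing domain means `0000`, compares the two forms; `parse_bdf` and `same_device` are hypothetical helper names, not part of any XCP-ng tool.

```python
import re

# Optional 4-digit domain, then bus:device.function (all hexadecimal).
BDF_RE = re.compile(
    r'^(?:(?P<domain>[0-9a-fA-F]{4}):)?'
    r'(?P<bus>[0-9a-fA-F]{2}):'
    r'(?P<device>[0-9a-fA-F]{2})\.'
    r'(?P<function>[0-7])$'
)


def parse_bdf(address):
    """Split a PCI address into (domain, bus, device, function) integers."""
    match = BDF_RE.match(address.strip())
    if match is None:
        raise ValueError("not a PCI address: %r" % address)
    # Assumption: an address without a domain lives in domain 0000.
    domain = int(match.group("domain") or "0", 16)
    return (
        domain,
        int(match.group("bus"), 16),
        int(match.group("device"), 16),
        int(match.group("function"), 16),
    )


def same_device(a, b):
    """True when both addresses name the same domain, bus, device and function."""
    return parse_bdf(a) == parse_bdf(b)


if __name__ == "__main__":
    print(same_device("0000:af:00.0", "af:00.0"))
```

Under that assumption, the address in the error and the `lspci` line above do point at the same PCI bridge.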
      

      Could this be caused by my switching a couple of the PCIe cards around? One of them is an NVMe expansion card that is passed through to TrueNAS (but it does show up correctly under its own ID).

      This is XCP-ng 8.3 running on a Supermicro X11DPH-T.

      Cheers.

      • EddieA @EddieA

        Thinking this could be down to the PCIe card moves, as that did change the IDs for some of the passthrough devices, I removed all the passthroughs, via the command line, and then reinstated them.

        Now when I try to start TrueNAS, the whole system locks up. I can't enter anything via PuTTY, XOA, or the Supermicro IPMI.

        I have no idea where to go from here.

        Cheers.

        • olivierlambert Vates 🪐 Co-Founder CEO

          Are you sure your OS (TrueNAS) isn't waiting for the PCI device that's not passed through anymore?

          • EddieA @olivierlambert

            @olivierlambert The same devices are passed through, just under different IDs.

            But that shouldn't "kill" XCP-ng to the point that XOA, PuTTY, etc. no longer respond.

            Cheers.

            • olivierlambert Vates 🪐 Co-Founder CEO

              No, it shouldn't. Have you removed the passthrough from the VM too? Without logs it's hard to tell; take a look at them and see if you can spot something.

              • EddieA @olivierlambert

                @olivierlambert
                Yes, I removed the passthroughs from the VM before I removed them at the DOM level.

                At the moment, the system is booted directly into TrueNAS and is resilvering the replaced NVMe. Once this finishes, I can reboot into XCP-ng and take a look. Is there any particular log you think will give the most clues?

                Cheers.

                • olivierlambert Vates 🪐 Co-Founder CEO

                  I think the usual stuff: https://docs.xcp-ng.org/troubleshooting/log-files/
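
As a starting point for going through those logs, here is a minimal sketch that scans a few log files for lines mentioning the failing PCI address or the QMP error text. The file paths listed are assumptions about a typical XCP-ng host and should be checked against the documentation page linked above; `matching_lines` and `scan_logs` are hypothetical helper names.

```python
import os

# Assumed locations on the host; adjust to whatever the log-files page lists.
CANDIDATE_LOGS = [
    "/var/log/xensource.log",
    "/var/log/daemon.log",
    "/var/log/kern.log",
    "/var/log/xen/hypervisor.log",
]

# Strings taken from the error in this thread.
NEEDLES = ["0000:af:00.0", "Cannot_add", "Failed to initialize"]


def matching_lines(lines, needles):
    """Return the lines that contain any needle, compared case-insensitively."""
    lowered = [needle.lower() for needle in needles]
    return [
        line.rstrip("\n")
        for line in lines
        if any(needle in line.lower() for needle in lowered)
    ]


def scan_logs(paths, needles):
    """Map each existing log file to its matching lines; missing files are skipped."""
    hits = {}
    for path in paths:
        if not os.path.isfile(path):
            continue
        with open(path, errors="replace") as handle:
            found = matching_lines(handle, needles)
        if found:
            hits[path] = found
    return hits


if __name__ == "__main__":
    for path, lines in scan_logs(CANDIDATE_LOGS, NEEDLES).items():
        print(path)
        for line in lines:
            print("  " + line)
```

Because the host locked up completely on the second attempt, the last lines written before the hang may be more telling than any keyword match, so it is worth reading the tail of each file as well.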
