XCP-ng
    TrueNAS VM failing to start

      EddieA @olivierlambert

      @olivierlambert Any further thoughts or suggestions (move the PCIe cards around again?).

      Cheers.

        olivierlambert Vates 🪐 Co-Founder CEO

        No, but maybe @Team-Hypervisor-Kernel does.

          yannsionneau Vates 🪐 XCP-ng Team

          Hello @eddiea

          I've sent you a link in private so that you can upload all your log files.

          Thanks

          Regards,

          Yann

            EddieA @yannsionneau

            @yannsionneau Uploaded contents of /var/crash together with the output of "xen-bugtool --yestoall".

            Cheers.

              TeddyAstie Vates 🪐 XCP-ng Team Xen Guru @EddieA

              @EddieA Can you try different combinations of passed-through hardware in this VM?

              E.g. try with each device on its own, one at a time, at least in the VM.
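
The one-at-a-time approach can be written down as a test plan. This is only a minimal sketch: the device labels are hypothetical placeholders, not the actual cards in this server, and the idea of continuing on to pairs and larger sets after the single-device runs is an assumption about how one might extend the suggestion.

```python
from itertools import combinations

def passthrough_test_plan(devices):
    """Yield device subsets to try, smallest first: singles, then pairs, and so on."""
    for size in range(1, len(devices) + 1):
        for subset in combinations(devices, size):
            yield subset

# Hypothetical labels; substitute the real PCI addresses of the passed-through cards.
devices = ["hba", "gpu", "nic"]
plan = list(passthrough_test_plan(devices))

for step, subset in enumerate(plan, start=1):
    print(f"boot {step}: attach {', '.join(subset)}")
```

Recording which boots lock up the host against this list should show whether one device or one particular combination is involved.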

                EddieA @EddieA

                Give me a couple of days to try. It is (obviously) down to the combination of devices passed through, as I reported this earlier:

                said in TrueNAS VM failing to start:

                Re-boot XCP and start the TrueNAS VM with NO passthrough devices. As expected, that started up fine.

                Cheers.

                  EddieA @EddieA

                  OK, not really sure what's going on. I fired XCP back up to try what @teddyastie suggested.

                  Looking at the specs for the TrueNAS VM before booting it, I found it now had zero passthrough devices attached, which, from memory, wasn't the state it was in the last time I tried. So I re-added all but one passthrough device, a GPU. I booted TrueNAS and this time it came up.

                  Bingo, I thought, the GPU is the issue, but based on my background, I had to try again with the GPU included to prove it was the culprit. Well, what do you know, after adding it back in, TrueNAS now starts perfectly. One theory destroyed.

                  All I can think is that the passthrough definitions in the VM config were somehow corrupted, and that finding them all gone and re-adding them fixed this. Who knows.

                  But all appears to be good again (for now).

                    yannsionneau Vates 🪐 XCP-ng Team @EddieA

                    @EddieA OK, good that your setup is now fully operational!

                    Let's classify this as a self-resolved problem for now.
                    Don't hesitate to ping us again if the issue comes back.

                      AshleyDe @yannsionneau

                      That is a frustrating loop to be in, especially with TrueNAS. Usually, when the VM fails to start after a change, it’s because XCP-ng is trying to pass through a PCI device (like an HBA) that isn't being released properly by the host.
                      Have you checked if the "hide" parameters in your grub config are still correct? Sometimes an update can reset those, and the host grabs the controller before the VM can. Another thing to try is toggling the BIOS/UEFI mode in the VM settings - TrueNAS can be picky about that depending on which version you’re running.
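
One way to check this is to list which PCI addresses the host is actually told to hide. The sketch below assumes the "hide" parameter meant here is the dom0 kernel option `xen-pciback.hide=(...)`, and that the command line is taken from `/proc/cmdline` in dom0. The sample string and the PCI addresses in it are made up for illustration.

```python
import re

def hidden_pci_devices(cmdline):
    """Return the PCI addresses listed in xen-pciback.hide=..., or [] if the option is absent."""
    match = re.search(r"xen-pciback\.hide=(\S+)", cmdline)
    if not match:
        return []
    # Each hidden device is written as a parenthesised address, e.g. (0000:04:00.0)
    return re.findall(r"\(([0-9a-fA-F:.]+)\)", match.group(1))

# Hypothetical dom0 command line; on a real host, read /proc/cmdline instead.
sample = "root=LABEL=root-abc ro console=hvc0 xen-pciback.hide=(0000:04:00.0)(0000:81:00.0)"
print(hidden_pci_devices(sample))
```

If a device that is passed through to the VM is missing from that list after an update, the hide setting has probably been reset.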

                        EddieA @AshleyDe

                        Wearing my best Lazarus cosplay outfit, I'll apologise for the resurrection.

                        Today I had an issue with my UPS which caused me to reboot XCP a few times. During those reboots I had at least 2, maybe 3, recurrences of this, where XCP would lock up while TrueNAS was booting. Most of the time, after a power cycle of the server, the next boot would start TrueNAS cleanly. One time it took 2 power cycles before success.

                        Unfortunately only one of the crashes resulted in a /var/crash report, but that did have the same symptoms as my original report:

                        (XEN) [   81.101362] Non-responding CPUs: {24-47}
                        (XEN) [   81.101363]
                        (XEN) [   81.101364] ****************************************
                        (XEN) [   81.101365] Panic on CPU 5:
                        (XEN) [   81.101366] FATAL TRAP: vec 2, NMI[0000] IN INTERRUPT CONTEXT
                        (XEN) [   81.101366] ****************************************
                        (XEN) [   81.101367]
                        (XEN) [   81.101368] Reboot in five seconds...
                        (XEN) [   81.101369] Executing kexec image on cpu5
                        (XEN) [   82.101441] Failed to shoot down CPUs {24-47}
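
To compare crashes across reports, the key fields can be pulled out of the excerpt with a short script. This is a rough sketch, not a tool that ships with XCP-ng: the function names are made up here, and support for comma-separated CPU sets such as `{1,3-5}` is an assumption, since the log above only shows a single range.

```python
import re

def expand_cpu_set(text):
    """Expand a CPU set such as '{24-47}' (or, by assumption, '{1,3-5}') into a list of ints."""
    cpus = []
    for part in text.strip("{}").split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus

def summarize_panic(log):
    """Extract the panicking CPU and the non-responding CPUs from a Xen console excerpt."""
    stuck = re.search(r"Non-responding CPUs: (\{[^}]*\})", log)
    panic = re.search(r"Panic on CPU (\d+):", log)
    return {
        "panic_cpu": int(panic.group(1)) if panic else None,
        "stuck_cpus": expand_cpu_set(stuck.group(1)) if stuck else [],
    }

log = """(XEN) [   81.101362] Non-responding CPUs: {24-47}
(XEN) [   81.101365] Panic on CPU 5:
(XEN) [   82.101441] Failed to shoot down CPUs {24-47}"""

summary = summarize_panic(log)
print(summary["panic_cpu"], len(summary["stuck_cpus"]))
```

Running this over each saved crash log would show whether the same block of CPUs stops responding every time.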
                        

                        Between my original report and today, I have rebooted other times, following updates, when this issue has not surfaced.

                        Does anyone think this could be hardware related, despite all the memory testing and stress testing I did when I built the server and again after the original issue, all with no faults? Or have I just got an unlucky set of circumstances with some sort of race condition?
