
    PCIe card removal and failure to boot from NVMe

      exime

      Removing a PCIe card that was formerly hidden/passed through (a 4-controller USB card) causes XCP-ng to fail to boot from my M.2 boot drive (it can't find the root drive).

      Prior to removing the card, all 4 controllers were hidden from dom0. I removed the passthrough from the two VMs that used a controller each, but I forgot to clear the dom0 pciback.hide parameter before pulling the card. I put the card back in, booted up, cleared dom0 pciback.hide, and rebooted, thinking that would solve the problem - but I still get the same boot error.

      I have confirmed that neither of the two VMs that had PCIe controllers passed through still has them set in its other-config (MRW) parameters. I've also confirmed that the controllers are not in /boot/efi/EFI/xenserver/grub.cfg.

      But, strangely, 3 of the 4 USB controllers that were hidden do still show up with xl pci-assignable-list - and I still can't boot without the card.

      What am I missing?

        olivierlambert Vates 🪐 Co-Founder CEO

        Are you sure it's cleared? I would suspect it's still set. Note that the Grub file is generated, so you really need to clear it with this command: /opt/xensource/libexec/xen-cmdline --delete-dom0 xen-pciback.hide
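
A minimal sketch of a check-then-clear sequence around that command. It relies only on the `--get-dom0` and `--delete-dom0` invocations that appear in this thread. The `list_hidden_bdfs` helper is hypothetical, and the `(BDF)(BDF)` value format it parses is an assumption about how the parameter is usually written. The comparison with `/proc/cmdline` is there because the configured value only applies from the next boot.

```shell
#!/bin/sh
# Hypothetical helper: print one PCI address (BDF) per line from a string
# such as "xen-pciback.hide=(0000:44:00.0)(0000:45:00.0)" (assumed format).
list_hidden_bdfs() {
    printf '%s\n' "$1" \
        | grep -oE '[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]' || true
}

XEN_CMDLINE=/opt/xensource/libexec/xen-cmdline

# Only call the real tool when it exists, i.e. on an XCP-ng host.
if [ -x "$XEN_CMDLINE" ]; then
    # Value configured for the next boot (empty output means nothing is set).
    current=$("$XEN_CMDLINE" --get-dom0 xen-pciback.hide)
    echo "Configured for next boot: ${current:-<nothing>}"
    list_hidden_bdfs "$current"

    # Value the running dom0 kernel was actually booted with.
    grep -o 'xen-pciback\.hide=[^ ]*' /proc/cmdline \
        || echo "xen-pciback.hide is not on the running kernel command line"

    # To clear it (takes effect after a reboot), uncomment:
    # "$XEN_CMDLINE" --delete-dom0 xen-pciback.hide
fi
```

If the two values differ, the host has most likely not been rebooted since the parameter was changed.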

          exime @olivierlambert

          @olivierlambert - sorry, to be clear, I did run that command (multiple times)

          [11:10 xcp-ng ~]# /opt/xensource/libexec/xen-cmdline --get-dom0 xen-pciback.hide
          
          [11:12 xcp-ng ~]#
          
          
            olivierlambert Vates 🪐 Co-Founder CEO

            And if you replug the card it works again, right?

              exime @olivierlambert

              @olivierlambert said in PCIe card removal and failure to boot from NVMe:

              And if you replug the card it works again, right?

              Yep, if I put it in the same slot.

                olivierlambert Vates 🪐 Co-Founder CEO

                @dthenot or @dinhngtu any idea?

                  exime @olivierlambert

                  [11:12 xcp-ng ~]# xl pci-assignable-list
                  0000:01:00.0
                  0000:46:00.0
                  0000:01:00.1
                  0000:45:00.0
                  0000:44:00.0
                  
                  

                  Buses 44, 45, 46, and 47 are the USB controllers. For some reason, only 47 is missing from this list. All four of them were previously hidden from dom0, but only 47 actually cleared.
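
One way to cross-check that list is to ask sysfs which kernel driver each controller is bound to. This is a sketch with stated assumptions. The `pci_driver_of` helper and its optional sysfs-root argument are hypothetical. The address `0000:47:00.0` is inferred from the description above and does not appear in the command output. The expectation that a hidden device shows a pciback-style driver name is also an assumption.

```shell
#!/bin/sh
# Print the kernel driver bound to a PCI device, or "none" if it is unbound
# or absent. The second argument (sysfs root) is a hypothetical test knob.
pci_driver_of() {
    bdf=$1
    root=${2:-/sys/bus/pci/devices}
    link="$root/$bdf/driver"
    if [ -L "$link" ]; then
        # The "driver" entry is a symlink whose target name is the driver.
        basename "$(readlink "$link")"
    else
        echo none
    fi
}

# 44-46 come from the xl pci-assignable-list output; 47 is an assumption.
for bdf in 0000:44:00.0 0000:45:00.0 0000:46:00.0 0000:47:00.0; do
    echo "$bdf -> $(pci_driver_of "$bdf")"
done
```

A controller that dom0 has reclaimed would normally report its USB host controller driver here rather than a passthrough driver.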

                    dinhngtu Vates 🪐 XCP-ng Team @exime

                    @exime Is the error coming from Grub or the Dom0 kernel?

                      exime @dinhngtu

                      @dinhngtu said in PCIe card removal and failure to boot from NVMe:

                      @exime Is the error coming from Grub or the Dom0 kernel?

                      I'm guessing the kernel? The XCP-ng splash screen stays up for a long time (normal boot is very quick on this machine), the progress bar gets to about 50%, and then it drops into an emergency shell of some sort saying it can't find the root drive (I'm guessing because the PCIe addresses shifted unexpectedly).

                        dinhngtu Vates 🪐 XCP-ng Team @exime

                        @exime What do your grub.cfg and the output of ls -lR /dev/disk look like? At the Grub menu, try replacing quiet vga=785 splash plymouth.ignore-serial-consoles with vga=785 loglevel=8 and see if you can gather some logs.
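
To preview what the edited kernel line should look like before typing it at the Grub menu, the substitution can be sketched with `sed`. The `verbose_kernel_line` function name is hypothetical, and the sample input is the `module2` kernel line from the grub.cfg posted later in this thread. Nothing on disk is modified, since the change is meant as a one-off edit at boot time.

```shell
#!/bin/sh
# Rewrite a dom0 kernel line for a verbose boot. Reads stdin, writes stdout.
verbose_kernel_line() {
    sed 's/quiet vga=785 splash plymouth\.ignore-serial-consoles/vga=785 loglevel=8/'
}

# Sample input: the module2 kernel line from this host's grub.cfg.
echo 'module2 /boot/vmlinuz-4.19-xen root=LABEL=root-hhtyle ro nolvm hpet=disable console=hvc0 console=tty0 quiet vga=785 splash plymouth.ignore-serial-consoles' \
    | verbose_kernel_line
```

The printed line ends with `console=tty0 vga=785 loglevel=8`, which is what the entry should read after editing it in the Grub menu.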

                          exime @dinhngtu

                          @dinhngtu said in PCIe card removal and failure to boot from NVMe:

                          @exime What do your grub.cfg and the output of ls -lR /dev/disk look like? At the Grub menu, try replacing quiet vga=785 splash plymouth.ignore-serial-consoles with vga=785 loglevel=8 and see if you can gather some logs.

                          [09:15 xcp-ng ~]# cat /boot/efi/EFI/xenserver/grub.cfg
                          serial --unit=0 --speed=115200
                          terminal_input serial console
                          terminal_output serial console
                          set default=0
                          set timeout=5
                          if [ -s $prefix/grubenv ]; then
                                  load_env
                          fi
                          
                          if [ -n "$override_entry" ]; then
                                  set default=$override_entry
                          fi
                          
                          menuentry 'XCP-ng' {
                                  search --label --set root root-hhtyle
                                  multiboot2 /boot/xen.gz dom0_mem=7552M,max:7552M watchdog ucode=scan dom0_max_vcpus=1-16 crashkernel=256M,below=4G console=vga vga=mode-0x0311 cpufreq=xen:performance
                                  module2 /boot/vmlinuz-4.19-xen root=LABEL=root-hhtyle ro nolvm hpet=disable console=hvc0 console=tty0 quiet vga=785 splash plymouth.ignore-serial-consoles
                                  module2 /boot/initrd-4.19-xen.img
                          }
                          
                          [11:44 xcp-ng ~]# ls -lR /dev/disk
                          /dev/disk:
                          total 0
                          drwxr-xr-x 2 root root 440 Jan 24 00:45 by-id
                          drwxr-xr-x 2 root root 120 Jan 24 00:44 by-label
                          drwxr-xr-x 2 root root 140 Jan 24 00:44 by-partuuid
                          drwxr-xr-x 2 root root 200 Jan 24 00:44 by-path
                          drwxr-xr-x 2 root root 160 Jan 24 00:45 by-uuid
                          
                          /dev/disk/by-id:
                          total 0
                          lrwxrwxrwx 1 root root 10 Jan 24 00:45 dm-name-XSLocalEXT--4fadad90--5cd2--28d9--a74b--c2f8b2e99c85-4fadad90--5cd2--28d9--a74b--c2f8b2e99c85 -> ../../dm-0
                          lrwxrwxrwx 1 root root 10 Jan 24 00:45 dm-name-XSLocalEXT--beb5b05b--fcf0--f72c--013f--ee64769c667d-beb5b05b--fcf0--f72c--013f--ee64769c667d -> ../../dm-1
                          lrwxrwxrwx 1 root root 10 Jan 24 00:45 dm-uuid-LVM-2nTv98148P9MlLTa8b2mLCJbBm5b2EZw6e9Lwkqe7AKIjiO1KqCCXV1U5KAjc629 -> ../../dm-0
                          lrwxrwxrwx 1 root root 10 Jan 24 00:45 dm-uuid-LVM-xCsOTEj1VHU01BfH36dnjDKKYodoLoH7J3vmvtKqml4yLHCs9mZ2UF1msemJRUIQ -> ../../dm-1
                          lrwxrwxrwx 1 root root 13 Jan 24 00:44 nvme-eui.0000000000000000000cca0c02c43300 -> ../../nvme2n1
                          lrwxrwxrwx 1 root root 13 Jan 24 00:44 nvme-eui.e8238fa6bf530001001b448b476d9fae -> ../../nvme0n1
                          lrwxrwxrwx 1 root root 13 Jan 24 00:44 nvme-eui.e8238fa6bf530001001b448b4ceb47af -> ../../nvme1n1
                          lrwxrwxrwx 1 root root 15 Jan 24 00:44 nvme-eui.e8238fa6bf530001001b448b4ceb47af-part1 -> ../../nvme1n1p1
                          lrwxrwxrwx 1 root root 15 Jan 24 00:44 nvme-eui.e8238fa6bf530001001b448b4ceb47af-part2 -> ../../nvme1n1p2
                          lrwxrwxrwx 1 root root 15 Jan 24 00:44 nvme-eui.e8238fa6bf530001001b448b4ceb47af-part3 -> ../../nvme1n1p3
                          lrwxrwxrwx 1 root root 15 Jan 24 00:44 nvme-eui.e8238fa6bf530001001b448b4ceb47af-part5 -> ../../nvme1n1p5
                          lrwxrwxrwx 1 root root 15 Jan 24 00:44 nvme-eui.e8238fa6bf530001001b448b4ceb47af-part6 -> ../../nvme1n1p6
                          lrwxrwxrwx 1 root root 13 Jan 24 00:44 nvme-UCSC-NVMEHW-H1600_SDM0000766B3 -> ../../nvme2n1
                          lrwxrwxrwx 1 root root 13 Jan 24 00:44 nvme-WD_BLACK_SN850X_4000GB_24035E801050 -> ../../nvme0n1
                          lrwxrwxrwx 1 root root 13 Jan 24 00:44 nvme-WD_Blue_SN580_500GB_23433Q804055 -> ../../nvme1n1
                          lrwxrwxrwx 1 root root 15 Jan 24 00:44 nvme-WD_Blue_SN580_500GB_23433Q804055-part1 -> ../../nvme1n1p1
                          lrwxrwxrwx 1 root root 15 Jan 24 00:44 nvme-WD_Blue_SN580_500GB_23433Q804055-part2 -> ../../nvme1n1p2
                          lrwxrwxrwx 1 root root 15 Jan 24 00:44 nvme-WD_Blue_SN580_500GB_23433Q804055-part3 -> ../../nvme1n1p3
                          lrwxrwxrwx 1 root root 15 Jan 24 00:44 nvme-WD_Blue_SN580_500GB_23433Q804055-part5 -> ../../nvme1n1p5
                          lrwxrwxrwx 1 root root 15 Jan 24 00:44 nvme-WD_Blue_SN580_500GB_23433Q804055-part6 -> ../../nvme1n1p6
                          
                          /dev/disk/by-label:
                          total 0
                          lrwxrwxrwx 1 root root 15 Jan 24 00:44 BOOT-HHTYLE -> ../../nvme1n1p3
                          lrwxrwxrwx 1 root root 15 Jan 24 00:44 logs-hhtyle -> ../../nvme1n1p5
                          lrwxrwxrwx 1 root root 15 Jan 24 00:44 root-hhtyle -> ../../nvme1n1p1
                          lrwxrwxrwx 1 root root 15 Jan 24 00:44 swap-hhtyle -> ../../nvme1n1p6
                          
                          /dev/disk/by-partuuid:
                          total 0
                          lrwxrwxrwx 1 root root 15 Jan 24 00:44 74a7e458-0357-4734-9650-dea92da16ba3 -> ../../nvme1n1p5
                          lrwxrwxrwx 1 root root 15 Jan 24 00:44 800fd05b-7f1d-45a5-9ad9-28b1151bceb8 -> ../../nvme1n1p3
                          lrwxrwxrwx 1 root root 15 Jan 24 00:44 9e98a731-2776-4550-b648-25fb65ce15f8 -> ../../nvme1n1p6
                          lrwxrwxrwx 1 root root 15 Jan 24 00:44 ab62b83f-b5dd-48f1-8a02-e2cec53cf18d -> ../../nvme1n1p2
                          lrwxrwxrwx 1 root root 15 Jan 24 00:44 b20e412b-aa17-4c49-b691-3217120fb1fc -> ../../nvme1n1p1
                          
                          /dev/disk/by-path:
                          total 0
                          lrwxrwxrwx 1 root root 13 Jan 24 00:44 pci-0000:48:00.0-nvme-1 -> ../../nvme0n1
                          lrwxrwxrwx 1 root root 13 Jan 24 00:44 pci-0000:49:00.0-nvme-1 -> ../../nvme1n1
                          lrwxrwxrwx 1 root root 15 Jan 24 00:44 pci-0000:49:00.0-nvme-1-part1 -> ../../nvme1n1p1
                          lrwxrwxrwx 1 root root 15 Jan 24 00:44 pci-0000:49:00.0-nvme-1-part2 -> ../../nvme1n1p2
                          lrwxrwxrwx 1 root root 15 Jan 24 00:44 pci-0000:49:00.0-nvme-1-part3 -> ../../nvme1n1p3
                          lrwxrwxrwx 1 root root 15 Jan 24 00:44 pci-0000:49:00.0-nvme-1-part5 -> ../../nvme1n1p5
                          lrwxrwxrwx 1 root root 15 Jan 24 00:44 pci-0000:49:00.0-nvme-1-part6 -> ../../nvme1n1p6
                          lrwxrwxrwx 1 root root 13 Jan 24 00:44 pci-0000:4e:00.0-nvme-1 -> ../../nvme2n1
                          
                          /dev/disk/by-uuid:
                          total 0
                          lrwxrwxrwx 1 root root 15 Jan 24 00:44 7fa2653d-923c-4977-bbae-55b7d1f18dcd -> ../../nvme1n1p5
                          lrwxrwxrwx 1 root root 15 Jan 24 00:44 7fdb336a-d086-4060-a369-34fd1bba013e -> ../../nvme1n1p1
                          lrwxrwxrwx 1 root root 10 Jan 24 00:45 8f140914-ede9-48b5-8196-a8b382e058b8 -> ../../dm-1
                          lrwxrwxrwx 1 root root 15 Jan 24 00:44 a90dd274-2c82-4efc-baa8-dac1f75a4206 -> ../../nvme1n1p6
                          lrwxrwxrwx 1 root root 15 Jan 24 00:44 BACD-FCE2 -> ../../nvme1n1p3
                          lrwxrwxrwx 1 root root 10 Jan 24 00:45 e5ad55df-b9d3-45c0-bd61-0b0c1ed2ad55 -> ../../dm-0
                          

                          I'll have to try the grub change to get logs when I can take the machine out of service, maybe later tonight.
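
Since the kernel line uses root=LABEL=root-hhtyle, the root filesystem is looked up by label and not by PCI address. A small sketch for confirming which device a label resolves to, usable on the running system or in a recovery shell. The `device_for_label` helper and its optional directory argument are hypothetical. It also assumes a `readlink` that supports `-f`, which may not hold in a minimal initramfs.

```shell
#!/bin/sh
# Resolve a filesystem label to its block device node.
# The second argument overrides /dev/disk/by-label (hypothetical test knob).
device_for_label() {
    dir=${2:-/dev/disk/by-label}
    if [ -e "$dir/$1" ]; then
        # Follow the udev symlink to the real device node.
        readlink -f "$dir/$1"
    else
        echo "label $1 not found" >&2
        return 1
    fi
}

# root-hhtyle is the label used on the kernel line in the grub.cfg above.
device_for_label root-hhtyle || true
```

If the label is missing in the recovery shell, the partition was probably never detected at all, which would point at the NVMe device itself and not at the root= setting.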

                            dinhngtu Vates 🪐 XCP-ng Team @exime

                            @exime Could you gather the contents of /run/initramfs/rdsosreport.txt and the output of ls /sys/module/nvme/drivers/pci\:nvme and other relevant commands from the recovery shell?

                            If you can't type or use a USB drive in the recovery shell, run dracut --add-drivers "usbhid uas" -f /boot/initrd-temp.img and boot from that initrd.
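
The requested outputs can be collected into a single file so they are easy to copy off the machine. This is a sketch and not an official procedure. The `collect_diag` function name and the default output path are hypothetical, and the extra `/dev/disk` and `/proc/cmdline` sections are an assumption about what "other relevant commands" might cover.

```shell
#!/bin/sh
# Collect boot diagnostics into one file. A missing source shows up as an
# error message inside the report instead of aborting the collection.
collect_diag() {
    out=$1
    {
        echo "== rdsosreport =="
        cat /run/initramfs/rdsosreport.txt 2>&1 || true
        echo "== devices bound to the nvme driver =="
        ls /sys/module/nvme/drivers/pci:nvme 2>&1 || true
        echo "== /dev/disk =="
        ls -lR /dev/disk 2>&1 || true
        echo "== kernel command line =="
        cat /proc/cmdline 2>&1 || true
    } > "$out"
}

# Example output path (arbitrary); set DIAG_OUT to write somewhere else,
# for instance a mounted USB drive.
collect_diag "${DIAG_OUT:-${TMPDIR:-/tmp}/diag.txt}"
```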

                              exime

                              Update: I put the server back to the exact configuration it was in, rebooted, and nothing showed up in xl pci-assignable-list. There must have been some configuration... somewhere... that something was looking for that was related to the exact locations of those 4 USB controllers.

                              TL;DR, all is well, thanks for the help!

                                olivierlambert Vates 🪐 Co-Founder CEO

                                Okay, weird, but at least glad to know it works now 🙂

                                • olivierlambertO olivierlambert marked this topic as a question on
                                • olivierlambertO olivierlambert has marked this topic as solved on
