XCP-ng
    • Categories
    • Recent
    • Tags
    • Popular
    • Users
    • Groups
    • Register
    • Login

    PCIe card removal and failure to boot from NVMe

    Scheduled Pinned Locked Moved Solved XCP-ng
    14 Posts 3 Posters 1.8k Views 3 Watching
    Loading More Posts
    • Oldest to Newest
    • Newest to Oldest
    • Most Votes
    Reply
    • Reply as topic
    Log in to reply
    This topic has been deleted. Only users with topic management privileges can see it.
    • E Offline
      exime @olivierlambert
      last edited by

      @olivierlambert said in PCIe card removal and failure to boot from NVMe:

      And if you replug the card it works again, right?

      Yep, if I put it in the same slot.

      1 Reply Last reply Reply Quote 0
      • olivierlambertO Offline
        olivierlambert Vates 🪐 Co-Founder CEO
        last edited by

        @dthenot or @dinhngtu any idea?

        E 1 Reply Last reply Reply Quote 0
        • E Offline
          exime @olivierlambert
          last edited by

          [11:12 xcp-ng ~]# xl pci-assignable-list
          0000:01:00.0
          0000:46:00.0
          0000:01:00.1
          0000:45:00.0
          0000:44:00.0
          
          

          44, 45, 46, 47 are the USB controllers. For some reason, only 47 is missing from this list. All four of them were previously hidden from dom0, but only 47 actually cleared

          D 1 Reply Last reply Reply Quote 0
          • D Offline
            dinhngtu Vates 🪐 XCP-ng Team @exime
            last edited by

            @exime Is the error coming from Grub or the Dom0 kernel?

            E 1 Reply Last reply Reply Quote 0
            • E Offline
              exime @dinhngtu
              last edited by

              @dinhngtu said in PCIe card removal and failure to boot from NVMe:

              @exime Is the error coming from Grub or the Dom0 kernel?

              I'm guessing the kernel? I get the XCP-NG splash screen for a long time (normal boot is very quick on this machine), the progress bar goes about 50%, then it drops into an emergency shell of some sort saying it can't find the root drive (I'm guessing because the PCIe stuff is shifted unexpectedly).

              D 1 Reply Last reply Reply Quote 0
              • D Offline
                dinhngtu Vates 🪐 XCP-ng Team @exime
                last edited by

                @exime What does your grub.cfg and the output of ls -lR /dev/disk look like? At the Grub menu, try replacing quiet vga=785 splash plymouth.ignore-serial-consoles with vga=785 loglevel=8 and see if you can gather some logs.

                E 1 Reply Last reply Reply Quote 0
                • E Offline
                  exime @dinhngtu
                  last edited by

                  @dinhngtu said in PCIe card removal and failure to boot from NVMe:

                  @exime What does your grub.cfg and the output of ls -lR /dev/disk look like? At the Grub menu, try replacing quiet vga=785 splash plymouth.ignore-serial-consoles with vga=785 loglevel=8 and see if you can gather some logs.

                  [09:15 xcp-ng ~]# cat /boot/efi/EFI/xenserver/grub.cfg
                  serial --unit=0 --speed=115200
                  terminal_input serial console
                  terminal_output serial console
                  set default=0
                  set timeout=5
                  if [ -s $prefix/grubenv ]; then
                          load_env
                  fi
                  
                  if [ -n "$override_entry" ]; then
                          set default=$override_entry
                  fi
                  
                  menuentry 'XCP-ng' {
                          search --label --set root root-hhtyle
                          multiboot2 /boot/xen.gz dom0_mem=7552M,max:7552M watchdog ucode=scan dom0_max_vcpus=1-16 crashkernel=256M,below=4G console=vga vga=mode-0x0311 cpufreq=xen:performance
                          module2 /boot/vmlinuz-4.19-xen root=LABEL=root-hhtyle ro nolvm hpet=disable console=hvc0 console=tty0 quiet vga=785 splash plymouth.ignore-serial-consoles
                          module2 /boot/initrd-4.19-xen.img
                  }
                  
                  [11:44 xcp-ng ~]# ls -lR /dev/disk
                  /dev/disk:
                  total 0
                  drwxr-xr-x 2 root root 440 Jan 24 00:45 by-id
                  drwxr-xr-x 2 root root 120 Jan 24 00:44 by-label
                  drwxr-xr-x 2 root root 140 Jan 24 00:44 by-partuuid
                  drwxr-xr-x 2 root root 200 Jan 24 00:44 by-path
                  drwxr-xr-x 2 root root 160 Jan 24 00:45 by-uuid
                  
                  /dev/disk/by-id:
                  total 0
                  lrwxrwxrwx 1 root root 10 Jan 24 00:45 dm-name-XSLocalEXT--4fadad90--5cd2--28d9--a74b--c2f8b2e99c85-4fadad90--5cd2--28d9--a74b--c2f8b2e99c85 -> ../../dm-0
                  lrwxrwxrwx 1 root root 10 Jan 24 00:45 dm-name-XSLocalEXT--beb5b05b--fcf0--f72c--013f--ee64769c667d-beb5b05b--fcf0--f72c--013f--ee64769c667d -> ../../dm-1
                  lrwxrwxrwx 1 root root 10 Jan 24 00:45 dm-uuid-LVM-2nTv98148P9MlLTa8b2mLCJbBm5b2EZw6e9Lwkqe7AKIjiO1KqCCXV1U5KAjc629 -> ../../dm-0
                  lrwxrwxrwx 1 root root 10 Jan 24 00:45 dm-uuid-LVM-xCsOTEj1VHU01BfH36dnjDKKYodoLoH7J3vmvtKqml4yLHCs9mZ2UF1msemJRUIQ -> ../../dm-1
                  lrwxrwxrwx 1 root root 13 Jan 24 00:44 nvme-eui.0000000000000000000cca0c02c43300 -> ../../nvme2n1
                  lrwxrwxrwx 1 root root 13 Jan 24 00:44 nvme-eui.e8238fa6bf530001001b448b476d9fae -> ../../nvme0n1
                  lrwxrwxrwx 1 root root 13 Jan 24 00:44 nvme-eui.e8238fa6bf530001001b448b4ceb47af -> ../../nvme1n1
                  lrwxrwxrwx 1 root root 15 Jan 24 00:44 nvme-eui.e8238fa6bf530001001b448b4ceb47af-part1 -> ../../nvme1n1p1
                  lrwxrwxrwx 1 root root 15 Jan 24 00:44 nvme-eui.e8238fa6bf530001001b448b4ceb47af-part2 -> ../../nvme1n1p2
                  lrwxrwxrwx 1 root root 15 Jan 24 00:44 nvme-eui.e8238fa6bf530001001b448b4ceb47af-part3 -> ../../nvme1n1p3
                  lrwxrwxrwx 1 root root 15 Jan 24 00:44 nvme-eui.e8238fa6bf530001001b448b4ceb47af-part5 -> ../../nvme1n1p5
                  lrwxrwxrwx 1 root root 15 Jan 24 00:44 nvme-eui.e8238fa6bf530001001b448b4ceb47af-part6 -> ../../nvme1n1p6
                  lrwxrwxrwx 1 root root 13 Jan 24 00:44 nvme-UCSC-NVMEHW-H1600_SDM0000766B3 -> ../../nvme2n1
                  lrwxrwxrwx 1 root root 13 Jan 24 00:44 nvme-WD_BLACK_SN850X_4000GB_24035E801050 -> ../../nvme0n1
                  lrwxrwxrwx 1 root root 13 Jan 24 00:44 nvme-WD_Blue_SN580_500GB_23433Q804055 -> ../../nvme1n1
                  lrwxrwxrwx 1 root root 15 Jan 24 00:44 nvme-WD_Blue_SN580_500GB_23433Q804055-part1 -> ../../nvme1n1p1
                  lrwxrwxrwx 1 root root 15 Jan 24 00:44 nvme-WD_Blue_SN580_500GB_23433Q804055-part2 -> ../../nvme1n1p2
                  lrwxrwxrwx 1 root root 15 Jan 24 00:44 nvme-WD_Blue_SN580_500GB_23433Q804055-part3 -> ../../nvme1n1p3
                  lrwxrwxrwx 1 root root 15 Jan 24 00:44 nvme-WD_Blue_SN580_500GB_23433Q804055-part5 -> ../../nvme1n1p5
                  lrwxrwxrwx 1 root root 15 Jan 24 00:44 nvme-WD_Blue_SN580_500GB_23433Q804055-part6 -> ../../nvme1n1p6
                  
                  /dev/disk/by-label:
                  total 0
                  lrwxrwxrwx 1 root root 15 Jan 24 00:44 BOOT-HHTYLE -> ../../nvme1n1p3
                  lrwxrwxrwx 1 root root 15 Jan 24 00:44 logs-hhtyle -> ../../nvme1n1p5
                  lrwxrwxrwx 1 root root 15 Jan 24 00:44 root-hhtyle -> ../../nvme1n1p1
                  lrwxrwxrwx 1 root root 15 Jan 24 00:44 swap-hhtyle -> ../../nvme1n1p6
                  
                  /dev/disk/by-partuuid:
                  total 0
                  lrwxrwxrwx 1 root root 15 Jan 24 00:44 74a7e458-0357-4734-9650-dea92da16ba3 -> ../../nvme1n1p5
                  lrwxrwxrwx 1 root root 15 Jan 24 00:44 800fd05b-7f1d-45a5-9ad9-28b1151bceb8 -> ../../nvme1n1p3
                  lrwxrwxrwx 1 root root 15 Jan 24 00:44 9e98a731-2776-4550-b648-25fb65ce15f8 -> ../../nvme1n1p6
                  lrwxrwxrwx 1 root root 15 Jan 24 00:44 ab62b83f-b5dd-48f1-8a02-e2cec53cf18d -> ../../nvme1n1p2
                  lrwxrwxrwx 1 root root 15 Jan 24 00:44 b20e412b-aa17-4c49-b691-3217120fb1fc -> ../../nvme1n1p1
                  
                  /dev/disk/by-path:
                  total 0
                  lrwxrwxrwx 1 root root 13 Jan 24 00:44 pci-0000:48:00.0-nvme-1 -> ../../nvme0n1
                  lrwxrwxrwx 1 root root 13 Jan 24 00:44 pci-0000:49:00.0-nvme-1 -> ../../nvme1n1
                  lrwxrwxrwx 1 root root 15 Jan 24 00:44 pci-0000:49:00.0-nvme-1-part1 -> ../../nvme1n1p1
                  lrwxrwxrwx 1 root root 15 Jan 24 00:44 pci-0000:49:00.0-nvme-1-part2 -> ../../nvme1n1p2
                  lrwxrwxrwx 1 root root 15 Jan 24 00:44 pci-0000:49:00.0-nvme-1-part3 -> ../../nvme1n1p3
                  lrwxrwxrwx 1 root root 15 Jan 24 00:44 pci-0000:49:00.0-nvme-1-part5 -> ../../nvme1n1p5
                  lrwxrwxrwx 1 root root 15 Jan 24 00:44 pci-0000:49:00.0-nvme-1-part6 -> ../../nvme1n1p6
                  lrwxrwxrwx 1 root root 13 Jan 24 00:44 pci-0000:4e:00.0-nvme-1 -> ../../nvme2n1
                  
                  /dev/disk/by-uuid:
                  total 0
                  lrwxrwxrwx 1 root root 15 Jan 24 00:44 7fa2653d-923c-4977-bbae-55b7d1f18dcd -> ../../nvme1n1p5
                  lrwxrwxrwx 1 root root 15 Jan 24 00:44 7fdb336a-d086-4060-a369-34fd1bba013e -> ../../nvme1n1p1
                  lrwxrwxrwx 1 root root 10 Jan 24 00:45 8f140914-ede9-48b5-8196-a8b382e058b8 -> ../../dm-1
                  lrwxrwxrwx 1 root root 15 Jan 24 00:44 a90dd274-2c82-4efc-baa8-dac1f75a4206 -> ../../nvme1n1p6
                  lrwxrwxrwx 1 root root 15 Jan 24 00:44 BACD-FCE2 -> ../../nvme1n1p3
                  lrwxrwxrwx 1 root root 10 Jan 24 00:45 e5ad55df-b9d3-45c0-bd61-0b0c1ed2ad55 -> ../../dm-0
                  

                  I'll have to try the grub change to get logs when I can take the machine out of service, maybe later tonight.

                  D 1 Reply Last reply Reply Quote 0
                  • D Offline
                    dinhngtu Vates 🪐 XCP-ng Team @exime
                    last edited by

                    @exime Could you gather the outputs of /run/initramfs/rdsosreport.txt, ls /sys/module/nvme/drivers/pci\:nvme and other relevant commands from the recovery shell?

                    If you can't type or use a USB drive during the recovery shell, do dracut --add-drivers "usbhid uas" -f /boot/initrd-temp.img and boot from that initrd.

                    1 Reply Last reply Reply Quote 0
                    • E Offline
                      exime
                      last edited by

                      Update: I put the server back to the exact configuration it was in, rebooted, and nothing showed up in xl pci-assignable-list. There must have been some configuration... somewhere... that something was looking for that was related to the exact locations of those 4 USB controllers.

                      TL;DR, all is well, thanks for the help!

                      1 Reply Last reply Reply Quote 1
                      • olivierlambertO Offline
                        olivierlambert Vates 🪐 Co-Founder CEO
                        last edited by

                        Okay weird, at east glad to know it works now 🙂

                        1 Reply Last reply Reply Quote 0
                        • olivierlambertO olivierlambert marked this topic as a question on
                        • olivierlambertO olivierlambert has marked this topic as solved on

                        Hello! It looks like you're interested in this conversation, but you don't have an account yet.

                        Getting fed up of having to scroll through the same posts each visit? When you register for an account, you'll always come back to exactly where you were before, and choose to be notified of new replies (either via email, or push notification). You'll also be able to save bookmarks and upvote posts to show your appreciation to other community members.

                        With your input, this post could be even better 💗

                        Register Login
                        • First post
                          Last post