XCP-ng
    • Categories
    • Recent
    • Tags
    • Popular
    • Users
    • Groups
    • Register
    • Login

    PCIe card removal and failure to boot from NVMe

    Scheduled Pinned Locked Moved Solved XCP-ng
    14 Posts 3 Posters 1.8k Views 3 Watching
    Loading More Posts
    • Oldest to Newest
    • Newest to Oldest
    • Most Votes
    Reply
    • Reply as topic
    Log in to reply
    This topic has been deleted. Only users with topic management privileges can see it.
    • olivierlambertO Offline
      olivierlambert Vates 🪐 Co-Founder CEO
      last edited by

      And if you replug the card it works again, right?

      E 1 Reply Last reply Reply Quote 0
      • E Offline
        exime @olivierlambert
        last edited by

        @olivierlambert said in PCIe card removal and failure to boot from NVMe:

        And if you replug the card it works again, right?

        Yep, if I put it in the same slot.

        1 Reply Last reply Reply Quote 0
        • olivierlambertO Offline
          olivierlambert Vates 🪐 Co-Founder CEO
          last edited by

          @dthenot or @dinhngtu any idea?

          E 1 Reply Last reply Reply Quote 0
          • E Offline
            exime @olivierlambert
            last edited by

            [11:12 xcp-ng ~]# xl pci-assignable-list
            0000:01:00.0
            0000:46:00.0
            0000:01:00.1
            0000:45:00.0
            0000:44:00.0
            
            

            44, 45, 46, 47 are the USB controllers. For some reason, only 47 is missing from this list. All four of them were previously hidden from dom0, but only 47 actually cleared

            D 1 Reply Last reply Reply Quote 0
            • D Offline
              dinhngtu Vates 🪐 XCP-ng Team @exime
              last edited by

              @exime Is the error coming from Grub or the Dom0 kernel?

              E 1 Reply Last reply Reply Quote 0
              • E Offline
                exime @dinhngtu
                last edited by

                @dinhngtu said in PCIe card removal and failure to boot from NVMe:

                @exime Is the error coming from Grub or the Dom0 kernel?

                I'm guessing the kernel? I get the XCP-NG splash screen for a long time (normal boot is very quick on this machine), the progress bar goes about 50%, then it drops into an emergency shell of some sort saying it can't find the root drive (I'm guessing because the PCIe stuff is shifted unexpectedly).

                D 1 Reply Last reply Reply Quote 0
                • D Offline
                  dinhngtu Vates 🪐 XCP-ng Team @exime
                  last edited by

                  @exime What does your grub.cfg and the output of ls -lR /dev/disk look like? At the Grub menu, try replacing quiet vga=785 splash plymouth.ignore-serial-consoles with vga=785 loglevel=8 and see if you can gather some logs.

                  E 1 Reply Last reply Reply Quote 0
                  • E Offline
                    exime @dinhngtu
                    last edited by

                    @dinhngtu said in PCIe card removal and failure to boot from NVMe:

                    @exime What does your grub.cfg and the output of ls -lR /dev/disk look like? At the Grub menu, try replacing quiet vga=785 splash plymouth.ignore-serial-consoles with vga=785 loglevel=8 and see if you can gather some logs.

                    [09:15 xcp-ng ~]# cat /boot/efi/EFI/xenserver/grub.cfg
                    serial --unit=0 --speed=115200
                    terminal_input serial console
                    terminal_output serial console
                    set default=0
                    set timeout=5
                    if [ -s $prefix/grubenv ]; then
                            load_env
                    fi
                    
                    if [ -n "$override_entry" ]; then
                            set default=$override_entry
                    fi
                    
                    menuentry 'XCP-ng' {
                            search --label --set root root-hhtyle
                            multiboot2 /boot/xen.gz dom0_mem=7552M,max:7552M watchdog ucode=scan dom0_max_vcpus=1-16 crashkernel=256M,below=4G console=vga vga=mode-0x0311 cpufreq=xen:performance
                            module2 /boot/vmlinuz-4.19-xen root=LABEL=root-hhtyle ro nolvm hpet=disable console=hvc0 console=tty0 quiet vga=785 splash plymouth.ignore-serial-consoles
                            module2 /boot/initrd-4.19-xen.img
                    }
                    
                    [11:44 xcp-ng ~]# ls -lR /dev/disk
                    /dev/disk:
                    total 0
                    drwxr-xr-x 2 root root 440 Jan 24 00:45 by-id
                    drwxr-xr-x 2 root root 120 Jan 24 00:44 by-label
                    drwxr-xr-x 2 root root 140 Jan 24 00:44 by-partuuid
                    drwxr-xr-x 2 root root 200 Jan 24 00:44 by-path
                    drwxr-xr-x 2 root root 160 Jan 24 00:45 by-uuid
                    
                    /dev/disk/by-id:
                    total 0
                    lrwxrwxrwx 1 root root 10 Jan 24 00:45 dm-name-XSLocalEXT--4fadad90--5cd2--28d9--a74b--c2f8b2e99c85-4fadad90--5cd2--28d9--a74b--c2f8b2e99c85 -> ../../dm-0
                    lrwxrwxrwx 1 root root 10 Jan 24 00:45 dm-name-XSLocalEXT--beb5b05b--fcf0--f72c--013f--ee64769c667d-beb5b05b--fcf0--f72c--013f--ee64769c667d -> ../../dm-1
                    lrwxrwxrwx 1 root root 10 Jan 24 00:45 dm-uuid-LVM-2nTv98148P9MlLTa8b2mLCJbBm5b2EZw6e9Lwkqe7AKIjiO1KqCCXV1U5KAjc629 -> ../../dm-0
                    lrwxrwxrwx 1 root root 10 Jan 24 00:45 dm-uuid-LVM-xCsOTEj1VHU01BfH36dnjDKKYodoLoH7J3vmvtKqml4yLHCs9mZ2UF1msemJRUIQ -> ../../dm-1
                    lrwxrwxrwx 1 root root 13 Jan 24 00:44 nvme-eui.0000000000000000000cca0c02c43300 -> ../../nvme2n1
                    lrwxrwxrwx 1 root root 13 Jan 24 00:44 nvme-eui.e8238fa6bf530001001b448b476d9fae -> ../../nvme0n1
                    lrwxrwxrwx 1 root root 13 Jan 24 00:44 nvme-eui.e8238fa6bf530001001b448b4ceb47af -> ../../nvme1n1
                    lrwxrwxrwx 1 root root 15 Jan 24 00:44 nvme-eui.e8238fa6bf530001001b448b4ceb47af-part1 -> ../../nvme1n1p1
                    lrwxrwxrwx 1 root root 15 Jan 24 00:44 nvme-eui.e8238fa6bf530001001b448b4ceb47af-part2 -> ../../nvme1n1p2
                    lrwxrwxrwx 1 root root 15 Jan 24 00:44 nvme-eui.e8238fa6bf530001001b448b4ceb47af-part3 -> ../../nvme1n1p3
                    lrwxrwxrwx 1 root root 15 Jan 24 00:44 nvme-eui.e8238fa6bf530001001b448b4ceb47af-part5 -> ../../nvme1n1p5
                    lrwxrwxrwx 1 root root 15 Jan 24 00:44 nvme-eui.e8238fa6bf530001001b448b4ceb47af-part6 -> ../../nvme1n1p6
                    lrwxrwxrwx 1 root root 13 Jan 24 00:44 nvme-UCSC-NVMEHW-H1600_SDM0000766B3 -> ../../nvme2n1
                    lrwxrwxrwx 1 root root 13 Jan 24 00:44 nvme-WD_BLACK_SN850X_4000GB_24035E801050 -> ../../nvme0n1
                    lrwxrwxrwx 1 root root 13 Jan 24 00:44 nvme-WD_Blue_SN580_500GB_23433Q804055 -> ../../nvme1n1
                    lrwxrwxrwx 1 root root 15 Jan 24 00:44 nvme-WD_Blue_SN580_500GB_23433Q804055-part1 -> ../../nvme1n1p1
                    lrwxrwxrwx 1 root root 15 Jan 24 00:44 nvme-WD_Blue_SN580_500GB_23433Q804055-part2 -> ../../nvme1n1p2
                    lrwxrwxrwx 1 root root 15 Jan 24 00:44 nvme-WD_Blue_SN580_500GB_23433Q804055-part3 -> ../../nvme1n1p3
                    lrwxrwxrwx 1 root root 15 Jan 24 00:44 nvme-WD_Blue_SN580_500GB_23433Q804055-part5 -> ../../nvme1n1p5
                    lrwxrwxrwx 1 root root 15 Jan 24 00:44 nvme-WD_Blue_SN580_500GB_23433Q804055-part6 -> ../../nvme1n1p6
                    
                    /dev/disk/by-label:
                    total 0
                    lrwxrwxrwx 1 root root 15 Jan 24 00:44 BOOT-HHTYLE -> ../../nvme1n1p3
                    lrwxrwxrwx 1 root root 15 Jan 24 00:44 logs-hhtyle -> ../../nvme1n1p5
                    lrwxrwxrwx 1 root root 15 Jan 24 00:44 root-hhtyle -> ../../nvme1n1p1
                    lrwxrwxrwx 1 root root 15 Jan 24 00:44 swap-hhtyle -> ../../nvme1n1p6
                    
                    /dev/disk/by-partuuid:
                    total 0
                    lrwxrwxrwx 1 root root 15 Jan 24 00:44 74a7e458-0357-4734-9650-dea92da16ba3 -> ../../nvme1n1p5
                    lrwxrwxrwx 1 root root 15 Jan 24 00:44 800fd05b-7f1d-45a5-9ad9-28b1151bceb8 -> ../../nvme1n1p3
                    lrwxrwxrwx 1 root root 15 Jan 24 00:44 9e98a731-2776-4550-b648-25fb65ce15f8 -> ../../nvme1n1p6
                    lrwxrwxrwx 1 root root 15 Jan 24 00:44 ab62b83f-b5dd-48f1-8a02-e2cec53cf18d -> ../../nvme1n1p2
                    lrwxrwxrwx 1 root root 15 Jan 24 00:44 b20e412b-aa17-4c49-b691-3217120fb1fc -> ../../nvme1n1p1
                    
                    /dev/disk/by-path:
                    total 0
                    lrwxrwxrwx 1 root root 13 Jan 24 00:44 pci-0000:48:00.0-nvme-1 -> ../../nvme0n1
                    lrwxrwxrwx 1 root root 13 Jan 24 00:44 pci-0000:49:00.0-nvme-1 -> ../../nvme1n1
                    lrwxrwxrwx 1 root root 15 Jan 24 00:44 pci-0000:49:00.0-nvme-1-part1 -> ../../nvme1n1p1
                    lrwxrwxrwx 1 root root 15 Jan 24 00:44 pci-0000:49:00.0-nvme-1-part2 -> ../../nvme1n1p2
                    lrwxrwxrwx 1 root root 15 Jan 24 00:44 pci-0000:49:00.0-nvme-1-part3 -> ../../nvme1n1p3
                    lrwxrwxrwx 1 root root 15 Jan 24 00:44 pci-0000:49:00.0-nvme-1-part5 -> ../../nvme1n1p5
                    lrwxrwxrwx 1 root root 15 Jan 24 00:44 pci-0000:49:00.0-nvme-1-part6 -> ../../nvme1n1p6
                    lrwxrwxrwx 1 root root 13 Jan 24 00:44 pci-0000:4e:00.0-nvme-1 -> ../../nvme2n1
                    
                    /dev/disk/by-uuid:
                    total 0
                    lrwxrwxrwx 1 root root 15 Jan 24 00:44 7fa2653d-923c-4977-bbae-55b7d1f18dcd -> ../../nvme1n1p5
                    lrwxrwxrwx 1 root root 15 Jan 24 00:44 7fdb336a-d086-4060-a369-34fd1bba013e -> ../../nvme1n1p1
                    lrwxrwxrwx 1 root root 10 Jan 24 00:45 8f140914-ede9-48b5-8196-a8b382e058b8 -> ../../dm-1
                    lrwxrwxrwx 1 root root 15 Jan 24 00:44 a90dd274-2c82-4efc-baa8-dac1f75a4206 -> ../../nvme1n1p6
                    lrwxrwxrwx 1 root root 15 Jan 24 00:44 BACD-FCE2 -> ../../nvme1n1p3
                    lrwxrwxrwx 1 root root 10 Jan 24 00:45 e5ad55df-b9d3-45c0-bd61-0b0c1ed2ad55 -> ../../dm-0
                    

                    I'll have to try the grub change to get logs when I can take the machine out of service, maybe later tonight.

                    D 1 Reply Last reply Reply Quote 0
                    • D Offline
                      dinhngtu Vates 🪐 XCP-ng Team @exime
                      last edited by

                      @exime Could you gather the outputs of /run/initramfs/rdsosreport.txt, ls /sys/module/nvme/drivers/pci\:nvme and other relevant commands from the recovery shell?

                      If you can't type or use a USB drive during the recovery shell, do dracut --add-drivers "usbhid uas" -f /boot/initrd-temp.img and boot from that initrd.

                      1 Reply Last reply Reply Quote 0
                      • E Offline
                        exime
                        last edited by

                        Update: I put the server back to the exact configuration it was in, rebooted, and nothing showed up in xl pci-assignable-list. There must have been some configuration... somewhere... that something was looking for that was related to the exact locations of those 4 USB controllers.

                        TL;DR, all is well, thanks for the help!

                        1 Reply Last reply Reply Quote 1
                        • olivierlambertO Offline
                          olivierlambert Vates 🪐 Co-Founder CEO
                          last edited by

                          Okay weird, at east glad to know it works now 🙂

                          1 Reply Last reply Reply Quote 0
                          • olivierlambertO olivierlambert marked this topic as a question on
                          • olivierlambertO olivierlambert has marked this topic as solved on

                          Hello! It looks like you're interested in this conversation, but you don't have an account yet.

                          Getting fed up of having to scroll through the same posts each visit? When you register for an account, you'll always come back to exactly where you were before, and choose to be notified of new replies (either via email, or push notification). You'll also be able to save bookmarks and upvote posts to show your appreciation to other community members.

                          With your input, this post could be even better 💗

                          Register Login
                          • First post
                            Last post