XCP-ng

    PCIe card removal and failure to boot from NVMe

    • olivierlambert (Vates 🪐 Co-Founder CEO)

      And if you replug the card it works again, right?

      • exime @olivierlambert

        @olivierlambert said in PCIe card removal and failure to boot from NVMe:

        And if you replug the card it works again, right?

        Yep, if I put it in the same slot.

        • olivierlambert (Vates 🪐 Co-Founder CEO)

          @dthenot or @dinhngtu any idea?

          • exime @olivierlambert

            [11:12 xcp-ng ~]# xl pci-assignable-list
            0000:01:00.0
            0000:46:00.0
            0000:01:00.1
            0000:45:00.0
            0000:44:00.0
            
            

            Buses 44, 45, 46 and 47 are the USB controllers. For some reason, only 47 is missing from this list: all four of them were previously hidden from dom0, but only 47 actually cleared.
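A minimal sketch for checking which controllers are still held for passthrough. It assumes the four USB controllers are at function 0 of buses 44 to 47 (0000:44:00.0 to 0000:47:00.0), which the post implies but does not spell out, and check_assignable is a hypothetical helper that reads the output of xl pci-assignable-list on stdin.

```shell
#!/bin/sh
# check_assignable: hypothetical helper, not part of xl.
# Reads `xl pci-assignable-list` output on stdin and reports, for each
# BDF given as an argument, whether it is still in the assignable list.
check_assignable() {
    list=$(cat)
    for bdf in "$@"; do
        # -x: whole-line match; -F: fixed string, so the dot is literal
        if printf '%s\n' "$list" | grep -qxF "$bdf"; then
            printf '%s assignable\n' "$bdf"
        else
            printf '%s not-assignable\n' "$bdf"
        fi
    done
}
```

For example, xl pci-assignable-list | check_assignable 0000:44:00.0 0000:45:00.0 0000:46:00.0 0000:47:00.0 would, with the list above, report only 0000:47:00.0 as not-assignable. A device that should go back to dom0 can be released with xl pci-assignable-remove -r <BDF>, where -r asks xl to rebind it to its original driver.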

            • dinhngtu (Vates 🪐 XCP-ng Team) @exime

              @exime Is the error coming from Grub or the Dom0 kernel?

              • exime @dinhngtu

                @dinhngtu said in PCIe card removal and failure to boot from NVMe:

                @exime Is the error coming from Grub or the Dom0 kernel?

                I'm guessing the kernel? I get the XCP-ng splash screen for a long time (normal boot is very quick on this machine), the progress bar gets to about 50%, and then it drops into an emergency shell of some sort saying it can't find the root drive (I'm guessing because the PCIe devices are shifted unexpectedly).

                • dinhngtu (Vates 🪐 XCP-ng Team) @exime

                  @exime What do your grub.cfg and the output of ls -lR /dev/disk look like? At the Grub menu, try replacing quiet vga=785 splash plymouth.ignore-serial-consoles with vga=785 loglevel=8 and see if you can gather some logs.
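For reference, the suggested change only touches the module2 /boot/vmlinuz-4.19-xen line of the grub.cfg posted in the next reply. A minimal sketch of the same substitution with sed, applied to a copy of that line so nothing on disk changes; editing the entry at the Grub menu itself only affects that one boot.

```shell
#!/bin/sh
# Dom0 kernel line copied from the posted grub.cfg.
orig_line='module2 /boot/vmlinuz-4.19-xen root=LABEL=root-hhtyle ro nolvm hpet=disable console=hvc0 console=tty0 quiet vga=785 splash plymouth.ignore-serial-consoles'

# Drop the quiet/splash/plymouth options and raise the kernel log level so
# that kernel messages, including driver probe errors, stay on the console.
debug_line=$(printf '%s\n' "$orig_line" |
    sed 's/quiet vga=785 splash plymouth\.ignore-serial-consoles/vga=785 loglevel=8/')

printf '%s\n' "$debug_line"
```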

                  • exime @dinhngtu

                    @dinhngtu said in PCIe card removal and failure to boot from NVMe:

                    @exime What do your grub.cfg and the output of ls -lR /dev/disk look like? At the Grub menu, try replacing quiet vga=785 splash plymouth.ignore-serial-consoles with vga=785 loglevel=8 and see if you can gather some logs.

                    [09:15 xcp-ng ~]# cat /boot/efi/EFI/xenserver/grub.cfg
                    serial --unit=0 --speed=115200
                    terminal_input serial console
                    terminal_output serial console
                    set default=0
                    set timeout=5
                    if [ -s $prefix/grubenv ]; then
                            load_env
                    fi
                    
                    if [ -n "$override_entry" ]; then
                            set default=$override_entry
                    fi
                    
                    menuentry 'XCP-ng' {
                            search --label --set root root-hhtyle
                            multiboot2 /boot/xen.gz dom0_mem=7552M,max:7552M watchdog ucode=scan dom0_max_vcpus=1-16 crashkernel=256M,below=4G console=vga vga=mode-0x0311 cpufreq=xen:performance
                            module2 /boot/vmlinuz-4.19-xen root=LABEL=root-hhtyle ro nolvm hpet=disable console=hvc0 console=tty0 quiet vga=785 splash plymouth.ignore-serial-consoles
                            module2 /boot/initrd-4.19-xen.img
                    }
                    
                    [11:44 xcp-ng ~]# ls -lR /dev/disk
                    /dev/disk:
                    total 0
                    drwxr-xr-x 2 root root 440 Jan 24 00:45 by-id
                    drwxr-xr-x 2 root root 120 Jan 24 00:44 by-label
                    drwxr-xr-x 2 root root 140 Jan 24 00:44 by-partuuid
                    drwxr-xr-x 2 root root 200 Jan 24 00:44 by-path
                    drwxr-xr-x 2 root root 160 Jan 24 00:45 by-uuid
                    
                    /dev/disk/by-id:
                    total 0
                    lrwxrwxrwx 1 root root 10 Jan 24 00:45 dm-name-XSLocalEXT--4fadad90--5cd2--28d9--a74b--c2f8b2e99c85-4fadad90--5cd2--28d9--a74b--c2f8b2e99c85 -> ../../dm-0
                    lrwxrwxrwx 1 root root 10 Jan 24 00:45 dm-name-XSLocalEXT--beb5b05b--fcf0--f72c--013f--ee64769c667d-beb5b05b--fcf0--f72c--013f--ee64769c667d -> ../../dm-1
                    lrwxrwxrwx 1 root root 10 Jan 24 00:45 dm-uuid-LVM-2nTv98148P9MlLTa8b2mLCJbBm5b2EZw6e9Lwkqe7AKIjiO1KqCCXV1U5KAjc629 -> ../../dm-0
                    lrwxrwxrwx 1 root root 10 Jan 24 00:45 dm-uuid-LVM-xCsOTEj1VHU01BfH36dnjDKKYodoLoH7J3vmvtKqml4yLHCs9mZ2UF1msemJRUIQ -> ../../dm-1
                    lrwxrwxrwx 1 root root 13 Jan 24 00:44 nvme-eui.0000000000000000000cca0c02c43300 -> ../../nvme2n1
                    lrwxrwxrwx 1 root root 13 Jan 24 00:44 nvme-eui.e8238fa6bf530001001b448b476d9fae -> ../../nvme0n1
                    lrwxrwxrwx 1 root root 13 Jan 24 00:44 nvme-eui.e8238fa6bf530001001b448b4ceb47af -> ../../nvme1n1
                    lrwxrwxrwx 1 root root 15 Jan 24 00:44 nvme-eui.e8238fa6bf530001001b448b4ceb47af-part1 -> ../../nvme1n1p1
                    lrwxrwxrwx 1 root root 15 Jan 24 00:44 nvme-eui.e8238fa6bf530001001b448b4ceb47af-part2 -> ../../nvme1n1p2
                    lrwxrwxrwx 1 root root 15 Jan 24 00:44 nvme-eui.e8238fa6bf530001001b448b4ceb47af-part3 -> ../../nvme1n1p3
                    lrwxrwxrwx 1 root root 15 Jan 24 00:44 nvme-eui.e8238fa6bf530001001b448b4ceb47af-part5 -> ../../nvme1n1p5
                    lrwxrwxrwx 1 root root 15 Jan 24 00:44 nvme-eui.e8238fa6bf530001001b448b4ceb47af-part6 -> ../../nvme1n1p6
                    lrwxrwxrwx 1 root root 13 Jan 24 00:44 nvme-UCSC-NVMEHW-H1600_SDM0000766B3 -> ../../nvme2n1
                    lrwxrwxrwx 1 root root 13 Jan 24 00:44 nvme-WD_BLACK_SN850X_4000GB_24035E801050 -> ../../nvme0n1
                    lrwxrwxrwx 1 root root 13 Jan 24 00:44 nvme-WD_Blue_SN580_500GB_23433Q804055 -> ../../nvme1n1
                    lrwxrwxrwx 1 root root 15 Jan 24 00:44 nvme-WD_Blue_SN580_500GB_23433Q804055-part1 -> ../../nvme1n1p1
                    lrwxrwxrwx 1 root root 15 Jan 24 00:44 nvme-WD_Blue_SN580_500GB_23433Q804055-part2 -> ../../nvme1n1p2
                    lrwxrwxrwx 1 root root 15 Jan 24 00:44 nvme-WD_Blue_SN580_500GB_23433Q804055-part3 -> ../../nvme1n1p3
                    lrwxrwxrwx 1 root root 15 Jan 24 00:44 nvme-WD_Blue_SN580_500GB_23433Q804055-part5 -> ../../nvme1n1p5
                    lrwxrwxrwx 1 root root 15 Jan 24 00:44 nvme-WD_Blue_SN580_500GB_23433Q804055-part6 -> ../../nvme1n1p6
                    
                    /dev/disk/by-label:
                    total 0
                    lrwxrwxrwx 1 root root 15 Jan 24 00:44 BOOT-HHTYLE -> ../../nvme1n1p3
                    lrwxrwxrwx 1 root root 15 Jan 24 00:44 logs-hhtyle -> ../../nvme1n1p5
                    lrwxrwxrwx 1 root root 15 Jan 24 00:44 root-hhtyle -> ../../nvme1n1p1
                    lrwxrwxrwx 1 root root 15 Jan 24 00:44 swap-hhtyle -> ../../nvme1n1p6
                    
                    /dev/disk/by-partuuid:
                    total 0
                    lrwxrwxrwx 1 root root 15 Jan 24 00:44 74a7e458-0357-4734-9650-dea92da16ba3 -> ../../nvme1n1p5
                    lrwxrwxrwx 1 root root 15 Jan 24 00:44 800fd05b-7f1d-45a5-9ad9-28b1151bceb8 -> ../../nvme1n1p3
                    lrwxrwxrwx 1 root root 15 Jan 24 00:44 9e98a731-2776-4550-b648-25fb65ce15f8 -> ../../nvme1n1p6
                    lrwxrwxrwx 1 root root 15 Jan 24 00:44 ab62b83f-b5dd-48f1-8a02-e2cec53cf18d -> ../../nvme1n1p2
                    lrwxrwxrwx 1 root root 15 Jan 24 00:44 b20e412b-aa17-4c49-b691-3217120fb1fc -> ../../nvme1n1p1
                    
                    /dev/disk/by-path:
                    total 0
                    lrwxrwxrwx 1 root root 13 Jan 24 00:44 pci-0000:48:00.0-nvme-1 -> ../../nvme0n1
                    lrwxrwxrwx 1 root root 13 Jan 24 00:44 pci-0000:49:00.0-nvme-1 -> ../../nvme1n1
                    lrwxrwxrwx 1 root root 15 Jan 24 00:44 pci-0000:49:00.0-nvme-1-part1 -> ../../nvme1n1p1
                    lrwxrwxrwx 1 root root 15 Jan 24 00:44 pci-0000:49:00.0-nvme-1-part2 -> ../../nvme1n1p2
                    lrwxrwxrwx 1 root root 15 Jan 24 00:44 pci-0000:49:00.0-nvme-1-part3 -> ../../nvme1n1p3
                    lrwxrwxrwx 1 root root 15 Jan 24 00:44 pci-0000:49:00.0-nvme-1-part5 -> ../../nvme1n1p5
                    lrwxrwxrwx 1 root root 15 Jan 24 00:44 pci-0000:49:00.0-nvme-1-part6 -> ../../nvme1n1p6
                    lrwxrwxrwx 1 root root 13 Jan 24 00:44 pci-0000:4e:00.0-nvme-1 -> ../../nvme2n1
                    
                    /dev/disk/by-uuid:
                    total 0
                    lrwxrwxrwx 1 root root 15 Jan 24 00:44 7fa2653d-923c-4977-bbae-55b7d1f18dcd -> ../../nvme1n1p5
                    lrwxrwxrwx 1 root root 15 Jan 24 00:44 7fdb336a-d086-4060-a369-34fd1bba013e -> ../../nvme1n1p1
                    lrwxrwxrwx 1 root root 10 Jan 24 00:45 8f140914-ede9-48b5-8196-a8b382e058b8 -> ../../dm-1
                    lrwxrwxrwx 1 root root 15 Jan 24 00:44 a90dd274-2c82-4efc-baa8-dac1f75a4206 -> ../../nvme1n1p6
                    lrwxrwxrwx 1 root root 15 Jan 24 00:44 BACD-FCE2 -> ../../nvme1n1p3
                    lrwxrwxrwx 1 root root 10 Jan 24 00:45 e5ad55df-b9d3-45c0-bd61-0b0c1ed2ad55 -> ../../dm-0
                    

                    I'll have to try the grub change to get logs when I can take the machine out of service, maybe later tonight.
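The root filesystem is looked up by label (root=LABEL=root-hhtyle), so renumbered PCI buses should not break the lookup on their own, as long as the NVMe controller behind that label still gets the nvme driver. A minimal sketch that maps the label to its by-path entry, which encodes the controller's PCI address; find_root_pci is a hypothetical helper, and its optional second argument only exists so it can be tried against a scratch directory. With the listing above it should print pci-0000:49:00.0-nvme-1-part1.

```shell
#!/bin/sh
# find_root_pci: hypothetical helper.
# Prints the /dev/disk/by-path entry (which encodes the PCI address) that
# resolves to the same block device as /dev/disk/by-label/<LABEL>.
# Usage: find_root_pci LABEL [DISK_DIR]   (DISK_DIR defaults to /dev/disk)
find_root_pci() {
    label=$1
    disk_dir=${2:-/dev/disk}
    # Fail if the label symlink is missing or points at nothing
    [ -e "$disk_dir/by-label/$label" ] || return 1
    # Resolve the label symlink to the real device node, e.g. /dev/nvme1n1p1
    target=$(readlink -f "$disk_dir/by-label/$label")
    for link in "$disk_dir"/by-path/*; do
        [ -L "$link" ] || continue
        if [ "$(readlink -f "$link")" = "$target" ]; then
            basename "$link"
        fi
    done
}
```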

                    • dinhngtu (Vates 🪐 XCP-ng Team) @exime

                      @exime Could you gather /run/initramfs/rdsosreport.txt, the output of ls /sys/module/nvme/drivers/pci\:nvme and any other relevant commands from the recovery shell?

                      If you can't type or use a USB drive in the recovery shell, run dracut --add-drivers "usbhid uas" -f /boot/initrd-temp.img and boot from that initrd.
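The ls /sys/module/nvme/drivers/pci\:nvme check shows which PCI devices the nvme driver has claimed. A related minimal sketch, assuming the usual sysfs layout in which /sys/bus/pci/devices/<BDF>/driver is a symlink to the bound driver: driver_for is a hypothetical helper that prints the driver name for one PCI address. If the boot NVMe (0000:49:00.0 with the card installed, probably a different address without it) were bound to pciback instead of nvme, the root device would never appear; that is a hypothesis, not something this thread confirms.

```shell
#!/bin/sh
# driver_for: hypothetical helper.
# Prints the kernel driver bound to a PCI device, or "(none)".
# Usage: driver_for BDF [SYSFS_ROOT]   (SYSFS_ROOT defaults to /sys)
driver_for() {
    bdf=$1
    sysfs=${2:-/sys}
    link="$sysfs/bus/pci/devices/$bdf/driver"
    if [ -L "$link" ]; then
        # The symlink points at .../bus/pci/drivers/<name>; keep only <name>.
        basename "$(readlink "$link")"
    else
        echo "(none)"
    fi
}
```

In the recovery shell, driver_for 0000:49:00.0 (adjusted to the controller's current address) would print nvme, pciback or (none).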

                      • exime

                        Update: I put the server back into its exact original configuration, rebooted, and xl pci-assignable-list now shows nothing. There must have been some configuration... somewhere... tied to the exact PCI locations of those four USB controllers that something was looking for.

                        TL;DR, all is well, thanks for the help!
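On XCP-ng, devices are normally hidden from dom0 for passthrough with the xen-pciback.hide=(BDF)(BDF)... dom0 kernel parameter, which names each device by its exact PCI address. That is one plausible home for the location-specific configuration suspected above, though the thread never confirms it (the grub.cfg posted earlier does not contain that parameter). A minimal sketch, using a hypothetical hidden_bdfs helper, that pulls the addresses out of such a parameter so they can be compared with lspci -D after hardware changes.

```shell
#!/bin/sh
# hidden_bdfs: hypothetical helper.
# Reads kernel command lines on stdin and prints one BDF per line for every
# device listed in a xen-pciback.hide=(BDF)(BDF)... parameter.
hidden_bdfs() {
    # 1) keep only the xen-pciback.hide=... token(s)
    # 2) strip the parameter name
    # 3) turn every parenthesis into a newline
    # 4) drop the empty lines left between groups
    grep -o 'xen-pciback\.hide=[^[:space:]]*' |
        sed 's/^xen-pciback\.hide=//' |
        tr '()' '\n\n' |
        grep -v '^$'
}
```

For example, hidden_bdfs < /boot/efi/EFI/xenserver/grub.cfg prints nothing for the grub.cfg shown in this thread.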

                        • olivierlambert (Vates 🪐 Co-Founder CEO)

                          Okay, weird. At least glad to know it works now 🙂

                          • olivierlambert marked this topic as a question
                          • olivierlambert marked this topic as solved