XCP-ng

    PCIe card removal and failure to boot from NVMe

      olivierlambert Vates 🪐 Co-Founder CEO

      Are you sure it's cleared? I suspect it's still set. Note that the Grub file is generated, so you really need to clear it with this command: /opt/xensource/libexec/xen-cmdline --delete-dom0 xen-pciback.hide

        exime @olivierlambert

        @olivierlambert - sorry, to be clear, I did run that command (multiple times)

        [11:10 xcp-ng ~]# /opt/xensource/libexec/xen-cmdline --get-dom0 xen-pciback.hide
        
        [11:12 xcp-ng ~]#
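An empty result from --get-dom0 shows that the stored configuration no longer carries the option, but it does not show what the currently running dom0 kernel was booted with. A minimal sketch of a cross-check follows, assuming the option uses the usual `xen-pciback.hide=(BDF)(BDF)` form; the function name `pciback_hidden_devices` is made up for illustration.

```shell
# Print each PCI address listed in a xen-pciback.hide=(...)(...) option
# found in a kernel command line read from stdin.
# Prints nothing if the option is absent.
pciback_hidden_devices() {
    tr ' ' '\n' |
        sed -n 's/^xen-pciback\.hide=//p' |
        tr -d '(' | tr ')' '\n' |
        sed '/^$/d'
}

# On the host (not run here), check the command line of the running dom0:
#   pciback_hidden_devices < /proc/cmdline
```

If this still prints addresses after the option was deleted, the host has most likely not been rebooted onto the regenerated configuration yet.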
        
        
          olivierlambert Vates 🪐 Co-Founder CEO

          And if you replug the card it works again, right?

            exime @olivierlambert

            @olivierlambert said in PCIe card removal and failure to boot from NVMe:

            And if you replug the card it works again, right?

            Yep, if I put it in the same slot.

              olivierlambert Vates 🪐 Co-Founder CEO

              @dthenot or @dinhngtu any idea?

                exime @olivierlambert

                [11:12 xcp-ng ~]# xl pci-assignable-list
                0000:01:00.0
                0000:46:00.0
                0000:01:00.1
                0000:45:00.0
                0000:44:00.0
                
                

                Buses 44, 45, 46 and 47 are the USB controllers. For some reason, only 47 is missing from this list. All four of them were previously hidden from dom0, but only 47 actually cleared.
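The comparison described above can be sketched as a small filter over the `xl pci-assignable-list` output. This is only an illustration: the helper name `still_assignable` is hypothetical, and the full addresses (for example `0000:47:00.0` for "47") are assumed from the bus numbers mentioned in the post.

```shell
# Usage: still_assignable "BDF BDF ..." < output-of-xl-pci-assignable-list
# Prints each expected address that still appears in the list read from stdin.
still_assignable() {
    list=$(cat)
    for bdf in $1; do
        # -x matches the whole line, -F treats the address as a fixed string
        if printf '%s\n' "$list" | grep -qxF "$bdf"; then
            printf '%s\n' "$bdf"
        fi
    done
}

# On the host (not run here):
#   xl pci-assignable-list |
#       still_assignable "0000:44:00.0 0000:45:00.0 0000:46:00.0 0000:47:00.0"
```

Any address it prints is still held for passthrough rather than being returned to dom0.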

                  dinhngtu Vates 🪐 XCP-ng Team @exime

                  @exime Is the error coming from Grub or the Dom0 kernel?

                    exime @dinhngtu

                    @dinhngtu said in PCIe card removal and failure to boot from NVMe:

                    @exime Is the error coming from Grub or the Dom0 kernel?

                    I'm guessing the kernel? I get the XCP-ng splash screen for a long time (normal boot is very quick on this machine), the progress bar gets to about 50%, then it drops into an emergency shell of some sort saying it can't find the root drive (I'm guessing because the PCIe stuff is shifted unexpectedly).

                      dinhngtu Vates 🪐 XCP-ng Team @exime

                      @exime What do your grub.cfg and the output of ls -lR /dev/disk look like? At the Grub menu, try replacing quiet vga=785 splash plymouth.ignore-serial-consoles with vga=785 loglevel=8 and see if you can gather some logs.
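To make the suggested edit concrete, here is a sketch of the same substitution applied to a kernel line as text. It only shows what the line should look like after editing it by hand at the Grub menu; it is not meant to be run against grub.cfg, which is a generated file. The function name `verbose_kernel_line` is made up for illustration.

```shell
# Rewrite a kernel line read from stdin as suggested in the post:
# drop "quiet", "splash" and "plymouth.ignore-serial-consoles",
# keep vga=785 and add loglevel=8.
verbose_kernel_line() {
    sed 's/quiet vga=785 splash plymouth\.ignore-serial-consoles/vga=785 loglevel=8/'
}

# Example with the tail of a dom0 kernel line:
#   echo 'console=hvc0 console=tty0 quiet vga=785 splash plymouth.ignore-serial-consoles' |
#       verbose_kernel_line
```

A change made at the Grub menu applies to that one boot only, so it is a low-risk way to get verbose kernel messages on screen.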

                        exime @dinhngtu

                        @dinhngtu said in PCIe card removal and failure to boot from NVMe:

                        @exime What do your grub.cfg and the output of ls -lR /dev/disk look like? At the Grub menu, try replacing quiet vga=785 splash plymouth.ignore-serial-consoles with vga=785 loglevel=8 and see if you can gather some logs.

                        [09:15 xcp-ng ~]# cat /boot/efi/EFI/xenserver/grub.cfg
                        serial --unit=0 --speed=115200
                        terminal_input serial console
                        terminal_output serial console
                        set default=0
                        set timeout=5
                        if [ -s $prefix/grubenv ]; then
                                load_env
                        fi
                        
                        if [ -n "$override_entry" ]; then
                                set default=$override_entry
                        fi
                        
                        menuentry 'XCP-ng' {
                                search --label --set root root-hhtyle
                                multiboot2 /boot/xen.gz dom0_mem=7552M,max:7552M watchdog ucode=scan dom0_max_vcpus=1-16 crashkernel=256M,below=4G console=vga vga=mode-0x0311 cpufreq=xen:performance
                                module2 /boot/vmlinuz-4.19-xen root=LABEL=root-hhtyle ro nolvm hpet=disable console=hvc0 console=tty0 quiet vga=785 splash plymouth.ignore-serial-consoles
                                module2 /boot/initrd-4.19-xen.img
                        }
                        
                        [11:44 xcp-ng ~]# ls -lR /dev/disk
                        /dev/disk:
                        total 0
                        drwxr-xr-x 2 root root 440 Jan 24 00:45 by-id
                        drwxr-xr-x 2 root root 120 Jan 24 00:44 by-label
                        drwxr-xr-x 2 root root 140 Jan 24 00:44 by-partuuid
                        drwxr-xr-x 2 root root 200 Jan 24 00:44 by-path
                        drwxr-xr-x 2 root root 160 Jan 24 00:45 by-uuid
                        
                        /dev/disk/by-id:
                        total 0
                        lrwxrwxrwx 1 root root 10 Jan 24 00:45 dm-name-XSLocalEXT--4fadad90--5cd2--28d9--a74b--c2f8b2e99c85-4fadad90--5cd2--28d9--a74b--c2f8b2e99c85 -> ../../dm-0
                        lrwxrwxrwx 1 root root 10 Jan 24 00:45 dm-name-XSLocalEXT--beb5b05b--fcf0--f72c--013f--ee64769c667d-beb5b05b--fcf0--f72c--013f--ee64769c667d -> ../../dm-1
                        lrwxrwxrwx 1 root root 10 Jan 24 00:45 dm-uuid-LVM-2nTv98148P9MlLTa8b2mLCJbBm5b2EZw6e9Lwkqe7AKIjiO1KqCCXV1U5KAjc629 -> ../../dm-0
                        lrwxrwxrwx 1 root root 10 Jan 24 00:45 dm-uuid-LVM-xCsOTEj1VHU01BfH36dnjDKKYodoLoH7J3vmvtKqml4yLHCs9mZ2UF1msemJRUIQ -> ../../dm-1
                        lrwxrwxrwx 1 root root 13 Jan 24 00:44 nvme-eui.0000000000000000000cca0c02c43300 -> ../../nvme2n1
                        lrwxrwxrwx 1 root root 13 Jan 24 00:44 nvme-eui.e8238fa6bf530001001b448b476d9fae -> ../../nvme0n1
                        lrwxrwxrwx 1 root root 13 Jan 24 00:44 nvme-eui.e8238fa6bf530001001b448b4ceb47af -> ../../nvme1n1
                        lrwxrwxrwx 1 root root 15 Jan 24 00:44 nvme-eui.e8238fa6bf530001001b448b4ceb47af-part1 -> ../../nvme1n1p1
                        lrwxrwxrwx 1 root root 15 Jan 24 00:44 nvme-eui.e8238fa6bf530001001b448b4ceb47af-part2 -> ../../nvme1n1p2
                        lrwxrwxrwx 1 root root 15 Jan 24 00:44 nvme-eui.e8238fa6bf530001001b448b4ceb47af-part3 -> ../../nvme1n1p3
                        lrwxrwxrwx 1 root root 15 Jan 24 00:44 nvme-eui.e8238fa6bf530001001b448b4ceb47af-part5 -> ../../nvme1n1p5
                        lrwxrwxrwx 1 root root 15 Jan 24 00:44 nvme-eui.e8238fa6bf530001001b448b4ceb47af-part6 -> ../../nvme1n1p6
                        lrwxrwxrwx 1 root root 13 Jan 24 00:44 nvme-UCSC-NVMEHW-H1600_SDM0000766B3 -> ../../nvme2n1
                        lrwxrwxrwx 1 root root 13 Jan 24 00:44 nvme-WD_BLACK_SN850X_4000GB_24035E801050 -> ../../nvme0n1
                        lrwxrwxrwx 1 root root 13 Jan 24 00:44 nvme-WD_Blue_SN580_500GB_23433Q804055 -> ../../nvme1n1
                        lrwxrwxrwx 1 root root 15 Jan 24 00:44 nvme-WD_Blue_SN580_500GB_23433Q804055-part1 -> ../../nvme1n1p1
                        lrwxrwxrwx 1 root root 15 Jan 24 00:44 nvme-WD_Blue_SN580_500GB_23433Q804055-part2 -> ../../nvme1n1p2
                        lrwxrwxrwx 1 root root 15 Jan 24 00:44 nvme-WD_Blue_SN580_500GB_23433Q804055-part3 -> ../../nvme1n1p3
                        lrwxrwxrwx 1 root root 15 Jan 24 00:44 nvme-WD_Blue_SN580_500GB_23433Q804055-part5 -> ../../nvme1n1p5
                        lrwxrwxrwx 1 root root 15 Jan 24 00:44 nvme-WD_Blue_SN580_500GB_23433Q804055-part6 -> ../../nvme1n1p6
                        
                        /dev/disk/by-label:
                        total 0
                        lrwxrwxrwx 1 root root 15 Jan 24 00:44 BOOT-HHTYLE -> ../../nvme1n1p3
                        lrwxrwxrwx 1 root root 15 Jan 24 00:44 logs-hhtyle -> ../../nvme1n1p5
                        lrwxrwxrwx 1 root root 15 Jan 24 00:44 root-hhtyle -> ../../nvme1n1p1
                        lrwxrwxrwx 1 root root 15 Jan 24 00:44 swap-hhtyle -> ../../nvme1n1p6
                        
                        /dev/disk/by-partuuid:
                        total 0
                        lrwxrwxrwx 1 root root 15 Jan 24 00:44 74a7e458-0357-4734-9650-dea92da16ba3 -> ../../nvme1n1p5
                        lrwxrwxrwx 1 root root 15 Jan 24 00:44 800fd05b-7f1d-45a5-9ad9-28b1151bceb8 -> ../../nvme1n1p3
                        lrwxrwxrwx 1 root root 15 Jan 24 00:44 9e98a731-2776-4550-b648-25fb65ce15f8 -> ../../nvme1n1p6
                        lrwxrwxrwx 1 root root 15 Jan 24 00:44 ab62b83f-b5dd-48f1-8a02-e2cec53cf18d -> ../../nvme1n1p2
                        lrwxrwxrwx 1 root root 15 Jan 24 00:44 b20e412b-aa17-4c49-b691-3217120fb1fc -> ../../nvme1n1p1
                        
                        /dev/disk/by-path:
                        total 0
                        lrwxrwxrwx 1 root root 13 Jan 24 00:44 pci-0000:48:00.0-nvme-1 -> ../../nvme0n1
                        lrwxrwxrwx 1 root root 13 Jan 24 00:44 pci-0000:49:00.0-nvme-1 -> ../../nvme1n1
                        lrwxrwxrwx 1 root root 15 Jan 24 00:44 pci-0000:49:00.0-nvme-1-part1 -> ../../nvme1n1p1
                        lrwxrwxrwx 1 root root 15 Jan 24 00:44 pci-0000:49:00.0-nvme-1-part2 -> ../../nvme1n1p2
                        lrwxrwxrwx 1 root root 15 Jan 24 00:44 pci-0000:49:00.0-nvme-1-part3 -> ../../nvme1n1p3
                        lrwxrwxrwx 1 root root 15 Jan 24 00:44 pci-0000:49:00.0-nvme-1-part5 -> ../../nvme1n1p5
                        lrwxrwxrwx 1 root root 15 Jan 24 00:44 pci-0000:49:00.0-nvme-1-part6 -> ../../nvme1n1p6
                        lrwxrwxrwx 1 root root 13 Jan 24 00:44 pci-0000:4e:00.0-nvme-1 -> ../../nvme2n1
                        
                        /dev/disk/by-uuid:
                        total 0
                        lrwxrwxrwx 1 root root 15 Jan 24 00:44 7fa2653d-923c-4977-bbae-55b7d1f18dcd -> ../../nvme1n1p5
                        lrwxrwxrwx 1 root root 15 Jan 24 00:44 7fdb336a-d086-4060-a369-34fd1bba013e -> ../../nvme1n1p1
                        lrwxrwxrwx 1 root root 10 Jan 24 00:45 8f140914-ede9-48b5-8196-a8b382e058b8 -> ../../dm-1
                        lrwxrwxrwx 1 root root 15 Jan 24 00:44 a90dd274-2c82-4efc-baa8-dac1f75a4206 -> ../../nvme1n1p6
                        lrwxrwxrwx 1 root root 15 Jan 24 00:44 BACD-FCE2 -> ../../nvme1n1p3
                        lrwxrwxrwx 1 root root 10 Jan 24 00:45 e5ad55df-b9d3-45c0-bd61-0b0c1ed2ad55 -> ../../dm-0
                        

                        I'll have to try the grub change to get logs when I can take the machine out of service, maybe later tonight.
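The grub.cfg above locates the root filesystem by label (`root=LABEL=root-hhtyle`), and the listing shows that label under /dev/disk/by-label. A rough sketch of checking the two against each other follows; the helper names `root_label` and `label_present` are hypothetical.

```shell
# Print the label named by root=LABEL=<label> in a kernel line read from stdin.
root_label() {
    tr ' ' '\n' | sed -n 's/^root=LABEL=//p'
}

# Succeed if <dir>/<label> exists.
# On the host, <dir> would be /dev/disk/by-label.
label_present() {
    [ -e "$1/$2" ]
}

# On the host (not run here):
#   label=$(root_label < /proc/cmdline)
#   label_present /dev/disk/by-label "$label" || echo "root label $label not found"
```

The same check can be repeated from the emergency shell if those tools are available there, to see whether the label is missing at the point where boot fails.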

                          dinhngtu Vates 🪐 XCP-ng Team @exime

                          @exime Could you gather /run/initramfs/rdsosreport.txt, the output of ls /sys/module/nvme/drivers/pci\:nvme, and the output of any other relevant commands from the recovery shell?

                          If you can't type or use a USB drive in the recovery shell, run dracut --add-drivers "usbhid uas" -f /boot/initrd-temp.img and boot from that initrd.
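The driver directory mentioned above holds one symlink per PCI device bound to the nvme driver, next to control files such as bind and unbind. A small sketch for picking out just the device addresses follows; the function name `bound_pci_devices` is hypothetical, and it assumes ls and grep are available, which may not hold inside a minimal recovery shell.

```shell
# List the PCI addresses bound to a driver, given its sysfs directory,
# e.g. /sys/module/nvme/drivers/pci:nvme. Control files such as bind,
# unbind and new_id are skipped because they do not look like addresses.
bound_pci_devices() {
    ls "$1" | grep -E '^[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]$'
}

# On the host (not run here):
#   bound_pci_devices '/sys/module/nvme/drivers/pci:nvme'
```

Comparing this list between a working boot and the recovery shell would show whether the boot NVMe device is bound to the nvme driver at all when the boot fails.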

                            exime

                            Update: I put the server back to the exact configuration it was in, rebooted, and nothing showed up in xl pci-assignable-list. There must have been some configuration... somewhere... that something was looking for that was related to the exact locations of those 4 USB controllers.

                            TL;DR, all is well, thanks for the help!

                              olivierlambert Vates 🪐 Co-Founder CEO

                              Okay, weird, but at least I'm glad to know it works now 🙂

                              • olivierlambertO olivierlambert marked this topic as a question on
                              • olivierlambertO olivierlambert has marked this topic as solved on
