PCIe card removal and failure to boot from NVMe
-
Removing a PCIe card that was formerly hidden/passed through (a card with 4 USB controllers) causes xcp-ng to fail to boot from my M.2 boot drive (it can't find the root drive).
Prior to removing the card, all 4 controllers were hidden from dom0. I removed the passthrough from the two VMs that used a controller each, but I forgot to clear the dom0 pciback.hide parameter beforehand. I put the card back in, booted up, cleared the dom0 pciback.hide parameter, and rebooted, thinking that would solve the problem, but I still get the same boot failure.
I have confirmed that neither of the two VMs that had PCIe controllers passed through still has them set in its other-config (MRW) parameters. I've also confirmed that the controllers are not in /boot/efi/EFI/xenserver/grub.cfg.
But, strangely, 3 of the 4 USB controllers that were hidden still show up with xl pci-assignable-list, and I still can't boot without the card.
What am I missing?
-
Are you sure it's cleared? I would suspect it's still set. Note the Grub file is generated, so you really need to clear it with this command:
/opt/xensource/libexec/xen-cmdline --delete-dom0 xen-pciback.hide
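One way to double-check the result is to look in the generated file itself for any leftover xen-pciback.hide= argument. The following is only a minimal sketch: pciback_hide_args is a hypothetical helper name, and the default path is the UEFI grub.cfg location mentioned in the first post (a BIOS-booted host would keep its config elsewhere).

```shell
#!/bin/sh
# Hypothetical helper: print every xen-pciback.hide=... argument found in a
# Grub config. Prints nothing and returns non-zero if none is present.
# The default path is an assumption taken from the first post (UEFI boot).
pciback_hide_args() {
    cfg="${1:-/boot/efi/EFI/xenserver/grub.cfg}"
    grep -o 'xen-pciback\.hide=[^ ]*' "$cfg"
}

# Example use on the host (not executed here):
#   pciback_hide_args && echo "still hidden" || echo "no hide argument left"
```

If the helper prints nothing, the hide list is gone from the dom0 command line and the cause has to be somewhere else.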
-
@olivierlambert - sorry, to be clear, I did run that command (multiple times)
[11:10 xcp-ng ~]# /opt/xensource/libexec/xen-cmdline --get-dom0 xen-pciback.hide
[11:12 xcp-ng ~]#
-
And if you replug the card it works again, right?
-
@olivierlambert said in PCIe card removal and failure to boot from NVMe:
And if you replug the card it works again, right?
Yep, if I put it in the same slot.
-
-
[11:12 xcp-ng ~]# xl pci-assignable-list
0000:01:00.0
0000:46:00.0
0000:01:00.1
0000:45:00.0
0000:44:00.0
44, 45, 46, and 47 are the USB controllers. For some reason, only 47 is missing from this list. All four of them were previously hidden from dom0, but only 47 actually cleared.
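To see what is actually holding each controller, the driver symlink in sysfs can be read per device. This is a rough sketch, not something from the thread: pci_driver_of is a hypothetical helper, the address 0000:47:00.0 is assumed by analogy with the three addresses listed, and the expectation that an assignable device is bound to a pciback driver is an assumption about this host.

```shell
#!/bin/sh
# Hypothetical helper: print the name of the driver a PCI device is bound to,
# or "none" if it has no driver. The second argument (sysfs root) exists only
# so the function can be exercised against a fake directory tree.
pci_driver_of() {
    bdf="$1"
    sysfs="${2:-/sys}"
    link="$sysfs/bus/pci/devices/$bdf/driver"
    if [ -L "$link" ]; then
        # The symlink points at the driver directory; its last path
        # component is the driver name.
        basename "$(readlink "$link")"
    else
        echo "none"
    fi
}

# Example use on the host (not executed here); 0000:47:00.0 is an assumed address:
#   for d in 0000:44:00.0 0000:45:00.0 0000:46:00.0 0000:47:00.0; do
#       echo "$d -> $(pci_driver_of "$d")"
#   done
```

A difference between the driver reported for 47 and for the other three would at least show where the four controllers diverge.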
-
@exime Is the error coming from Grub or the Dom0 kernel?
-
@dinhngtu said in PCIe card removal and failure to boot from NVMe:
@exime Is the error coming from Grub or the Dom0 kernel?
I'm guessing the kernel? I get the XCP-NG splash screen for a long time (normal boot is very quick on this machine), the progress bar goes about 50%, then it drops into an emergency shell of some sort saying it can't find the root drive (I'm guessing because the PCIe stuff is shifted unexpectedly).
-
@exime What do your grub.cfg and the output of ls -lR /dev/disk look like? At the Grub menu, try replacing quiet vga=785 splash plymouth.ignore-serial-consoles with vga=785 loglevel=8 and see if you can gather some logs.
-
@dinhngtu said in PCIe card removal and failure to boot from NVMe:
@exime What do your grub.cfg and the output of ls -lR /dev/disk look like? At the Grub menu, try replacing quiet vga=785 splash plymouth.ignore-serial-consoles with vga=785 loglevel=8 and see if you can gather some logs.
[09:15 xcp-ng ~]# cat /boot/efi/EFI/xenserver/grub.cfg
serial --unit=0 --speed=115200
terminal_input serial console
terminal_output serial console
set default=0
set timeout=5
if [ -s $prefix/grubenv ]; then
    load_env
fi

if [ -n "$override_entry" ]; then
    set default=$override_entry
fi

menuentry 'XCP-ng' {
    search --label --set root root-hhtyle
    multiboot2 /boot/xen.gz dom0_mem=7552M,max:7552M watchdog ucode=scan dom0_max_vcpus=1-16 crashkernel=256M,below=4G console=vga vga=mode-0x0311 cpufreq=xen:performance
    module2 /boot/vmlinuz-4.19-xen root=LABEL=root-hhtyle ro nolvm hpet=disable console=hvc0 console=tty0 quiet vga=785 splash plymouth.ignore-serial-consoles
    module2 /boot/initrd-4.19-xen.img
}
[11:44 xcp-ng ~]# ls -lR /dev/disk
/dev/disk:
total 0
drwxr-xr-x 2 root root 440 Jan 24 00:45 by-id
drwxr-xr-x 2 root root 120 Jan 24 00:44 by-label
drwxr-xr-x 2 root root 140 Jan 24 00:44 by-partuuid
drwxr-xr-x 2 root root 200 Jan 24 00:44 by-path
drwxr-xr-x 2 root root 160 Jan 24 00:45 by-uuid

/dev/disk/by-id:
total 0
lrwxrwxrwx 1 root root 10 Jan 24 00:45 dm-name-XSLocalEXT--4fadad90--5cd2--28d9--a74b--c2f8b2e99c85-4fadad90--5cd2--28d9--a74b--c2f8b2e99c85 -> ../../dm-0
lrwxrwxrwx 1 root root 10 Jan 24 00:45 dm-name-XSLocalEXT--beb5b05b--fcf0--f72c--013f--ee64769c667d-beb5b05b--fcf0--f72c--013f--ee64769c667d -> ../../dm-1
lrwxrwxrwx 1 root root 10 Jan 24 00:45 dm-uuid-LVM-2nTv98148P9MlLTa8b2mLCJbBm5b2EZw6e9Lwkqe7AKIjiO1KqCCXV1U5KAjc629 -> ../../dm-0
lrwxrwxrwx 1 root root 10 Jan 24 00:45 dm-uuid-LVM-xCsOTEj1VHU01BfH36dnjDKKYodoLoH7J3vmvtKqml4yLHCs9mZ2UF1msemJRUIQ -> ../../dm-1
lrwxrwxrwx 1 root root 13 Jan 24 00:44 nvme-eui.0000000000000000000cca0c02c43300 -> ../../nvme2n1
lrwxrwxrwx 1 root root 13 Jan 24 00:44 nvme-eui.e8238fa6bf530001001b448b476d9fae -> ../../nvme0n1
lrwxrwxrwx 1 root root 13 Jan 24 00:44 nvme-eui.e8238fa6bf530001001b448b4ceb47af -> ../../nvme1n1
lrwxrwxrwx 1 root root 15 Jan 24 00:44 nvme-eui.e8238fa6bf530001001b448b4ceb47af-part1 -> ../../nvme1n1p1
lrwxrwxrwx 1 root root 15 Jan 24 00:44 nvme-eui.e8238fa6bf530001001b448b4ceb47af-part2 -> ../../nvme1n1p2
lrwxrwxrwx 1 root root 15 Jan 24 00:44 nvme-eui.e8238fa6bf530001001b448b4ceb47af-part3 -> ../../nvme1n1p3
lrwxrwxrwx 1 root root 15 Jan 24 00:44 nvme-eui.e8238fa6bf530001001b448b4ceb47af-part5 -> ../../nvme1n1p5
lrwxrwxrwx 1 root root 15 Jan 24 00:44 nvme-eui.e8238fa6bf530001001b448b4ceb47af-part6 -> ../../nvme1n1p6
lrwxrwxrwx 1 root root 13 Jan 24 00:44 nvme-UCSC-NVMEHW-H1600_SDM0000766B3 -> ../../nvme2n1
lrwxrwxrwx 1 root root 13 Jan 24 00:44 nvme-WD_BLACK_SN850X_4000GB_24035E801050 -> ../../nvme0n1
lrwxrwxrwx 1 root root 13 Jan 24 00:44 nvme-WD_Blue_SN580_500GB_23433Q804055 -> ../../nvme1n1
lrwxrwxrwx 1 root root 15 Jan 24 00:44 nvme-WD_Blue_SN580_500GB_23433Q804055-part1 -> ../../nvme1n1p1
lrwxrwxrwx 1 root root 15 Jan 24 00:44 nvme-WD_Blue_SN580_500GB_23433Q804055-part2 -> ../../nvme1n1p2
lrwxrwxrwx 1 root root 15 Jan 24 00:44 nvme-WD_Blue_SN580_500GB_23433Q804055-part3 -> ../../nvme1n1p3
lrwxrwxrwx 1 root root 15 Jan 24 00:44 nvme-WD_Blue_SN580_500GB_23433Q804055-part5 -> ../../nvme1n1p5
lrwxrwxrwx 1 root root 15 Jan 24 00:44 nvme-WD_Blue_SN580_500GB_23433Q804055-part6 -> ../../nvme1n1p6

/dev/disk/by-label:
total 0
lrwxrwxrwx 1 root root 15 Jan 24 00:44 BOOT-HHTYLE -> ../../nvme1n1p3
lrwxrwxrwx 1 root root 15 Jan 24 00:44 logs-hhtyle -> ../../nvme1n1p5
lrwxrwxrwx 1 root root 15 Jan 24 00:44 root-hhtyle -> ../../nvme1n1p1
lrwxrwxrwx 1 root root 15 Jan 24 00:44 swap-hhtyle -> ../../nvme1n1p6

/dev/disk/by-partuuid:
total 0
lrwxrwxrwx 1 root root 15 Jan 24 00:44 74a7e458-0357-4734-9650-dea92da16ba3 -> ../../nvme1n1p5
lrwxrwxrwx 1 root root 15 Jan 24 00:44 800fd05b-7f1d-45a5-9ad9-28b1151bceb8 -> ../../nvme1n1p3
lrwxrwxrwx 1 root root 15 Jan 24 00:44 9e98a731-2776-4550-b648-25fb65ce15f8 -> ../../nvme1n1p6
lrwxrwxrwx 1 root root 15 Jan 24 00:44 ab62b83f-b5dd-48f1-8a02-e2cec53cf18d -> ../../nvme1n1p2
lrwxrwxrwx 1 root root 15 Jan 24 00:44 b20e412b-aa17-4c49-b691-3217120fb1fc -> ../../nvme1n1p1

/dev/disk/by-path:
total 0
lrwxrwxrwx 1 root root 13 Jan 24 00:44 pci-0000:48:00.0-nvme-1 -> ../../nvme0n1
lrwxrwxrwx 1 root root 13 Jan 24 00:44 pci-0000:49:00.0-nvme-1 -> ../../nvme1n1
lrwxrwxrwx 1 root root 15 Jan 24 00:44 pci-0000:49:00.0-nvme-1-part1 -> ../../nvme1n1p1
lrwxrwxrwx 1 root root 15 Jan 24 00:44 pci-0000:49:00.0-nvme-1-part2 -> ../../nvme1n1p2
lrwxrwxrwx 1 root root 15 Jan 24 00:44 pci-0000:49:00.0-nvme-1-part3 -> ../../nvme1n1p3
lrwxrwxrwx 1 root root 15 Jan 24 00:44 pci-0000:49:00.0-nvme-1-part5 -> ../../nvme1n1p5
lrwxrwxrwx 1 root root 15 Jan 24 00:44 pci-0000:49:00.0-nvme-1-part6 -> ../../nvme1n1p6
lrwxrwxrwx 1 root root 13 Jan 24 00:44 pci-0000:4e:00.0-nvme-1 -> ../../nvme2n1

/dev/disk/by-uuid:
total 0
lrwxrwxrwx 1 root root 15 Jan 24 00:44 7fa2653d-923c-4977-bbae-55b7d1f18dcd -> ../../nvme1n1p5
lrwxrwxrwx 1 root root 15 Jan 24 00:44 7fdb336a-d086-4060-a369-34fd1bba013e -> ../../nvme1n1p1
lrwxrwxrwx 1 root root 10 Jan 24 00:45 8f140914-ede9-48b5-8196-a8b382e058b8 -> ../../dm-1
lrwxrwxrwx 1 root root 15 Jan 24 00:44 a90dd274-2c82-4efc-baa8-dac1f75a4206 -> ../../nvme1n1p6
lrwxrwxrwx 1 root root 15 Jan 24 00:44 BACD-FCE2 -> ../../nvme1n1p3
lrwxrwxrwx 1 root root 10 Jan 24 00:45 e5ad55df-b9d3-45c0-bd61-0b0c1ed2ad55 -> ../../dm-0
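The two outputs above can be cross-checked: the kernel line asks for root=LABEL=root-hhtyle, and the by-label listing shows that label pointing at nvme1n1p1 while the card is installed. A small sketch of the same check, assuming hypothetical helper names root_label_of and device_for_label (neither is an XCP-ng tool):

```shell
#!/bin/sh
# Hypothetical helper: print the root=LABEL=... value from the first line of
# a Grub config that carries one.
root_label_of() {
    cfg="${1:-/boot/efi/EFI/xenserver/grub.cfg}"
    sed -n 's/.*root=LABEL=\([^ ]*\).*/\1/p' "$cfg" | head -n 1
}

# Hypothetical helper: print the device node name a filesystem label points
# at, or return non-zero if the label is not present. The second argument
# exists only so a fake by-label directory can be used for testing.
device_for_label() {
    label="$1"
    dir="${2:-/dev/disk/by-label}"
    [ -L "$dir/$label" ] || return 1
    basename "$(readlink "$dir/$label")"
}

# Example use on the host (not executed here):
#   device_for_label "$(root_label_of)" || echo "root label not found"
```

Because the root filesystem is located by label and not by PCI address or nvmeXnY name, a renumbered bus alone should not hide it, which makes the missing root drive more puzzling.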
I'll have to try the grub change to get logs when I can take the machine out of service, maybe later tonight.
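For reference, the Grub change amounts to the substitution below on the dom0 kernel line. This is just an illustration of the resulting line: verbose_kernel_line is a hypothetical helper that filters text on stdin and does not modify grub.cfg; the actual edit is made interactively at the Grub menu for a single boot.

```shell
#!/bin/sh
# Hypothetical helper: rewrite a dom0 kernel line as suggested in the thread,
# swapping the quiet/splash arguments for verbose kernel logging.
# Reads lines on stdin and writes the result to stdout; no file is changed.
verbose_kernel_line() {
    sed 's/quiet vga=785 splash plymouth\.ignore-serial-consoles/vga=785 loglevel=8/'
}

# Example use on the host (not executed here):
#   grep vmlinuz /boot/efi/EFI/xenserver/grub.cfg | verbose_kernel_line
```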
-
@exime Could you gather the contents of /run/initramfs/rdsosreport.txt, the output of ls /sys/module/nvme/drivers/pci\:nvme, and the output of other relevant commands from the recovery shell?
If you can't type or use a USB drive in the recovery shell, run dracut --add-drivers "usbhid uas" -f /boot/initrd-temp.img and boot from that initrd.
-
Update: I put the server back to the exact configuration it was in, rebooted, and nothing showed up in xl pci-assignable-list. There must have been some configuration... somewhere... that something was looking for that was related to the exact locations of those 4 USB controllers.
TL;DR, all is well, thanks for the help!
-
Okay, weird, but at least glad to know it works now.
-
-