XCP-ng

    exime

    @exime


    Best posts made by exime

    • RE: Google Coral TPU PCIe Passthrough Woes

      @olivierlambert said in Google Coral TPU PCIe Passthrough Woes:

      Can you try an older kernel in your VM just to be sure? (e.g. a Debian 10 guest with the default bundled kernel)

      I'm just now getting back to this.

      Might the problem be related to this issue?

      "Unfortunately the device in question violates PCI specification by mapping PBA, MSI-X vector table, and other registers into same 4KB page (PBA is at 0x46068, VT at 0x46800, but there is a bunch of other registers in 0x46XXX range)."

      https://github.com/google-coral/edgetpu/issues/343#issuecomment-1287251821

      dakota created this issue in google-coral/edgetpu

      open Apex failing with error -110 (No /dev/apex_0) #343
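
      The arithmetic behind the quoted comment can be checked directly. This is a minimal sketch that only verifies the two offsets given in the quote fall into the same 4 KB page; the function and constant names are illustrative and not taken from any driver.

```python
PAGE_SIZE = 4096  # 4 KB page, as stated in the quoted comment

# Offsets copied from the quoted GitHub comment.
PBA_OFFSET = 0x46068
VECTOR_TABLE_OFFSET = 0x46800


def page_index(offset: int, page_size: int = PAGE_SIZE) -> int:
    """Return the index of the page that contains the given byte offset."""
    return offset // page_size


def share_page(a: int, b: int, page_size: int = PAGE_SIZE) -> bool:
    """True when both offsets land in the same page."""
    return page_index(a, page_size) == page_index(b, page_size)


if __name__ == "__main__":
    print(hex(page_index(PBA_OFFSET)), hex(page_index(VECTOR_TABLE_OFFSET)))
    print(share_page(PBA_OFFSET, VECTOR_TABLE_OFFSET))
```

      Both offsets divide down to page 0x46, which matches the "0x46XXX range" mentioned in the quote.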

      posted in Compute
      exime
    • RE: PCIe card removal and failure to boot from NVMe

      Update: I put the server back to the exact configuration it was in, rebooted, and nothing showed up in xl pci-assignable-list. There must have been some configuration... somewhere... that something was looking for that was related to the exact locations of those 4 USB controllers.

      TL;DR, all is well, thanks for the help!

      posted in XCP-ng
      exime
    • PCIe card removal and failure to boot from NVMe

      I am attempting to remove a PCIe card that was formerly hidden/passed through (a 4-controller USB card), but removing it causes XCP-ng to fail to boot from my M.2 boot drive (it can't find the root drive).

      Prior to removing the card, all 4 controllers were hidden from dom0. I removed the passthrough from the two VMs that used a controller each, but I forgot to clear the dom0 pciback.hide parameter before pulling the card. I put the card back in, booted up, cleared dom0 pciback.hide, and rebooted, thinking that would solve the problem - but I still get the same error on boot without the card.

      I have confirmed that neither of the two VMs that had PCIe controllers passed through still has them set in the other-config (MRW) parameters. I've also confirmed that the controllers are not in /boot/efi/EFI/xenserver/grub.cfg.

      But, strangely, 3 of the 4 USB controllers that were hidden do still show up with xl pci-assignable-list - and I still can't boot without the card.

      What am I missing?
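
      The other-config check described above can be sketched as a small parser. This assumes the map is printed as "key: value; key: value" pairs and that passthrough is stored under a "pci" key; the sample strings are hypothetical and are not output from this server.

```python
from typing import Dict


def parse_map_param(text: str) -> Dict[str, str]:
    """Parse a 'key: value; key: value' string into a dict.

    Assumption: this is the layout in which the other-config map is printed.
    """
    result: Dict[str, str] = {}
    for entry in text.split(";"):
        key, sep, value = entry.strip().partition(": ")
        if sep:
            result[key] = value
    return result


def has_pci_passthrough(other_config: str) -> bool:
    """True when the map still carries a 'pci' key (assumed key name)."""
    return "pci" in parse_map_param(other_config)


# Hypothetical samples for illustration only.
STILL_SET = "auto_poweron: true; pci: 0/0000:44:00.0"
CLEARED = "auto_poweron: true"

if __name__ == "__main__":
    print(has_pci_passthrough(STILL_SET))
    print(has_pci_passthrough(CLEARED))
```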

      posted in XCP-ng
      exime

    Latest posts made by exime

    • RE: PCIe card removal and failure to boot from NVMe

      @dinhngtu said in PCIe card removal and failure to boot from NVMe:

      @exime What does your grub.cfg and the output of ls -lR /dev/disk look like? At the Grub menu, try replacing quiet vga=785 splash plymouth.ignore-serial-consoles with vga=785 loglevel=8 and see if you can gather some logs.
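
      The substitution suggested in the quote is made by hand at the Grub menu. As a rough illustration of the result, the sketch below applies the same replacement to the kernel line from the grub.cfg shown in this post; the helper name is made up for the example.

```python
# Kernel line copied from the grub.cfg in this post.
KERNEL_LINE = (
    "module2 /boot/vmlinuz-4.19-xen root=LABEL=root-hhtyle ro nolvm "
    "hpet=disable console=hvc0 console=tty0 "
    "quiet vga=785 splash plymouth.ignore-serial-consoles"
)

QUIET_ARGS = "quiet vga=785 splash plymouth.ignore-serial-consoles"
VERBOSE_ARGS = "vga=785 loglevel=8"


def make_verbose(line: str) -> str:
    """Swap the quiet/splash arguments for the verbose ones from the quote."""
    if QUIET_ARGS not in line:
        raise ValueError("expected quiet arguments not found in kernel line")
    return line.replace(QUIET_ARGS, VERBOSE_ARGS, 1)


if __name__ == "__main__":
    print(make_verbose(KERNEL_LINE))
```

      Only the trailing arguments change; root=LABEL=root-hhtyle and the console settings stay as they were.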

      [09:15 xcp-ng ~]# cat /boot/efi/EFI/xenserver/grub.cfg
      serial --unit=0 --speed=115200
      terminal_input serial console
      terminal_output serial console
      set default=0
      set timeout=5
      if [ -s $prefix/grubenv ]; then
              load_env
      fi
      
      if [ -n "$override_entry" ]; then
              set default=$override_entry
      fi
      
      menuentry 'XCP-ng' {
              search --label --set root root-hhtyle
              multiboot2 /boot/xen.gz dom0_mem=7552M,max:7552M watchdog ucode=scan dom0_max_vcpus=1-16 crashkernel=256M,below=4G console=vga vga=mode-0x0311 cpufreq=xen:performance
              module2 /boot/vmlinuz-4.19-xen root=LABEL=root-hhtyle ro nolvm hpet=disable console=hvc0 console=tty0 quiet vga=785 splash plymouth.ignore-serial-consoles
              module2 /boot/initrd-4.19-xen.img
      }
      
      [11:44 xcp-ng ~]# ls -lR /dev/disk
      /dev/disk:
      total 0
      drwxr-xr-x 2 root root 440 Jan 24 00:45 by-id
      drwxr-xr-x 2 root root 120 Jan 24 00:44 by-label
      drwxr-xr-x 2 root root 140 Jan 24 00:44 by-partuuid
      drwxr-xr-x 2 root root 200 Jan 24 00:44 by-path
      drwxr-xr-x 2 root root 160 Jan 24 00:45 by-uuid
      
      /dev/disk/by-id:
      total 0
      lrwxrwxrwx 1 root root 10 Jan 24 00:45 dm-name-XSLocalEXT--4fadad90--5cd2--28d9--a74b--c2f8b2e99c85-4fadad90--5cd2--28d9--a74b--c2f8b2e99c85 -> ../../dm-0
      lrwxrwxrwx 1 root root 10 Jan 24 00:45 dm-name-XSLocalEXT--beb5b05b--fcf0--f72c--013f--ee64769c667d-beb5b05b--fcf0--f72c--013f--ee64769c667d -> ../../dm-1
      lrwxrwxrwx 1 root root 10 Jan 24 00:45 dm-uuid-LVM-2nTv98148P9MlLTa8b2mLCJbBm5b2EZw6e9Lwkqe7AKIjiO1KqCCXV1U5KAjc629 -> ../../dm-0
      lrwxrwxrwx 1 root root 10 Jan 24 00:45 dm-uuid-LVM-xCsOTEj1VHU01BfH36dnjDKKYodoLoH7J3vmvtKqml4yLHCs9mZ2UF1msemJRUIQ -> ../../dm-1
      lrwxrwxrwx 1 root root 13 Jan 24 00:44 nvme-eui.0000000000000000000cca0c02c43300 -> ../../nvme2n1
      lrwxrwxrwx 1 root root 13 Jan 24 00:44 nvme-eui.e8238fa6bf530001001b448b476d9fae -> ../../nvme0n1
      lrwxrwxrwx 1 root root 13 Jan 24 00:44 nvme-eui.e8238fa6bf530001001b448b4ceb47af -> ../../nvme1n1
      lrwxrwxrwx 1 root root 15 Jan 24 00:44 nvme-eui.e8238fa6bf530001001b448b4ceb47af-part1 -> ../../nvme1n1p1
      lrwxrwxrwx 1 root root 15 Jan 24 00:44 nvme-eui.e8238fa6bf530001001b448b4ceb47af-part2 -> ../../nvme1n1p2
      lrwxrwxrwx 1 root root 15 Jan 24 00:44 nvme-eui.e8238fa6bf530001001b448b4ceb47af-part3 -> ../../nvme1n1p3
      lrwxrwxrwx 1 root root 15 Jan 24 00:44 nvme-eui.e8238fa6bf530001001b448b4ceb47af-part5 -> ../../nvme1n1p5
      lrwxrwxrwx 1 root root 15 Jan 24 00:44 nvme-eui.e8238fa6bf530001001b448b4ceb47af-part6 -> ../../nvme1n1p6
      lrwxrwxrwx 1 root root 13 Jan 24 00:44 nvme-UCSC-NVMEHW-H1600_SDM0000766B3 -> ../../nvme2n1
      lrwxrwxrwx 1 root root 13 Jan 24 00:44 nvme-WD_BLACK_SN850X_4000GB_24035E801050 -> ../../nvme0n1
      lrwxrwxrwx 1 root root 13 Jan 24 00:44 nvme-WD_Blue_SN580_500GB_23433Q804055 -> ../../nvme1n1
      lrwxrwxrwx 1 root root 15 Jan 24 00:44 nvme-WD_Blue_SN580_500GB_23433Q804055-part1 -> ../../nvme1n1p1
      lrwxrwxrwx 1 root root 15 Jan 24 00:44 nvme-WD_Blue_SN580_500GB_23433Q804055-part2 -> ../../nvme1n1p2
      lrwxrwxrwx 1 root root 15 Jan 24 00:44 nvme-WD_Blue_SN580_500GB_23433Q804055-part3 -> ../../nvme1n1p3
      lrwxrwxrwx 1 root root 15 Jan 24 00:44 nvme-WD_Blue_SN580_500GB_23433Q804055-part5 -> ../../nvme1n1p5
      lrwxrwxrwx 1 root root 15 Jan 24 00:44 nvme-WD_Blue_SN580_500GB_23433Q804055-part6 -> ../../nvme1n1p6
      
      /dev/disk/by-label:
      total 0
      lrwxrwxrwx 1 root root 15 Jan 24 00:44 BOOT-HHTYLE -> ../../nvme1n1p3
      lrwxrwxrwx 1 root root 15 Jan 24 00:44 logs-hhtyle -> ../../nvme1n1p5
      lrwxrwxrwx 1 root root 15 Jan 24 00:44 root-hhtyle -> ../../nvme1n1p1
      lrwxrwxrwx 1 root root 15 Jan 24 00:44 swap-hhtyle -> ../../nvme1n1p6
      
      /dev/disk/by-partuuid:
      total 0
      lrwxrwxrwx 1 root root 15 Jan 24 00:44 74a7e458-0357-4734-9650-dea92da16ba3 -> ../../nvme1n1p5
      lrwxrwxrwx 1 root root 15 Jan 24 00:44 800fd05b-7f1d-45a5-9ad9-28b1151bceb8 -> ../../nvme1n1p3
      lrwxrwxrwx 1 root root 15 Jan 24 00:44 9e98a731-2776-4550-b648-25fb65ce15f8 -> ../../nvme1n1p6
      lrwxrwxrwx 1 root root 15 Jan 24 00:44 ab62b83f-b5dd-48f1-8a02-e2cec53cf18d -> ../../nvme1n1p2
      lrwxrwxrwx 1 root root 15 Jan 24 00:44 b20e412b-aa17-4c49-b691-3217120fb1fc -> ../../nvme1n1p1
      
      /dev/disk/by-path:
      total 0
      lrwxrwxrwx 1 root root 13 Jan 24 00:44 pci-0000:48:00.0-nvme-1 -> ../../nvme0n1
      lrwxrwxrwx 1 root root 13 Jan 24 00:44 pci-0000:49:00.0-nvme-1 -> ../../nvme1n1
      lrwxrwxrwx 1 root root 15 Jan 24 00:44 pci-0000:49:00.0-nvme-1-part1 -> ../../nvme1n1p1
      lrwxrwxrwx 1 root root 15 Jan 24 00:44 pci-0000:49:00.0-nvme-1-part2 -> ../../nvme1n1p2
      lrwxrwxrwx 1 root root 15 Jan 24 00:44 pci-0000:49:00.0-nvme-1-part3 -> ../../nvme1n1p3
      lrwxrwxrwx 1 root root 15 Jan 24 00:44 pci-0000:49:00.0-nvme-1-part5 -> ../../nvme1n1p5
      lrwxrwxrwx 1 root root 15 Jan 24 00:44 pci-0000:49:00.0-nvme-1-part6 -> ../../nvme1n1p6
      lrwxrwxrwx 1 root root 13 Jan 24 00:44 pci-0000:4e:00.0-nvme-1 -> ../../nvme2n1
      
      /dev/disk/by-uuid:
      total 0
      lrwxrwxrwx 1 root root 15 Jan 24 00:44 7fa2653d-923c-4977-bbae-55b7d1f18dcd -> ../../nvme1n1p5
      lrwxrwxrwx 1 root root 15 Jan 24 00:44 7fdb336a-d086-4060-a369-34fd1bba013e -> ../../nvme1n1p1
      lrwxrwxrwx 1 root root 10 Jan 24 00:45 8f140914-ede9-48b5-8196-a8b382e058b8 -> ../../dm-1
      lrwxrwxrwx 1 root root 15 Jan 24 00:44 a90dd274-2c82-4efc-baa8-dac1f75a4206 -> ../../nvme1n1p6
      lrwxrwxrwx 1 root root 15 Jan 24 00:44 BACD-FCE2 -> ../../nvme1n1p3
      lrwxrwxrwx 1 root root 10 Jan 24 00:45 e5ad55df-b9d3-45c0-bd61-0b0c1ed2ad55 -> ../../dm-0
      

      I'll have to try the grub change to get logs when I can take the machine out of service, maybe later tonight.
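
      For reading the listing above, here is a small sketch that resolves the root label from grub.cfg to a partition and then to the PCI address of the drive holding it. It works on excerpts copied from the output in this post; the parsing helpers are illustrative, not an existing tool.

```python
import re
from typing import Dict

# Matches the 'name -> ../../target' tail of an 'ls -l' symlink line.
LINK_RE = re.compile(r"(\S+) -> \.\./\.\./(\S+)$")

# Excerpts copied from the ls -lR /dev/disk output in this post.
BY_LABEL = """\
lrwxrwxrwx 1 root root 15 Jan 24 00:44 BOOT-HHTYLE -> ../../nvme1n1p3
lrwxrwxrwx 1 root root 15 Jan 24 00:44 root-hhtyle -> ../../nvme1n1p1
"""
BY_PATH = """\
lrwxrwxrwx 1 root root 13 Jan 24 00:44 pci-0000:48:00.0-nvme-1 -> ../../nvme0n1
lrwxrwxrwx 1 root root 15 Jan 24 00:44 pci-0000:49:00.0-nvme-1-part1 -> ../../nvme1n1p1
"""


def parse_links(listing: str) -> Dict[str, str]:
    """Map each symlink name in an 'ls -l' listing to its target device."""
    links: Dict[str, str] = {}
    for line in listing.splitlines():
        match = LINK_RE.search(line.strip())
        if match:
            links[match.group(1)] = match.group(2)
    return links


def pci_address_for_label(label: str) -> str:
    """Return the PCI address of the drive that holds the labelled partition."""
    partition = parse_links(BY_LABEL)[label]
    for name, target in parse_links(BY_PATH).items():
        if target == partition:
            # name looks like 'pci-0000:49:00.0-nvme-1-part1'
            return name.split("-")[1]
    raise KeyError(label)


if __name__ == "__main__":
    print(pci_address_for_label("root-hhtyle"))
```

      In this working configuration the root label resolves to nvme1n1p1 on the drive at 0000:49:00.0.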

      posted in XCP-ng
      exime
    • RE: PCIe card removal and failure to boot from NVMe

      @dinhngtu said in PCIe card removal and failure to boot from NVMe:

      @exime Is the error coming from Grub or the Dom0 kernel?

      I'm guessing the kernel? I get the XCP-ng splash screen for a long time (normal boot is very quick on this machine), the progress bar gets to about 50%, and then it drops into an emergency shell of some sort saying it can't find the root drive (I'm guessing because the PCIe stuff has shifted unexpectedly).

      posted in XCP-ng
      exime
    • RE: PCIe card removal and failure to boot from NVMe
      [11:12 xcp-ng ~]# xl pci-assignable-list
      0000:01:00.0
      0000:46:00.0
      0000:01:00.1
      0000:45:00.0
      0000:44:00.0
      
      

      44, 45, 46, and 47 are the USB controllers. For some reason, only 47 is missing from this list. All four of them were previously hidden from dom0, but only 47 actually cleared.
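
      The comparison above can be sketched in a few lines: parse each address from the xl pci-assignable-list output and see which of the four controllers are still listed. This assumes that 44 to 47 refer to the bus field of the domain:bus:device.function address; the helper names are illustrative.

```python
from typing import List, NamedTuple


class PciAddress(NamedTuple):
    domain: int
    bus: int
    device: int
    function: int


def parse_bdf(text: str) -> PciAddress:
    """Parse 'DDDD:BB:DD.F' (hexadecimal fields) into numeric parts."""
    domain, bus, dev_func = text.strip().split(":")
    device, function = dev_func.split(".")
    return PciAddress(int(domain, 16), int(bus, 16), int(device, 16), int(function, 16))


# Output copied from the xl pci-assignable-list run in this post.
XL_OUTPUT = """\
0000:01:00.0
0000:46:00.0
0000:01:00.1
0000:45:00.0
0000:44:00.0
"""

# Assumption: the numbers in the post are hexadecimal bus numbers.
USB_BUSES = [0x44, 0x45, 0x46, 0x47]


def still_assignable(output: str, buses: List[int]) -> List[int]:
    """Return the buses from `buses` that still appear in the listing."""
    listed = {parse_bdf(line).bus for line in output.splitlines() if line.strip()}
    return [bus for bus in buses if bus in listed]


if __name__ == "__main__":
    print([format(bus, "02x") for bus in still_assignable(XL_OUTPUT, USB_BUSES)])
```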

      posted in XCP-ng
      exime
    • RE: PCIe card removal and failure to boot from NVMe

      @olivierlambert said in PCIe card removal and failure to boot from NVMe:

      And if you replug the card it works again, right?

      Yep, if I put it in the same slot.

      posted in XCP-ng
      exime
    • RE: PCIe card removal and failure to boot from NVMe

      @olivierlambert - sorry, to be clear, I did run that command (multiple times)

      [11:10 xcp-ng ~]# /opt/xensource/libexec/xen-cmdline --get-dom0 xen-pciback.hide
      
      [11:12 xcp-ng ~]#
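
      An empty result here means no devices are listed in the parameter. As a sketch of how a non-empty value could be read, the parser below assumes the usual "(domain:bus:device.function)(...)" layout of xen-pciback.hide; the populated sample value is hypothetical.

```python
import re
from typing import List

# One parenthesised PCI address, e.g. '(0000:44:00.0)'.
HIDE_ENTRY_RE = re.compile(
    r"\(([0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7])\)"
)


def hidden_devices(value: str) -> List[str]:
    """Return the PCI addresses found in a xen-pciback.hide value."""
    return HIDE_ENTRY_RE.findall(value)


# Hypothetical value for illustration; not taken from this server.
SAMPLE = "xen-pciback.hide=(0000:44:00.0)(0000:45:00.0)"

if __name__ == "__main__":
    print(hidden_devices(SAMPLE))
    print(hidden_devices(""))  # the empty output shown in this post
```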
      
      
      posted in XCP-ng
      exime
    • RE: Google Coral TPU PCIe Passthrough Woes

      @andSmv ack - I'll wait and see if it works out for @jjgg since my Xen server is in active use

      posted in Compute
      exime
    • RE: Google Coral TPU PCIe Passthrough Woes

      @andSmv thanks!

      @jjgg glad you're providing the info, sorry for abandoning the thread

      posted in Compute
      exime