XCP-ng

    Best posts made by andyhhp

    • RE: PCI Nvidia GPU Passthrough boot delay

      @tomg That is the work, but it needs rebasing over the XSA-400 work, so a v4 series is going to be needed at a minimum.

      HAP is Xen's vendor-neutral name for Intel EPT or AMD NPT hardware support. We have had superpage support for many years here.

      IOMMU pagetables can either be shared with EPT/NPT (which reduces the memory overhead of running the VM), or split (required for AMD due to hardware incompatibilities, and also required to support migration of a VM with IO devices).

      When pagetables are shared, the HAP superpage support gives the IOMMU superpages too (because they're literally the same set of pagetables in memory). When pagetables are split, HAP gets superpages while the IOMMU logic currently uses small pages.

      posted in Compute
      andyhhp
    • RE: PCI Nvidia GPU Passthrough boot delay

      @tomg said in PCI Nvidia GPU Passthrough enable permissive?:

      It appears to be consistently 20 - 30 seconds per RTX Ampere GPU, about 20 - 25 seconds on Quadros and ~90 seconds on an A100.
      What's worse on the A100, it seems the calls are made linear so say I pass through four A100s the wait time to boot will be 4x90s, not optimal.

      These are known, and yeah - they are not great. It's an issue in Xen where the IOMMU logic doesn't (yet) support superpage mappings, so the time delay you're observing is the time taken to map, unmap, and remap the GPU's massive BAR using 4k pages. (It's Qemu taking action in response to the actions of the guest.)

      The good news is that IOMMU superpage support is in progress upstream, and should turn this delay into milliseconds.
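
      To give a rough sense of scale, the sketch below counts how many mappings are needed to cover a BAR at different page sizes. The 64 GiB BAR size and the helper names (`mappings_needed`, `mapping_counts`) are assumptions chosen for illustration, not figures from this thread, and 2M and 1G are simply the usual x86 superpage sizes.

```python
# Illustrative arithmetic only. The BAR size used below is an assumed
# example, not a measured value for any particular GPU.
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

# Small pages plus the usual x86 superpage sizes.
PAGE_SIZES = {"4k": 4 * KIB, "2M": 2 * MIB, "1G": GIB}


def mappings_needed(bar_bytes: int, page_bytes: int) -> int:
    """Number of mappings needed to cover a BAR, rounding up."""
    return -(-bar_bytes // page_bytes)


def mapping_counts(bar_bytes: int) -> dict:
    """Mapping count for each page size in PAGE_SIZES."""
    return {
        name: mappings_needed(bar_bytes, size)
        for name, size in PAGE_SIZES.items()
    }


if __name__ == "__main__":
    assumed_bar = 64 * GIB  # hypothetical BAR size
    for name, count in mapping_counts(assumed_bar).items():
        print(f"{name}: {count} mappings")
```

      Each map, unmap, and remap pass repeats this work, so the difference in mapping count between 4k pages and superpages is where the boot delay would be expected to shrink.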

      posted in Compute
      andyhhp
    • RE: Passed Through GPU Crashes Host During Driver Install

      @planedrop said in Passed Through GPU Crashes Host During Driver Install:

      Had a Panic on CPU 0 code and a reboot.

      Ok - let's do things one at a time. Can you start a new thread and provide the logs? (Ignore the vcpu/domain/stack hexdump log files; xca.log/xen.log/dom0.log are the interesting ones.)

      posted in Compute
      andyhhp
    • RE: Passed Through GPU Crashes Host During Driver Install

      @planedrop Ok, so it's a host lockup rather than a crash. That's a bit more irritating to debug.

      First of all, can you update to the debug hypervisor? Adjust the /boot/xen.gz -> $foo symlink to use the version of Xen with the -d.gz suffix. This is the same hypervisor changeset but with assertions and extra verbosity enabled.

      Also, can you append ,keep to Xen's vga= option on the command line? This should cause Xen to keep on writing out onto the screen even after dom0 has started up. Depending on the system, this might be a bit glacial, but dom0 will come up eventually.

      Then reproduce the hang. Hopefully there'll be some output from Xen before the system locks up. You might also want to consider adding noreboot to Xen's command line too, especially if there's a backtrace and you want to take a photo of it to attach here.
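
      As a sketch of what the two command-line changes above amount to, the snippet below applies them to a Xen command line held as a string. The helper name `add_debug_options` and the sample command line are made up for illustration; the real command line lives in the bootloader configuration and will differ per host. If no vga= option is present, this sketch leaves it that way rather than guessing a mode.

```python
def add_debug_options(cmdline: str) -> str:
    """Append ',keep' to an existing vga= option and add 'noreboot'.

    Hypothetical helper: it only edits a string and does not touch
    any bootloader configuration.
    """
    options = []
    for opt in cmdline.split():
        if opt.startswith("vga="):
            values = opt.split("=", 1)[1].split(",")
            if "keep" not in values:
                opt += ",keep"
        options.append(opt)
    if "noreboot" not in options:
        options.append("noreboot")
    return " ".join(options)


if __name__ == "__main__":
    # Made-up example command line, not taken from a real host.
    sample = "dom0_mem=4096M vga=mode-0x0311 console=vga"
    print(add_debug_options(sample))
```

      The function is idempotent, so running it on an already-adjusted command line leaves it unchanged.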

      posted in Compute
      andyhhp