XCP-ng

    Best posts made by andyhhp

    • RE: PCI Nvidia GPU Passthrough boot delay

      @tomg That is the work, but it needs rebasing over the XSA-400 work, so a v4 series is going to be needed at a minimum.

      HAP is Xen's vendor-neutral name for Intel EPT or AMD NPT hardware support. We have had superpage support for many years here.

      IOMMU pagetables can either be shared with EPT/NPT (reduces the memory overhead of running the VM), or split (required for AMD due to hardware incompatibilities, and also required to support migration of a VM with IO devices).

      When pagetables are shared, the HAP superpage support gives the IOMMU superpages too (because they're literally the same set of pagetables in memory). When pagetables are split, HAP gets superpages while the IOMMU logic currently uses small pages.

      posted in Compute
      andyhhp
    • RE: PCI Nvidia GPU Passthrough boot delay

      @tomg said in PCI Nvidia GPU Passthrough enable permissive?:

      It appears to be consistently 20 - 30 seconds per RTX Ampere GPU, about 20 - 25 seconds on Quadros and ~90 seconds on an A100.
      What's worse on the A100, it seems the calls are made linearly, so if I pass through four A100s the wait time to boot will be 4x90s, not optimal.

      These are known, and yeah - they are not great. It's an issue in Xen where the IOMMU logic doesn't (yet) support superpage mappings, so the time delay you're observing is the time taken to map, unmap, and remap the GPU's massive BAR using 4k pages. (It's Qemu taking action in response to the actions of the guest.)

      The good news is that IOMMU superpage support is in progress upstream, and should turn this delay into milliseconds.
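
      To give a rough feel for why the page size matters so much here, the sketch below counts how many individual mappings are needed to cover a BAR at different page sizes. The 32 GiB BAR size is an assumption picked purely for illustration (it is not a figure from this thread), and 2 MiB / 1 GiB are the usual x86 superpage sizes.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def mappings_needed(region_bytes: int, page_bytes: int) -> int:
    """Number of page-sized mappings needed to cover a region (rounded up)."""
    return -(-region_bytes // page_bytes)


# Hypothetical BAR size, chosen only for illustration.
bar_size = 32 * GIB

for label, page in (("4 KiB", 4 * KIB), ("2 MiB", 2 * MIB), ("1 GiB", GIB)):
    print(f"{label} pages: {mappings_needed(bar_size, page)} mappings")
```

      Each map, unmap, and remap pass has to touch every one of those entries, so going from 4k pages to superpages cuts the work by several orders of magnitude.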

      posted in Compute
      andyhhp
    • RE: XCP-ng 8.3 public alpha 🚀

      @Andrew Intel E5450, that's very retro.

      It's also first-gen VT-x and doesn't have HAP, which is why the test that is looking explicitly for HAP doesn't work.

      As a stopgap, remove hap from the VARY-CFG := hap shadow line in tests/invlpg/Makefile and rebuild. In the meantime I'll try to figure out a nice way to cope with this.
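
      For anyone who would rather script the stopgap than edit by hand, here is a minimal sketch of the line change. The helper name `drop_cfg` is made up for this example, and it works on a string rather than touching tests/invlpg/Makefile itself.

```python
def drop_cfg(line: str, cfg: str) -> str:
    """Remove one entry from a 'VARY-CFG := a b c' style Makefile line.

    Hypothetical helper for illustration; any other line is returned as-is.
    """
    name, sep, values = line.partition(":=")
    if not sep or name.strip() != "VARY-CFG":
        return line  # not the line we are after, leave it untouched
    kept = [v for v in values.split() if v != cfg]
    return f"{name.strip()} := {' '.join(kept)}"


print(drop_cfg("VARY-CFG := hap shadow", "hap"))  # VARY-CFG := shadow
```

      After changing the line, rebuild the test as described above.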

      posted in News
      andyhhp
    • RE: XCP-ng 8.3 public alpha 🚀

      @olivierlambert said in XCP-ng 8.3 public alpha 🚀:

      Your Xen guru badge is well earned @andyhhp 😉

      "purveyor of general grumpiness"

      posted in News
      andyhhp
    • RE: XCP-ng 8.3 public alpha 🚀

      @Andrew Those are normal.

      Bad rIP is actually an error introduced in XSA-170 because someone misread the Intel manual. I've been trying to delete it upstream for years now. It's been so long that Intel nearly released a feature which would have required us to delete that check, and I successfully persuaded the Intel documentation team to add a footnote clarifying the statement which was misinterpreted during XSA-170.

      At some point in my copious free never, I should restart the argument to delete it upstream...

      The other two are logging from the XSA-260 fix. There's an error(/misfeature) in the x86 architecture and those would have been privilege escalations before the fix was in place. I decided when fixing XSA-260 that such attempts shouldn't be entirely silent, hence the one-liner. That particular printk() is actually common with other debugging routines, so can occur during regular development.

      posted in News
      andyhhp
    • RE: Non-server CPU compatibility - Ryzen and Intel

      So, we've had reports on xen-devel which look a little like this.

      @BlueBadger are you able to switch back to your 7950x and try booting Xen with x2apic_phys=true? It appears that the -X processors are missing a feature in their IOMMU and Xen was getting confused when setting up interrupt handling.

      https://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=0d2686f6b66b4b1b3c72c3525083b0ce02830054 is at least part of the fix, but so far feedback on the mailing lists suggests it's not a complete fix.

      posted in Compute
      andyhhp
    • RE: PCI Passthrough of Nvidia GPU and USB add-on card

      This is way, way outside of a normal-ish looking server use case. I'm honestly surprised you've got anything to function...

      To start with, you're probably booting Xen with console=vga (because that's the default). It will be handed over to dom0 too, so start by going through the bootloader configuration and making sure that neither Xen nor dom0 are trying to use the display at all.

      I suspect this is the root cause of the display going periodically back to black.

      posted in Compute
      andyhhp
    • RE: Passed Through GPU Crashes Host During Driver Install

      @planedrop said in Passed Through GPU Crashes Host During Driver Install:

      Had a Panic on CPU 0 code and a reboot.

      Ok - let's do things one at a time. Can you start a new thread and provide the logs? (Ignore the vcpu/domain/stack hexdump log files. xca.log/xen.log/dom0.log are the interesting ones.)

      posted in Compute
      andyhhp
    • RE: Passed Through GPU Crashes Host During Driver Install

      @planedrop Ok, so it's a host lockup rather than a crash. That's a bit more irritating to debug.

      First of all, can you update to the debug hypervisor? Adjust the /boot/xen.gz -> $foo symlink to use the version of Xen with the -d.gz suffix. This is the same hypervisor changeset but with assertions and extra verbosity enabled.

      Also, can you append ,keep to Xen's vga= option on the command line? This should cause Xen to keep on writing out onto the screen even after dom0 has started up. Depending on the system, this might be a bit glacial, but dom0 will come up eventually.

      Then reproduce the hang. Hopefully there'll be some output from Xen before the system locks up. You might also want to consider adding noreboot to Xen's command line too, especially if there's a backtrace and you want to take a photo of it to attach here.
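
      As a rough illustration of the two command line changes, the sketch below rewrites a Xen command line string. The helper name `debug_cmdline` and the sample command line are invented for this example; it only manipulates a string and does not touch the bootloader configuration, which you would still need to edit yourself.

```python
def debug_cmdline(cmdline: str) -> str:
    """Append ',keep' to an existing vga= option and add 'noreboot'.

    Hypothetical helper for illustration. If there is no vga= option,
    none is added; only 'noreboot' is appended.
    """
    out = []
    for opt in cmdline.split():
        # Only touch vga=, and avoid adding ',keep' twice.
        if opt.startswith("vga=") and "keep" not in opt[len("vga="):].split(","):
            opt += ",keep"
        out.append(opt)
    if "noreboot" not in out:
        out.append("noreboot")
    return " ".join(out)


# Made-up example command line, not taken from a real system.
print(debug_cmdline("dom0_mem=4096M vga=mode-0x0311 console=vga"))
```

      Running the helper on its own output leaves the string unchanged, so it is safe to apply more than once.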

      posted in Compute
      andyhhp