XCP-ng

    andSmv (@andSmv)

    Groups: Hypervisor & Kernel Team, Xen Guru, Vates 🪐 XCP-ng Team

    Reputation: 19 · Profile views: 56 · Posts: 37 · Followers: 0 · Following: 0

    Best posts made by andSmv

    • RE: Coral TPU PCI Passthrough

      @redakula
      Well, unfortunately this was one of the possible outcomes. We don't have the hardware to do more in-depth debugging. I will talk to Marek next week at the Xen Summit about this patch series and whether we can expect it to eventually fix the issue with the Coral TPU.
      I will keep you posted.

      posted in Compute
    • RE: XCP-ng 8.2.1 crash

      Hello, both issues seem to be related to memory corruption.

      • The first trace shows an NMI (non-maskable interrupt); one possible cause is a parity error detected by the hardware. Moreover, CPU#12 receives a #MC (machine check) exception. The #MC is raised by the hardware to notify system software of an unrecoverable hardware error.
      • The second trace is an invalid opcode exception in Xen hypervisor context. This means that either the instruction stream or the instruction pointer is corrupted.
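      One way to check whether a #MC like the one above really points at memory is to decode the IA32_MCi_STATUS value printed in the machine check log. Below is a minimal sketch, assuming an Intel CPU and a raw 64-bit status value copied from the log; bit positions follow the Intel SDM (Vol. 3B, machine-check chapter), and the example value is hypothetical, not taken from this crash report.

```python
# Minimal sketch: decode an x86 IA32_MCi_STATUS value from a machine check log.
# Bit positions follow the Intel SDM, Vol. 3B. The example value is hypothetical.

STATUS_BITS = {
    63: "VAL",    # register contents are valid
    62: "OVER",   # overflow: an earlier error was lost
    61: "UC",     # uncorrected error
    60: "EN",     # error reporting enabled
    59: "MISCV",  # IA32_MCi_MISC holds extra information
    58: "ADDRV",  # IA32_MCi_ADDR holds the error address
    57: "PCC",    # processor context corrupted
}


def decode_mci_status(status: int) -> dict:
    """Return the architectural flags that are set and a rough error class."""
    flags = [name for bit, name in STATUS_BITS.items() if (status >> bit) & 1]
    mca_code = status & 0xFFFF  # architectural MCA error code, bits 15:0
    # Memory controller errors use the compound code 000F 0000 1MMM CCCC:
    # bit 7 set, bits 15:8 clear except the filter bit 12.
    is_memory_controller = (mca_code & 0xEF80) == 0x0080
    return {
        "flags": flags,
        "mca_code": mca_code,
        "memory_controller_error": is_memory_controller,
    }


if __name__ == "__main__":
    # Hypothetical value: VAL|UC|EN|ADDRV|PCC with MCA code 0x009F
    example = (1 << 63) | (1 << 61) | (1 << 60) | (1 << 58) | (1 << 57) | 0x009F
    print(decode_mci_status(example))
```

      A set UC/PCC pair together with a memory controller error code would be consistent with the ECC hypothesis below; other error classes would point elsewhere.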

      My hypothesis is:

      In the first case, an ECC memory error is detected (and reported by the hardware), which makes the hypervisor panic and halt.

      In the second case, the memory error goes undetected (but the memory is still corrupted), and at some point this corruption leads to the same outcome in the Xen hypervisor.

      Can you check with the Hetzner folks whether they can replace the memory modules?

      Another way to validate this hypothesis is to install different system software (another OS or hypervisor, or another hypervisor version) and see whether you experience the same issue.

      You can also add the "ler=true" option to the Xen command line. This debugging option enables the last branch record in hypervisor context, so the last interrupt/exception from/to addresses recorded by the hardware are dumped along with the other registers. This can give us more traces to check whether anything is abnormal at the software level. I will probably also need your Xen image with its symbol table (xen-syms-XXX and xen-syms-XXX.map).

      posted in Compute
    • RE: PCI Passthrough Missing Capabilities in Guest

      Hello,

      Yes, unfortunately this is a PCIe device, and the capability in question is a PCIe extended capability. It lives in the PCI extended configuration space (starting at offset 0x100), which the standard PCI configuration access method does not cover. And evidently the driver NEEDS this capability to make the device work.
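      To see what the guest can actually reach, one can walk the extended capability list in a config space dump. Below is a minimal sketch, assuming the dump is read as root from /sys/bus/pci/devices/<BDF>/config inside a Linux guest (the BDF in the comment is hypothetical). Each extended capability header is a 32-bit word at offset 0x100 or later: bits 15:0 hold the capability ID, bits 19:16 the version, and bits 31:20 the offset of the next one. If the guest only sees the legacy 256 bytes, the list comes back empty, which is the situation described above.

```python
import struct
import sys

EXT_CAP_START = 0x100  # extended capabilities start right after the legacy 256 bytes


def list_ext_caps(config: bytes) -> list[tuple[int, int, int]]:
    """Return (offset, cap_id, version) for each PCIe extended capability found."""
    caps = []
    offset = EXT_CAP_START
    seen = set()  # guard against a malformed, looping list
    while offset >= EXT_CAP_START and offset + 4 <= len(config) and offset not in seen:
        seen.add(offset)
        (header,) = struct.unpack_from("<I", config, offset)
        if header in (0x00000000, 0xFFFFFFFF):
            break  # empty list, or the space reads back as all-ones
        cap_id = header & 0xFFFF          # bits 15:0
        version = (header >> 16) & 0xF    # bits 19:16
        caps.append((offset, cap_id, version))
        offset = (header >> 20) & 0xFFC   # bits 31:20, next capability offset
    return caps


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. python3 ext_caps.py /sys/bus/pci/devices/0000:00:05.0/config (hypothetical BDF)
    with open(sys.argv[1], "rb") as f:
        data = f.read()
    print(f"{len(data)} bytes of config space visible")
    for off, cid, ver in list_ext_caps(data):
        print(f"{off:#05x}: cap id {cid:#06x} v{ver}")
```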

      There is actually work in progress (very close to completion) that offers HVM guests a QEMU-emulated Q35 chipset (instead of the currently emulated i440FX chipset). Among other things, this chipset provides the guest with an emulated PCIe bus, which can host PCIe devices and also gives access to the PCI extended configuration space.

      Once this work is done, we will be able to pass through PCIe devices and give the guest access to their PCIe capabilities, so normally no driver should complain about missing ones.

      AFAIK, the most common PCIe capabilities are emulated in this upcoming patch set, but some (exotic) ones may still not be.

      For now, I'm pinging @ThierryEscande to see if he can provide you with a beta version of these patches, so you can check whether they solve your problem, if you agree to do some testing at the same time 🙂

      posted in Hardware
    • RE: XCP-ng 8.3 public alpha 🚀

      @ashceryth XCP-ng is based on Xen 4.13, so I'm quite sure it doesn't handle the Intel hybrid architecture. I'm not even sure there are ongoing efforts to support it in the Xen Project community.

      Moreover, after a very quick check, I didn't see any trace of ARM big.LITTLE support in recent Xen.

      I think this kind of feature needs an in-depth analysis of how exactly it should be mapped onto hypervisor-based platforms, and I think the answer is not obvious at all.
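      For context, here is a minimal sketch of how software detects a hybrid Intel CPU from raw CPUID register values (bit positions from the Intel SDM; the register values in the example are hypothetical). Because leaf 0x1A describes whichever physical core happens to execute CPUID, a hypervisor that may run the same vCPU on both core types has to decide what to report to the guest, which is one concrete facet of the mapping question above.

```python
# CPUID.(EAX=07H, ECX=0):EDX bit 15 -> the processor is a hybrid part
HYBRID_EDX_BIT = 15

# CPUID.(EAX=1AH):EAX bits 31:24 -> core type of the CPU executing CPUID
CORE_TYPES = {0x20: "Intel Atom (E-core)", 0x40: "Intel Core (P-core)"}


def is_hybrid(leaf7_edx: int) -> bool:
    """True if the hybrid flag is set in CPUID leaf 7 EDX."""
    return bool((leaf7_edx >> HYBRID_EDX_BIT) & 1)


def core_type(leaf1a_eax: int) -> str:
    """Name of the core type reported in CPUID leaf 0x1A EAX."""
    return CORE_TYPES.get((leaf1a_eax >> 24) & 0xFF, "unknown")


if __name__ == "__main__":
    # Hypothetical raw register values, as printed by a CPUID dump tool
    print(is_hybrid(0x00008000), core_type(0x40000001), core_type(0x20000001))
```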

      posted in News
    • RE: Coral TPU PCI Passthrough

      @milch I will take a look this week and try to figure out whether we can make progress on that, so that you have something to test.

      posted in Compute
    • RE: Coral TPU PCI Passthrough

      I'm not aware of anything new from Marek, who initially worked on these patches. I think at the time he addressed the general issue rather than this particular hardware, and the patch wasn't tested with Coral hardware, so that is most probably why it doesn't work (there may be more issues...).

      I will ping Marek on the Xen community Matrix channel to find out whether there's anything new on that front, and will keep you posted.

      posted in Compute
    • RE: Coral TPU PCI Passthrough

      @redakula I'm on it. I'll keep you posted.

      posted in Compute
    • RE: Coral TPU PCI Passthrough

      @redakula Hello, unfortunately these patches are not in Xen 4.17 (and were never integrated into more recent Xen versions either). So, to test them, you have to apply the patches manually (they should normally apply as-is to 4.17) and rebuild your Xen.

      posted in Compute
    • RE: Google Coral TPU PCIe Passthrough Woes

      @jjgg Thank you. Yes, it's the same problem: an EPT violation. I'll try to figure out what we can do here. There's a patch from the Qubes OS folks that should normally fix the MSI-X PBA issue (I'm not sure it's the right fix, but it's still worth trying). This patch applies to recent Xen and hasn't been accepted yet. I will check whether it can easily be backported to the XCP-ng Xen and come back to you.

      posted in Compute
    • RE: PCI Passthrough of Nvidia GPU and USB add-on card

      Yes. Some of the PCI capabilities lie beyond the "standard" PCI configuration space of 256 bytes per BDF (i.e. per PCI function). Unfortunately, Xen does not yet provide the "enhanced" configuration access method to HVM guests (it's ongoing work). It would require QEMU (the Xen-related part) to emulate a chipset that offers such access, such as Q35.
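      The 256-byte limit comes from the legacy access mechanism itself: the value written to the CONFIG_ADDRESS port (0xCF8) only has room for an 8-bit register offset, while the enhanced, memory-mapped mechanism (ECAM) gives each function a 4 KiB window. Below is a minimal sketch of both address computations; the ECAM base 0xE0000000 used in the example is hypothetical (on real systems it is described by the ACPI MCFG table), and the vendor-specific extension some AMD platforms offer for the legacy mechanism is ignored.

```python
def _check_bdf(bus: int, dev: int, func: int) -> None:
    if not (0 <= bus < 256 and 0 <= dev < 32 and 0 <= func < 8):
        raise ValueError("bus/device/function out of range")


def cf8_address(bus: int, dev: int, func: int, offset: int) -> int:
    """Value written to CONFIG_ADDRESS (I/O port 0xCF8) by the legacy mechanism.

    Only an 8-bit register offset fits, so offsets 0x100 and above
    (the PCIe extended space) cannot be reached this way.
    """
    _check_bdf(bus, dev, func)
    if not 0 <= offset <= 0xFF:
        raise ValueError("legacy mechanism only reaches the first 256 bytes")
    return 0x80000000 | (bus << 16) | (dev << 11) | (func << 8) | (offset & 0xFC)


def ecam_address(base: int, bus: int, dev: int, func: int, offset: int) -> int:
    """Physical address used by the enhanced (memory-mapped, ECAM) mechanism.

    Each function gets a 4 KiB window, so offsets up to 0xFFF are reachable.
    """
    _check_bdf(bus, dev, func)
    if not 0 <= offset <= 0xFFF:
        raise ValueError("configuration space is 4 KiB per function")
    return base + ((bus << 20) | (dev << 15) | (func << 12) | offset)


if __name__ == "__main__":
    print(hex(cf8_address(1, 0, 0, 0x10)))
    print(hex(ecam_address(0xE0000000, 1, 0, 0, 0x100)))  # hypothetical ECAM base
```

      This is why an emulated chipset with an ECAM window (such as Q35) is needed before a guest driver can read capabilities at offset 0x100 and beyond.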

      Very probably, the Windows drivers for these devices are unhappy when they can't access these fields, which is a likely reason for the devices' malfunction.

      A good way to confirm this would be to pass these devices through to a Linux guest, where we could add some extended traces, and possibly also to a PVH Linux guest to see how they are handled there (PVH guests do not use QEMU for PCI bus emulation).

      posted in Compute

    Latest posts made by andSmv

    • RE: Coral TPU PCI Passthrough

      @Niall-Con Thank you! I'll take a look at that and will ping you to test it on real hardware. I just need to find the time (we're in the middle of a storm right now), so it will most probably take one or two weeks.

      posted in Compute
    • RE: Coral TPU PCI Passthrough

      I just got an answer from Marek on that. The patches he made were tested with Intel Wi-Fi cards and targeted a similar issue (the MSI-X table), but not the same one as the Coral TPU (the PBA). It shouldn't be very difficult to extend his patches to the PBA, but unfortunately neither he nor we have this specific hardware.
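      Both the MSI-X table and the PBA live inside one of the device's BARs, and as far as we understand, the trouble in these cases presumably comes from device registers that share a 4 KiB page with one of them. Below is a minimal sketch, assuming you have the 12 bytes of the MSI-X capability (ID 0x11) from the device's config space, that decodes where the table and PBA sit and checks whether a given register offset shares a page with either. The helper `shares_msix_page` and the example values are hypothetical, and how Xen actually traps or emulates those pages is not modeled here.

```python
import struct

MSIX_CAP_ID = 0x11
PAGE_SIZE = 4096  # x86 page granularity used when trapping/mapping BAR regions


def _pages(offset: int, length: int) -> set[int]:
    """Indices of the 4 KiB pages covered by [offset, offset + length)."""
    return set(range(offset // PAGE_SIZE, (offset + length - 1) // PAGE_SIZE + 1))


def msix_layout(cap: bytes) -> dict:
    """Decode a 12-byte MSI-X capability structure, starting at its cap ID byte."""
    cap_id, _next_ptr, msg_ctrl, table_reg, pba_reg = struct.unpack_from("<BBHII", cap)
    if cap_id != MSIX_CAP_ID:
        raise ValueError(f"not an MSI-X capability (id {cap_id:#x})")
    entries = (msg_ctrl & 0x7FF) + 1            # Table Size field is encoded as N-1
    table_bir, table_off = table_reg & 0x7, table_reg & 0xFFFFFFF8
    pba_bir, pba_off = pba_reg & 0x7, pba_reg & 0xFFFFFFF8
    table_len = entries * 16                    # 16 bytes per MSI-X table entry
    pba_len = (entries + 63) // 64 * 8          # one pending bit per vector, in QWORDs
    return {
        "entries": entries,
        "table": {"bir": table_bir, "offset": table_off, "pages": _pages(table_off, table_len)},
        "pba": {"bir": pba_bir, "offset": pba_off, "pages": _pages(pba_off, pba_len)},
    }


def shares_msix_page(layout: dict, bir: int, reg_offset: int) -> bool:
    """True if a register at (BAR `bir`, `reg_offset`) sits in a table or PBA page."""
    page = reg_offset // PAGE_SIZE
    return any(
        region["bir"] == bir and page in region["pages"]
        for region in (layout["table"], layout["pba"])
    )


if __name__ == "__main__":
    # Hypothetical capability: 2 vectors, table at BAR2+0x2000, PBA at BAR2+0x3000
    cap = struct.pack("<BBHII", MSIX_CAP_ID, 0x00, 0x0001, 0x2000 | 2, 0x3000 | 2)
    layout = msix_layout(cap)
    print(layout)
    print(shares_msix_page(layout, bir=2, reg_offset=0x3100))
```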

      His patches are actually upstream (commit b2cd07a0447bfa25e96ae13e190225b61a3670cb), so you can take a look at them if you want.

      I will see whether we can easily get hold of this hardware.

      posted in Compute
    • RE: More than 64 vCPU on Debian11 VM and AMD EPYC

      @rarturas I'm not sure it's actually possible to run Windows with more than 64 vCPUs. I'm also not surprised that your VM doesn't boot when you turn ACPI off.

      We're actually in the middle of investigating what the vCPU limit for Windows VMs could be, and especially what it would take to get to 128 vCPUs for Windows. We will most probably discuss this topic with the community at the Xen Summit (in a couple of weeks).

      Stay tuned! 🙂

      posted in Compute
    • RE: Coral TPU PCI Passthrough

      @andSmv
      Hello,
      I integrated Marek's patch and built an RPM, so you can install it (you may need to force the RPM install, or extract xen.gz from the RPM and install it manually if you prefer).

      Obviously there's no guarantee it'll work in your case. Moreover, I didn't test the patch, so please back up all your data. It should be harmless, but...

      Here's the link to download the RPM (it should remain available until the end of the month): https://nextcloud.vates.fr/index.php/s/gd7kMwxHtNEP329

      Don't hesitate to ping me if you run into any issue downloading, installing, etc. the patched Xen.

      Hope it helps!

      P.S. Make sure you're running XCP-ng 8.3, as I only uploaded the Xen hypervisor RPM (and not the libs/tools that come with it).

      posted in Compute