andSmv
@andSmv
Best posts made by andSmv
-
RE: Coral TPU PCI Passthrough
@redakula
Well, this was unfortunately one of the potential outcomes. We don't have the hardware to do more in-depth debugging. I will talk to Marek next week (at the Xen Summit) about this patch series and ask whether we can expect it to eventually fix the issue with the Coral TPU.
Will keep you posted.
-
RE: XCP-ng 8.2.1 crash
Hello, both issues seem to be related to memory corruption.
- The first trace is an #NMI exception (one of the possible causes is a parity error detected by the HW). Moreover, CPU#12 gets the #MC (machine check) exception. The #MC is triggered by the HW to notify the system software that there's an unrecoverable issue with the HW.
- The second one is an invalid opcode in the Xen hypervisor context. This means that either the instruction flow or the instruction pointer is corrupted.
My hypothesis is:
In the first case, the ECC memory error is detected (and reported by the HW), which makes the hypervisor panic and stop.
In the second case, the memory error is not detected (but the memory is still corrupted), and at some point this corruption provokes the same result in the Xen hypervisor.
Can you check with the Hetzner guys whether there's a way to change the memory modules?
Another way to validate this hypothesis is to install different system software (another OS/hypervisor, or another hypervisor version) and see if you experience the same issue.
You can also add the "ler=true" option to the Xen command line. This can give us more traces (recorded by the HW) to check whether there's anything abnormal at the software level. I'll probably need your Xen image with its symbol table (xen-syms-XXX and xen-syms-XXX.map).
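For reference, here's a minimal sketch of how I'd add the option on an XCP-ng dom0 (I'm assuming the usual xen-cmdline helper path from 8.x; double-check it on your install before rebooting):
/opt/xensource/libexec/xen-cmdline --set-xen ler=true
reboot
After the reboot you can confirm the option was picked up with:
xl dmesg | grep -i "command line"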
-
RE: XCP-ng 8.3 public alpha 🚀
@ashceryth XCP-ng is based on Xen 4.13, so I'm quite sure it doesn't handle the Intel Hybrid architecture. I'm not even sure there are ongoing efforts on this support in the Xen Project community.
Moreover, after a very quick check, I didn't see any trace of ARM big.LITTLE support in recent Xen.
I think this kind of feature needs a thorough analysis of how exactly it should be mapped onto hypervisor-based platforms, and I don't think the answer is obvious at all.
-
RE: Coral TPU PCI Passthrough
@redakula Hello, unfortunately these patches are not in Xen 4.17 (and were never integrated into more recent Xen). So, to test them, you have to apply the patches manually (they should normally apply as-is to 4.17) and rebuild your Xen.
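As a rough sketch of what that looks like, assuming you have the Xen 4.17 source tree and the patch files saved from the mailing list (the file names here are placeholders):
cd xen-4.17
git am ../v*-vpci-msix-*.patch    # or: patch -p1 < ../one-patch-at-a-time.patch
make -j$(nproc) xen
The resulting xen/xen.gz then has to be installed in /boot and selected in your bootloader entry.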
-
RE: Google Coral TPU PCIe Passthrough Woes
@jjgg Thank you. Yes, the same problem - an EPT violation. Look, I'll try to figure out what we can do here. There's a patch from the Qubes OS guys that should normally fix the MSI-X PBA issue (not sure it's the right fix, but still... worth trying). The patch applies to recent Xen and hasn't been accepted yet. I will take a look at whether it can easily be backported to the XCP-ng Xen and come back to you.
-
RE: PCI Passthrough of Nvidia GPU and USB add-on card
Yes. Some of the PCI capabilities are located beyond the "standard" PCI configuration space of 256 bytes per BDF (PCI device). Unfortunately, the "enhanced" configuration access mechanism is not yet provided by Xen for HVM guests (it's ongoing work). It would require QEMU (the Xen-related part) to emulate a chipset that offers access to such a mechanism, such as Q35.
Very probably, the Windows drivers for these devices are not happy about not being able to access these fields, which is potentially the reason these devices malfunction.
A good way to confirm this would be to try passing these devices through to Linux guests, so we could possibly add some extended traces, and possibly to pass them through to a PVH Linux guest and see how they are handled there (PVH guests do not use QEMU for PCI bus emulation).
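To see whether a device actually relies on capabilities in the extended configuration space, a quick check from dom0 is to dump the full 4 KiB config space (the BDF below is a placeholder for your device):
lspci -vvv -xxxx -s 0000:03:00.0
Any capability listed at an offset of 0x100 or above (AER, SR-IOV, ATS, ...) lives in the extended space and won't be reachable from an HVM guest without that enhanced access mechanism.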
-
RE: Coral TPU PCI Passthrough
Hello, sorry for the late response (I just discovered the topic).
With regard to Marek's patches, I actually think they can be worth a try (at least the patch seems to address the problem where the MSI-X PBA page is shared with other registers of the device), but there are some cons too:
- the patches are quite new (they don't seem to be integrated yet);
- the patches apply to more recent Xen (not the XCP-ng Xen), and even though we could probably backport them, it will potentially require some significant work;
- we are not 100% sure it's the issue (or the only issue).
So if this is a must-have, we can go and do some digging to make it work (but still within the scope of an "experimental" platform, not a production platform).
-
RE: Windows Server 2019 sporadic reboot
Hello @phipra,
Sorry for the late response (I didn't see this earlier). Well, this is a tricky one.
A triple fault can normally mean one of two things:
- heavy memory corruption (the IDT was corrupted)
- a normal reboot (if ACPI reboot is not available)
My question is: could this Windows reboot be a planned Windows reboot (for example related to the Windows Update mechanism, or something like that...)?
It can obviously also be memory corruption (possibly caused by the citrix-vm-tools drivers), but it would be VERY hard to debug that from a memory dump (and actually, AFAIK, Xen doesn't provide a guest memory dump).
My suggestion would be to try enabling/disabling some Windows services (disable Windows Update?) to see if anything changes.
Sorry for the rather limited insight, but it reflects my rather limited knowledge of Windows.
-
RE: Google Coral TPU PCIe Passthrough Woes
Hello @exime,
Here we have an EPT violation (a write access to an r-x page) at 0x3fff2046. This address is tagged as an MMIO address, so it very probably belongs to the device you're trying to pass through.
Normally this has nothing to do with the 0x46xxx range (where the MSI-X capabilities point), but the fact that there's some hacking in there makes me think that the Google engineers also did some hacking all over the place.
While starting a native guest (or in dom0 before the passthrough), can you please give a PCI dump of your device:
lspci -vvv -s $YOUR_DEV_BDF
YOUR_DEV_BDF is your device's PCI ID (e.g. 00:01.0)
Latest posts made by andSmv
-
RE: Coral TPU PCI Passthrough
@redakula
Well, this was unfortunately one of the potential outcomes. We don't have the hardware to do more in-depth debugging. I will talk to Marek next week (at the Xen Summit) about this patch series and ask whether we can expect it to eventually fix the issue with the Coral TPU.
Will keep you posted. -
RE: More than 64 vCPU on Debian11 VM and AMD EPYC
@rarturas I'm not sure it's actually doable to run Windows with more than 64 VCPUs. I'm also not surprised that your VM isn't booting when you turn ACPI off.
We're actually in the middle of investigating what the VCPU limit could be for a Windows VM and, especially, what the gap is to getting 128 VCPUs for Windows. We will most probably discuss this topic with the community at the Xen Summit (in a couple of weeks).
Stay tuned!
-
RE: Coral TPU PCI Passthrough
@andSmv
Hello,
I integrated Marek's patch and built an RPM, so you can install it (you may need to force the rpm install, or extract the xen.gz from the RPM and install it manually if you prefer; a quick sketch follows the link below). Obviously there's no guarantee it'll work in your case. Moreover, I didn't test the patch, so please back up all your data. It should be harmless, but...
Here's the link where you can download the RPM (it should stay available until the end of the month): https://nextcloud.vates.fr/index.php/s/gd7kMwxHtNEP329
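As a minimal sketch of the forced install I have in mind (the RPM file name is a placeholder - adjust it to the file you actually downloaded, and keep a copy of your current /boot/xen.gz so you can roll back):
rpm -Uvh --force xen-hypervisor-*.rpm
reboot
After the reboot you can check which hypervisor build is running with:
xl info | grep xen_version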
Don't hesitate to ping me if you experience any issue downloading/installing/... the patched Xen.
Hope it helps!
P.S. Be sure you're running XCP-ng 8.3, as I only uploaded the Xen hypervisor RPM (and not the libs/tools that come with it).
-
RE: Coral TPU PCI Passthrough
@redakula Hello, unfortunately these patches are not in Xen 4.17 (and were never integrated into more recent Xen). So, to test them, you have to apply the patches manually (they should normally apply as-is to 4.17) and rebuild your Xen.
-
RE: More than 64 vCPU on Debian11 VM and AMD EPYC
@alexredston Hey, sorry I'm a little bit late here. So, with regard to VCPUs - there's currently a hardcoded limit of 128 in the Xen hypervisor. Moreover, the Xen toolstack (when creating a guest) will check that the guest VCPU count doesn't exceed the physical CPUs available on the platform.
Bypassing the 128 VCPU limit will require some rather significant adjustments in the Xen hypervisor (basically the restrictions come from the QEMU IOREQ server and from how LAPIC IDs are assigned in Xen). So with the next Xen version this limit could potentially be increased (there's ongoing work on this). Things you probably also want to know about this VCPU limit (a short sketch of setting the count follows the list):
- not all guests can handle this number of VCPUs (e.g. Windows will certainly crash);
- when you give a VM such a big VCPU count (basically more than 32), the VM can potentially provoke a DoS on the whole platform (this is related to how some routines are "serialized" in the Xen hypervisor). So, when you do this, be aware that if your guest is broken, pwned, whatever... your whole platform can potentially become unresponsive.
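For reference, a minimal sketch of how the VCPU count is set on XCP-ng with the xe CLI (the UUID and counts are placeholders; as far as I remember the VM has to be halted to change VCPUs-max, and VCPUs-at-startup cannot exceed it):
xe vm-param-set uuid=<vm-uuid> VCPUs-max=64
xe vm-param-set uuid=<vm-uuid> VCPUs-at-startup=64
Keep the count within the limits discussed above.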
-
RE: Cannot start VM to which a sata controller and pci nvme is passed through to.
@RAG67958472
Thank you! This seems to be a bug. At some level, when mapping the NVMe device MMIO frame ef004 (BAR 4), it reuses the same guest frame where the SATA device MMIO frame ef138 (BAR 1) is already mapped. This mapping fails, so the domain is stopped by Xen. I have no idea which part of the code is responsible for reusing the same guest frame (gfn) for these mappings (probably the toolstack/QEMU, ...).
So, first of all, it would be useful to have the whole Xen traces from the domU start (if I understand correctly, your traces contain the Xen start and also the bug traces from the 2 times you tried to launch the domU). Are these the only traces you get when you launch the domU?
The second thing - it would be nice to start Xen in debug mode (normally you have a Xen image built with debug traces activated). Can you please boot this image and provide those traces?
I will talk to the Xen maintainers to see whether the problem was already reported by users. (The code which stops the domain didn't change in the most recent Xen, but the issue is probably located in the upper layers.)
It would also be very useful to see whether the problem is the same with PVH and HVM guests.
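For that last test, a rough illustration of the difference with a plain xl guest config (paths and BDFs are placeholders, and on XCP-ng you'd normally drive this through xe/XO, so take it only as a sketch of PVH vs. HVM):
type = "pvh"                      # switch to "hvm" for the comparison run
kernel = "/boot/guest/vmlinuz"
ramdisk = "/boot/guest/initrd.img"
memory = 4096
vcpus = 4
pci = [ "01:00.0", "02:00.0" ]    # the SATA controller and the NVMe device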
-
RE: Cannot start VM to which a sata controller and pci nvme is passed through to.
@RAG67958472
Hmmm, seems to be a bug. There's something special about machine frame ef004. I suppose it's an MMIO address (a PCI BAR?). Can you please provide the output of the whole Xen log from the beginning with
xl dmesg
and also a PCI config space dump with
lspci -vvv
?
What is weird is that you can pass through both devices individually. -
RE: Nvidia MiG Support
Hello, I honestly don't know how the Citrix vGPU stuff works, but here are a couple of thoughts on this topic:
If I understand correctly, you're saying Nvidia uses the Linux VFIO framework to enable mediated devices which can be exported to a guest. The VFIO framework isn't supported with Xen, as VFIO needs the presence of an IOMMU device managed by the Linux kernel IOMMU driver, and Xen doesn't provide/virtualize IOMMU access for dom0 (Xen manages the IOMMU by itself, but doesn't offer such access to guests).
Basically, to export an SR-IOV virtual function to a guest with Xen, you don't have to use VFIO; you can just assign the virtual function's PCI "BDF" ID to the guest and normally the guest should see the device (see the sketch below).
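A minimal sketch of what I mean, assuming the VF shows up in dom0 as 0000:81:00.1 (a placeholder BDF) and using the plain xl toolstack for illustration:
xl pci-assignable-add 0000:81:00.1
xl pci-attach <domain-name-or-id> 0000:81:00.1
On XCP-ng you'd normally go through xe instead, something along the lines of:
xe vm-param-set uuid=<vm-uuid> other-config:pci=0/0000:81:00.1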
From what I understand, the Nvidia user-mode toolstack (scripts & binaries) doesn't JUST create SR-IOV virtual functions but also wants to access the VFIO/MDEV framework, so the whole thing fails.
So maybe you can check whether there's an option in the Nvidia tools to just create the SR-IOV functions, OR try to run VFIO in "no-iommu" mode (no IOMMU presence in the Linux kernel required).
BTW, we're working on a project where we intend to use VFIO in dom0, and so we're implementing an IOMMU driver in the dom0 kernel; it would be interesting to know in the future whether this could help with your case.
Hope this helps.
-
RE: VM's with around 24GB+ crashes on migration.
It's obviously not excluded that the issue is related to the memory footprint. Moreover, the first warning "complains" about a memory allocation failure. (I suppose the "receiver" node has enough memory to host the VM.)
Normally Xen has no limitation on live-migrating a 24GB VM, so it's difficult to say what the issue is here. But clearly there's a possibility that this is a bug in Xen/the toolstack... Memory fragmentation on the "receiver" node can be an issue too.
You can probably try some different configurations to pinpoint the issue.
Maybe, for a start, try to migrate the VM when no other VMs are running on the "receiver" node. Also try to migrate a VM with no network connections (as the issue seems to be related to network backend status changes)...
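If it helps, a minimal sketch of that kind of controlled test with the xe CLI (names are placeholders):
xe vm-migrate vm=<vm-name-or-uuid> host=<receiver-host> live=true
and, on the receiver just before migrating, check the free hypervisor memory with:
xl info | grep free_memory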