Posts
-
RE: Coral TPU PCI Passthrough
@Niall-Con Thank you! I'll take a look at that and will ping you to test on real hardware. I just need to find time (I'm in the middle of a storm right now), so it'll most probably take one or two weeks.
-
RE: PCI Passthrough Missing Capabilities in Guest
Hello,
Yes, unfortunately this is a PCIe device and this is a PCIe capability, which is reported in the PCI extended configuration space (offset 0x100 and above) and is not covered by the standard PCI configuration access method. And apparently the driver NEEDS this capability to make the device work.
Actually, there's work in progress (very close to completion) which offers HVM guests a QEMU-emulated Q35 chipset (instead of the currently emulated i440FX chipset). This chipset "provides" the guest (amongst other things) with an emulated PCIe bus, which is capable of hosting PCIe devices and also offers access to the PCI extended configuration space.
When this work is done we will be able to pass through PCIe devices and give the guest access to all PCIe capabilities, so normally no driver should complain about them being missing.
AFAIK, the "most common" PCIe capabilities are emulated in this future patchset, but it's still possible that some of them (exotic ones) are not.
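In the meantime, a quick way to see what the guest actually gets is to check how much PCI config space it can read for the device. This is just a hedged sketch; the BDF 00:05.0 is a placeholder for how the device shows up in your guest:

# inside the guest - the config space size tells you whether extended space is reachable
wc -c /sys/bus/pci/devices/0000:00:05.0/config    # typically 256 bytes = legacy space only, 4096 = extended space visible
lspci -s 00:05.0 -xxxx | tail -n 5                # hex dump lines beyond offset f0 only appear when extended space is readable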
For now, I'm pinging @ThierryEscande to see if he can provide you with a beta version of these patches, to check whether it solves your problem and whether you'd agree to do some tests on that occasion.
-
RE: Coral TPU PCI Passthrough
@milch I will take a look this week and try to figure out if we can make progress on that, so you could have something to test.
-
RE: Coral TPU PCI Passthrough
Just got the answer from Marek on that. The patches he made were tested with Intel Wi-Fi cards and targeted a similar issue (MSI-X table), but not the same one as the Coral TPU (PBA). It should not be very difficult to extend his patches to cover the PBA, but unfortunately neither he nor we have this specific hardware.
The patches he made are actually upstream (commit b2cd07a0447bfa25e96ae13e190225b61a3670cb), so you can take a look at it if you want. I will try to see if there's an easy way for us to get this hardware.
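If you want to read the commit, something along these lines should do it, assuming you have a checkout of the upstream xen.git tree:

# in a clone of the upstream Xen repository
git show b2cd07a0447bfa25e96ae13e190225b61a3670cb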
-
RE: Coral TPU PCI Passthrough
I'm not aware of anything new from Marek, who initially worked on these patches. I think at the time he addressed not this particular hardware but rather the global issue, and this patch wasn't tested with the Coral hardware, so most probably that's why it doesn't work (there may be more issues...).
I will ping Marek on the Xen Community Matrix channel to find out if there's anything new at that level and will keep you posted.
-
RE: Coral TPU PCI Passthrough
@redakula
Well, this was unfortunately one of the potential outcomes. We don't have the hardware to do more in-depth debugging. I will talk to Marek next week (at Xen Summit) about this patch series and whether we can expect it to eventually fix the issue with the Coral TPU.
Will keep you posted.
-
RE: More than 64 vCPU on Debian11 VM and AMD EPYC
@rarturas I'm not sure it's actually doable to run Windows with more than 64 vCPUs. I'm not surprised either that your VM isn't booting when you turn ACPI off.
We're actually in the middle of investigating what the vCPU limit for a Windows VM could be, and especially what the gap is to reach 128 vCPUs for Windows. We will most probably discuss this topic with the community at Xen Summit (in a couple of weeks).
Stay tuned!
-
RE: Coral TPU PCI Passthrough
@andSmv
Hello,
I integrated Marek's patch and built an RPM, so you can install it (you may need to force the rpm install, or extract the xen.gz from the RPM and install it manually if you prefer). Obviously there's no guarantee it'll work in your case. Moreover, I didn't test the patch, so please back up all your data. It should be harmless, but....
Here's the link to download the RPM (it should stay available until the end of the month): https://nextcloud.vates.fr/index.php/s/gd7kMwxHtNEP329
Don't hesitate to ping me if you experience any issue downloading/installing/... the patched Xen.
Hope it helps!
P.S. Be sure you're running XCP-ng 8.3, as I only uploaded the Xen hypervisor RPM (and not the libs/tools that come with it)
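For reference, the install boils down to something like this (a hedged sketch; the exact RPM file name will differ from the pattern used here):

# option 1: force-install the package in place of the stock hypervisor
rpm -Uvh --force xen-hypervisor-*.rpm
# option 2: only extract the hypervisor image from the RPM and copy it to /boot by hand
rpm2cpio xen-hypervisor-*.rpm | cpio -idmv "./boot/*"
ls boot/                                  # see exactly which xen image the RPM ships
cp -a /boot/xen.gz /boot/xen.gz.bak       # keep a backup of the original before overwriting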
-
RE: Coral TPU PCI Passthrough
@redakula Hello, unfortunately these patches are not in Xen 4.17 (and were never merged into more recent Xen either). So, to test them, you have to apply the patches manually (normally they should apply as-is to 4.17) and rebuild your Xen.
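Roughly like this (a hedged sketch, assuming you have the 4.17 source tree and the patch files at hand; the patch file name is illustrative):

# from the top of the Xen 4.17 source tree
patch -p1 < 0001-msix-pba-fix.patch
make -j$(nproc) xen        # rebuilds only the hypervisor; the result normally lands in xen/xen.gz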
-
RE: More than 64 vCPU on Debian11 VM and AMD EPYC
@alexredston Hey, sorry I'm a little bit late here. So, with regard to vCPUs - there's currently a hardcoded limit of 128 in the Xen hypervisor. Moreover, the Xen toolstack (when creating a guest) will check that the guest vCPU count stays below the number of physical CPUs available on the platform.
Bypassing the 128 vCPU limit will require some rather significant adjustments in the Xen hypervisor (basically the restrictions come from the IOREQ server in QEMU and from how LAPIC IDs are assigned in Xen). So with the next Xen version this limit could potentially be increased (there's ongoing work on this). A few things you probably also want to know about this vCPU limit (see the sketch after this list):
- not all guests can handle that many vCPUs (e.g. Windows will certainly crash)
- when you give a VM such a big number of vCPUs (basically more than 32), the VM can potentially cause a DoS of the whole platform (this is related to how some routines are "serialized" in the Xen hypervisor). So when you do this, be aware that if your guest is broken, pwned, whatever... your whole platform can potentially become unresponsive.
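Just to make that concrete, a hedged sketch (illustrative values; run the first command in dom0):

# see how many physical CPUs the toolstack will check the guest against
xl info | grep -E "nr_cpus|max_cpu_id"
# then, in the guest config, something like:
#   vcpus    = 64    # must stay below nr_cpus on the host
#   maxvcpus = 64    # and under the hardcoded 128 limit in the hypervisor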
-
RE: Cannot start VM to which a sata controller and pci nvme is passed through to.
@RAG67958472
Thank you! This seems to be a bug. At some level, when mapping the NVMe device MMIO frame ef004 (BAR 4), it reuses the same guest frame where the SATA device MMIO frame ef138 (BAR 1) is already mapped. This fails, so the domain is stopped by Xen.
I have no idea which part of the code is responsible for reusing the same guest frame (gfn) for these mappings (probably in the toolstack/QEMU,....).
So first of all it would be useful to have the whole Xen trace from the domU start (if I understand correctly, your traces contain the Xen boot and also the bug traces from the 2 times you tried to launch the domU). Are these the only traces produced when you launch the domU?
The second thing - it would be nice to start Xen in debug mode (normally you have a Xen image built with debug traces activated). Can you please boot this image and provide those traces?
I will talk to the Xen maintainers to see if the problem has already been reported by users. (The code which stops the domain didn't change in more recent Xen, but the issue is probably located in the upper layers.)
It would also be very useful to see whether the problem is the same with PVH and HVM guests.
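In practice, something like this collects what I'm asking for (a hedged sketch; the BDFs of your SATA controller and NVMe drive are placeholders, adjust them):

# in dom0, right after the failed domU start: full Xen log since boot
xl dmesg > /tmp/xen-full.log
# BARs of both passed-through devices, to compare the MMIO frames
lspci -vvv -s 01:00.0 > /tmp/sata-lspci.txt
lspci -vvv -s 02:00.0 > /tmp/nvme-lspci.txt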
-
RE: Cannot start VM to which a sata controller and pci nvme is passed through to.
@RAG67958472
Hmmm, this seems to be a bug. There's something special about machine frame ef004. I suppose it's an MMIO address (a PCI BAR?). Can you please provide the output of the whole Xen log from the beginning with xl dmesg, and also a PCI config space dump with lspci -vvv?
What is weird is that you can pass through both devices individually.
-
RE: Nvidia MiG Support
Hello, I honestly don't know how the Citrix vGPU stuff works, but here are a couple of thoughts on this topic:
If I understand correctly, you're saying Nvidia uses the Linux VFIO framework to enable mediated devices which can be exported to a guest. The VFIO framework isn't supported by Xen, as VFIO needs the presence of an IOMMU device managed by the Linux kernel IOMMU driver, and Xen doesn't provide/virtualize IOMMU access for dom0 (Xen manages the IOMMU by itself but doesn't offer such access to guests).
Basically, to export an SR-IOV virtual function to a guest with Xen you don't have to use VFIO; you can just assign the virtual function's PCI "BDF" id to the guest, and normally the guest should see the device.
From what I understand, the Nvidia user-mode toolstack (scripts & binaries) doesn't JUST create SR-IOV virtual functions, but also wants to access the VFIO/MDEV framework, so the whole thing fails.
So maybe you can check whether there's an option in the Nvidia tools to just create the SR-IOV functions, OR try to run VFIO in "no-iommu" mode (no IOMMU presence in the Linux kernel required). Something like the sketch below, for example.
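A rough illustration of the plain SR-IOV path (a hedged sketch; the PCI addresses, VF count and guest name are placeholders, and your GPU may well require the vendor tools to create the VFs at all):

# create a few virtual functions on the physical GPU (if the device exposes sriov_numvfs)
echo 4 > /sys/bus/pci/devices/0000:3b:00.0/sriov_numvfs
# hand one VF over to Xen and attach it to a running guest by its BDF
xl pci-assignable-add 0000:3b:00.4
xl pci-attach my-guest 0000:3b:00.4
# or, for experimentation only, the unsafe "no-iommu" VFIO mode
modprobe vfio enable_unsafe_noiommu_mode=1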
BTW, we're working on a project where we intend to use VFIO in dom0, and so we're implementing an IOMMU driver in the dom0 kernel; it would be interesting to know, in the future, whether this can help with your case.
Hope this helps!
-
RE: VM's with around 24GB+ crashes on migration.
It's obviously not excluded that the issue is related to the memory footprint. Moreover, the first warning "complains" about a memory allocation failure. (I suppose the "receiver" node has enough memory to host the VM.)
Normally Xen has no limitation on live-migrating a 24GB VM, so it's difficult to say what the issue is here. But clearly there's a possibility that this is a bug in Xen/the toolstack... Memory fragmentation on the "receiver" node can be an issue too.
You can probably run some different configurations to try to pinpoint this issue.
Maybe, for a start, try to migrate the VM when no other VMs are running on the "receiver" node. Also try to migrate a VM with no network connections (as the issue seems to be related to network backend status changes)....
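As a quick sanity check on the "receiver" host (a hedged sketch; run this in its dom0 just before retrying the migration):

# how much memory Xen itself still has available for a new guest
xl info | grep free_memory
# and what is already running there
xl list
-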
RE: Weird kern.log errors
Yeah, the HW problem seems to be a good guess.
The track we can follow here is the xen_mc_flush kernel function, which raises a warning when a multicall (a hypercall wrapper) fails. The interesting thing here would be to take a look at the Xen traces. You can type xl dmesg in dom0 to see if Xen says something more (in case it isn't happy for some reason).
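Concretely, something like this (a hedged sketch; log paths may differ on your dom0):

# the kernel warning raised by xen_mc_flush normally mentions the failed multicall(s)
grep -iA5 multicall /var/log/kern.log
# and check whether Xen itself logged anything around the same time
xl dmesg | tail -n 200
-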
RE: VM's with around 24GB+ crashes on migration.
Hmmm, there are two problems here (a page allocation failure warning and a NULL pointer BUG), both in the context of the xenwatch kernel thread, and basically both of them happen when configuring the Xen network frontend/backend communication.
Normally this isn't related to the memory footprint of the VM, but rather to the Xen frontend/backend xenbus communication framework. Do the bugs disappear when you reduce the memory size of the VM while all the other params/environment stay the same?
-
RE: Google Coral TPU PCIe Passthrough Woes
@jjgg Here's the link to xen.gz. You need to put it in your /boot folder (back up your existing file!) and make sure your grub.cfg is pointing to it. But first: back up everything you want to back up! The patch is totally untested and didn't apply as-is (so I needed to adapt it). Normally that's not such a big deal and it should do no harm, but... you never know.
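In practice that's roughly (a hedged sketch; the grub.cfg path depends on whether your host boots in BIOS or UEFI mode):

cp -a /boot/xen.gz /boot/xen.gz.bak      # keep the original hypervisor around
cp xen.gz /boot/xen.gz                   # the patched build you downloaded
grep -n xen /boot/grub/grub.cfg          # check which xen image grub will actually load

Then reboot and check the Xen version/changeset reported by xl info.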
I'm also not sure that the issue will be fixed. Unfortunately we do not have a Coral TPU device at Vates, so we can't do a deeper analysis of this. The guy who wrote this patch was trying to fix a different device.
@exime - this is a patched Xen 4.13.5 for XCP-ng, so there's a chance it won't work for you (from what I saw, you're running Xen 4.13.4)
Anyway, if we have good news, we'll find a way to fix it for everybody.
-
RE: Google Coral TPU PCIe Passthrough Woes
@jjgg Thank you. Yes, the same problem - an EPT violation. Look, I'll try to figure out what we can do here. There's a patch from the Qubes OS guys that normally should fix the MSI-X PBA issue (not sure it's the right fix, but still... worth trying). This patch applies to recent Xen and hasn't been accepted yet. I will take a look at whether it can be easily backported to the XCP-ng Xen and come back to you.
-
RE: Google Coral TPU PCIe Passthrough Woes
@jjgg Can you please also post the Xen traces after the VM has stopped?
(Either from hypervisor.log, or just type xl dmesg under the root account in your dom0.)
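For example (a hedged sketch; the hypervisor.log path may differ on your dom0):

xl dmesg > /tmp/xen-traces.log           # live Xen log, run as root in dom0
tail -n 200 /var/log/hypervisor.log      # or the persistent log, if your dom0 keeps one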