Coral TPU PCI Passthrough
-
@andSmv
As expected, the VM with the Coral M.2 crashes on boot. Where would I start with building a custom Xen? The Koji docs seem directed at authorized package maintainers, so would I need to build directly from the Xen sources?
Feeling old admitting that I last regularly built custom kernels back in the 2.6 days. -
@redakula I'm on it. I'll keep you posted.
-
@andSmv
Hello,
I integrated Marek's patch and built an RPM, so you can install it (you may need to force the RPM install, or extract xen.gz from the RPM and install it manually if you prefer). Obviously there's no guarantee it will work in your case. Moreover, I didn't test the patch, so please back up all your data. It should be harmless, but...
Here's the link where you can download the RPM (it should stay available until the end of the month): https://nextcloud.vates.fr/index.php/s/gd7kMwxHtNEP329
Don't hesitate to ping me if you run into any issue downloading or installing the patched Xen.
Hope it helps!
P.S. Be sure you're running XCP-ng 8.3, as I only uploaded the Xen hypervisor RPM (and not the libs/tools that come with it).
-
@andSmv Thanks!
I tried to be as non-invasive as possible and changed the xen.gz symbolic link to point to the xen.gz from the RPM you created.
Unfortunately, I still get the same error. (It does seem to boot the Xen from the RPM, as this one reports version 4.17.3-3, whereas the one currently in the repos is version 4.17.3-4.)
[2024-05-24 17:06:33] (XEN) [ 674.051176] Domain 14 (vcpu#2) crashed on cpu#22:
[2024-05-24 17:06:33] (XEN) [ 674.051178] ----[ Xen-4.17.3-3 x86_64 debug=n Not tainted ]----
[2024-05-24 17:06:33] (XEN) [ 674.051179] CPU: 22
[2024-05-24 17:06:33] (XEN) [ 674.051180] RIP: 0010:[<ffffffffa8581584>]
[2024-05-24 17:06:33] (XEN) [ 674.051180] RFLAGS: 0000000000000286 CONTEXT: hvm guest (d14v2)
[2024-05-24 17:06:33] (XEN) [ 674.051182] rax: ffffbd9c00149800 rbx: ffff9e9247cc9000 rcx: 0000000000000000
[2024-05-24 17:06:33] (XEN) [ 674.051182] rdx: 00000000fee77000 rsi: 0000000000000000 rdi: 0000000000000000
[2024-05-24 17:06:33] (XEN) [ 674.051183] rbp: ffffbd9c00327690 rsp: ffffbd9c00327658 r8: 0000000000000000
[2024-05-24 17:06:33] (XEN) [ 674.051183] r9: 0000000000000000 r10: 0000000000000000 r11: 0000000000000000
[2024-05-24 17:06:33] (XEN) [ 674.051184] r12: ffffbd9c003276ac r13: 0000000000000011 r14: ffff9e92413390c0
[2024-05-24 17:06:33] (XEN) [ 674.051185] r15: 0000000000000077 cr0: 0000000080050033 cr4: 0000000000750ef0
[2024-05-24 17:06:33] (XEN) [ 674.051185] cr3: 0000000103806000 cr2: 0000000000000000
[2024-05-24 17:06:33] (XEN) [ 674.051186] fsb: 00007b6e7a42a8c0 gsb: ffff9e925b500000 gss: 0000000000000000
[2024-05-24 17:06:33] (XEN) [ 674.051186] ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: 0018 cs: 0010
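For anyone comparing crash dumps between Xen builds, here is a minimal Python sketch that splits a pasted console log like the one above into individual entries and pulls out the register values. The function names (`split_entries`, `parse_registers`) and the regular expressions are my own assumptions based only on the log format shown in this post; this is not an official Xen or XCP-ng tool, and other Xen versions may format the dump differently.

```python
import re

# Prefix of each entry as it appears in the log above:
# "[date time] (XEN) [ uptime]". Assumed from this one sample only.
PREFIX = re.compile(
    r"\[\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\] \(XEN\) \[\s*\d+\.\d+\]\s*"
)

# Lowercase register name followed by a hex value, e.g. "rdx: 00000000fee77000".
REGISTER = re.compile(r"\b([a-z][a-z0-9]{1,2}):\s+([0-9a-f]{4,16})\b")


def split_entries(text):
    """Split a (possibly single-line) console paste into one message per entry."""
    return [part.strip() for part in PREFIX.split(text) if part.strip()]


def parse_registers(entries):
    """Collect register name -> integer value from the dump entries."""
    registers = {}
    for entry in entries:
        for name, value in REGISTER.findall(entry):
            registers[name] = int(value, 16)
    return registers


# Short excerpt of the log from this post, pasted as one line.
SAMPLE = (
    "[2024-05-24 17:06:33] (XEN) [ 674.051176] Domain 14 (vcpu#2) crashed on cpu#22: "
    "[2024-05-24 17:06:33] (XEN) [ 674.051182] rdx: 00000000fee77000 "
    "rsi: 0000000000000000 rdi: 0000000000000000 "
    "[2024-05-24 17:06:33] (XEN) [ 674.051186] ds: 0000 es: 0000 fs: 0000 "
    "gs: 0000 ss: 0018 cs: 0010"
)

if __name__ == "__main__":
    entries = split_entries(SAMPLE)
    for entry in entries:
        print(entry)
    for name, value in parse_registers(entries).items():
        print(f"{name} = {value:#x}")
```

Running the same script on dumps from the stock and the patched hypervisor makes it easier to diff the register state between the two.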
It does appear that there is some movement upstream on this (if I interpret the Xen mailing list correctly).
This patch series references the same title as the patch in this thread from 2022 and a bunch of other related work:
https://lore.kernel.org/xen-devel/cover.33fb4385b7dd6c53bda4acf0a9e91748b3d7b1f7.1715313192.git-series.marmarek@invisiblethingslab.com/ -
@redakula
Well, this was unfortunately one of the potential outcomes. We don't have the hardware to do more in-depth debugging. I will talk to Marek next week (at Xen Summit) about this patch series and whether we can expect it to eventually fix the issue with the Coral TPU.
Will keep you posted. -
Thanks
Let me know and I will be happy to continue testing.