Coral TPU PCI Passthrough
-
Anyone ever get PCI passthrough to work for Coral TPU?
After enabling the passthrough the Debian 11 VM won't start and the log shows:
unix enoent open error.
[12:48 xcp-lab0 ~]# lspci -vvv -s 04:00.0
04:00.0 Non-VGA unclassified device: Global Unichip Corp. Coral Edge TPU (prog-if ff)
    Subsystem: Global Unichip Corp. Coral Edge TPU
    Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Interrupt: pin A routed to IRQ 255
    Region 0: Memory at 4000100000 (64-bit, prefetchable) [disabled] [size=16K]
    Region 2: Memory at 4000000000 (64-bit, prefetchable) [disabled] [size=1M]
    Capabilities: [80] Express (v2) Endpoint, MSI 00
        DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
            ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 25.000W
        DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
            RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
            MaxPayload 256 bytes, MaxReadReq 512 bytes
        DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
        LnkCap: Port #1, Speed 5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <64ns, L1 <1us
            ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
        LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk-
            ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
        LnkSta: Speed 5GT/s, Width x1, TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
        DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR+, OBFF Not Supported
        DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+, OBFF Disabled
        LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
            Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
            Compliance De-emphasis: -6dB
        LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
            EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
    Capabilities: [d0] MSI-X: Enable- Count=128 Masked-
        Vector table: BAR=2 offset=00046800
        PBA: BAR=2 offset=00046068
    Capabilities: [e0] MSI: Enable- Count=1/32 Maskable- 64bit+
        Address: 0000000000000000  Data: 0000
    Capabilities: [f8] Power Management version 3
        Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
        Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [100 v1] Vendor Specific Information: ID=1556 Rev=1 Len=008 <?>
    Capabilities: [108 v1] Latency Tolerance Reporting
        Max snoop latency: 3145728ns
        Max no snoop latency: 3145728ns
    Capabilities: [110 v1] L1 PM Substates
        L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
            PortCommonModeRestoreTime=10us PortTPowerOnTime=10us
    Capabilities: [200 v2] Advanced Error Reporting
        UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP+ ECRC- UnsupReq- ACSViol-
        CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
        CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
        AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
[12:48 xcp-lab0 ~]#
-
I've read that these TPUs break the PCI specification and therefore have issues with PCI passthrough. I think it was on the forum, within the last year or two.
-
-
@olivierlambert I saw that post in my initial search but it doesn't look like the OP replied with the PCI dump. Is there any hope for a workaround?
-
You can continue here and provide details; maybe we'll see something obvious.
-
@olivierlambert Aside from the dump in my original post, would you like me to run any additional commands to gather more data?
-
@andSmv will take a look when he's around
-
What's your exact model of Coral TPU by the way?
-
-
@olivierlambert M.2 Accelerator B+M Key
https://coral.ai/products/m2-accelerator-bm#description
-
So I've heard the Qubes OS people wrote some patches to work around the device's broken PCI spec compliance. I need to ask around for more details about this.
-
@andSmv I discussed this with Marek from Qubes; he told me this might be relevant (or not): https://lore.kernel.org/xen-devel/20221114192100.1539267-2-marmarek@invisiblethingslab.com/
What do you think?
-
Hello, sorry for the late response (I just discovered the topic).
Regarding Marek's patches, I actually think they could be worth a try (at least the patch seems to address the problem where the MSI-X PBA page is shared with other registers of the device), but there are some cons too:
- the patches are quite new (they don't seem to be integrated yet).
- the patches apply to a more recent Xen (not the XCP-ng Xen), and even though we could probably backport them, that could require significant work
- we are not 100% sure it's the issue (or the only issue)
So if this is a must-have, we can do some digging to make it work (but still within the scope of an "experimental" platform, not a production platform).
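To make the "shared page" concern concrete, here is a minimal sketch that works out which 4 KiB pages the MSI-X vector table and PBA occupy, using only the offsets and vector count reported in the lspci dump earlier in this thread (BAR 2, table at 0x46800, PBA at 0x46068, Count=128). The function names `pages_touched` and `msix_layout` are made up for illustration. The sketch assumes 16-byte table entries and one pending bit per vector packed into 64-bit words, as in the MSI-X layout, and it does no hardware access. It only shows how the layout could be checked; it does not prove that this is what breaks passthrough.

```python
PAGE_SIZE = 4096        # x86 page granularity
MSIX_ENTRY_SIZE = 16    # bytes per MSI-X table entry
PBA_QWORD_BITS = 64     # one pending bit per vector, packed in 64-bit words


def pages_touched(offset, length):
    """Return the page-aligned BAR offsets covered by [offset, offset + length)."""
    first = offset // PAGE_SIZE
    last = (offset + length - 1) // PAGE_SIZE
    return [page * PAGE_SIZE for page in range(first, last + 1)]


def msix_layout(table_offset, pba_offset, count):
    """Summarise which pages the MSI-X table and PBA occupy inside one BAR.

    Hypothetical helper: assumes both structures live in the same BAR and
    do not overlap each other.
    """
    table_len = count * MSIX_ENTRY_SIZE
    pba_len = -(-count // PBA_QWORD_BITS) * 8  # ceil(count / 64) qwords, 8 bytes each
    table_pages = pages_touched(table_offset, table_len)
    pba_pages = pages_touched(pba_offset, pba_len)
    all_pages = sorted(set(table_pages) | set(pba_pages))
    # Bytes in those pages used by neither structure, i.e. space where
    # other device registers could be mapped.
    spare_bytes = len(all_pages) * PAGE_SIZE - table_len - pba_len
    return {
        "table_pages": table_pages,
        "pba_pages": pba_pages,
        "same_page": bool(set(table_pages) & set(pba_pages)),
        "spare_bytes": spare_bytes,
    }


# Values copied from the "Capabilities: [d0] MSI-X" lines of the dump.
layout = msix_layout(table_offset=0x46800, pba_offset=0x46068, count=128)
print([hex(p) for p in layout["table_pages"]],
      [hex(p) for p in layout["pba_pages"]],
      layout["same_page"], layout["spare_bytes"])
```

For the values in the dump, the table and the PBA both land in the single page at BAR 2 offset 0x46000 and together use 2064 of its 4096 bytes. That leaves room for other registers in the same page, which is consistent with the theory above but does not confirm it.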
-
We could probably try on a non-XCP-ng platform with a very recent "vanilla" Xen (+ Marek's patches) and see if it's fixed. If it is, then we could think about a potential backport once 8.3 includes a more recent Xen version.
-
@logical-systems I will check which Xen version the patches apply to easily and, if you want, I can give you a hand (if needed) building and installing your own Xen build, so you can test whether this resolves your issue.
Unfortunately we don't have the relevant hardware (Coral TPU) to test it ourselves.
UPDATE: both patches apply to Xen 4.17 (tag RELEASE-4.17.0)
-
-
Hi,
I'm researching XCP-ng as an alternative to my homelab VMware hypervisor.
A goal for me is to get proper USB passthrough of the Google Coral TPU.
Did these patches make it work, i.e. is passthrough to a VM confirmed to be working?
-
@andSmv said in Coral TPU PCI Passthrough:
@logical-systems I will check which Xen version the patches apply to easily and, if you want, I can give you a hand (if needed) building and installing your own Xen build, so you can test whether this resolves your issue.
Unfortunately we don't have the relevant hardware (Coral TPU) to test it ourselves.
UPDATE: both patches apply to Xen 4.17 (tag RELEASE-4.17.0)
So are the above-mentioned patches included in the 4.17 that is currently available as a test version?
Or did you mean the patches worked on that version?
-
Now that we have Xen 4.17 in XCP-ng 8.3, that might work (ping @andSmv)
-
@redakula Hello, unfortunately these patches are not in Xen 4.17 (and were never integrated into a more recent Xen). So, to test them, you have to apply the patches manually (they should normally apply as-is to 4.17) and rebuild your Xen.
-
@andSmv
Damn - I was quick and have a Coral M.2 A+E coming in a few days.
It's just for fun/learning, so as long as it doesn't break my homelab too much I'm willing to test so we might get it included.
I've already been using the 4.17 test version without a hitch since it came out.