Google Coral TPU PCIe Passthrough Woes
-
-
@andSmv thanks. Completely understand this appears more of a hardware issue here, but happy to test anything. The host that these cards are installed in isn't running critical infrastructure and can be rebooted relatively easily.
@exime your initial post was a top Google result, was still helpful! Thanks.
-
@jjgg Here's the link to
xen.gz
.You need to put it in your
/boot
folder (backup your existent file!) and make sure your grub.cfg is pointing to it.But first: Backup all you want to backup! The patch is totally untested and doesn't apply as is (so I needed to adapt it). Normally not such a big deal and should not do no harm, but... you never know.
I'm also not sure that the issue would be fixed. We unfortunatelly do not have Coral TPU device at Vates, so we can't do the more deep analysis on this. The guy who wrote this patch tried to fix other device.
@exime - this is 4.13.5 XCP-ng patched xen, so there's chances it wouldn't work for you (from what I saw you're running 4.13.4 xen)
Anyway, if we have good news, we'll find the way to fix it for everybody.
-
-
Ok so testing setup.
Downloaded xen.gz, renamed to xenept.gz and put into /boot:
Updated grub.cfg to point to that file:
Rebooted host.
Got to the grub screen, let it load as normal, as soon as it disappeared (so it had made the default selection) the server rebooted.
I ended up just placing xen.gz in /boot and removing the symbolic link to the original and attempting, no difference.
Of note, messing with kernels / grub is not something I've got experience with. I may have made a mistake / need things explained in a bit more detail if I'm potentially misunderstood some instructions above.
-
Booted into fallback and put things back the way they were. Happy to keep testing if there's additional bits to test.
-
Ah, just found these things exist..... Then just found this issue exists too ️
-
If only they could have done PCI hardware that follow the PCI specifications
-
Maybe this one will come to life again https://xcp-ng.org/forum/topic/7066/coral-tpu-pci-passthrough/14
Don't really want to buy one knowing its not working!!
-
Definitely frustrating and no fault of xcp-ng - I have a lot of spare cpu cycles so it isn't majorly impacting me that I know of. I'm still available to test fixes though.
Looks like most of the Proxmox users have got this working in an LXC container by installing the drivers on the host itself and passing through the actual Apex devices. Not a route that's applicable to us but just a datapoint.
-
@jjgg it would be great if we could get this working. My CPU utilisation is fine too, but when I shut down my Zoneminder VM things go a lot quieter (fans) so I'm sure there would be a benefit CPU and power wise.
-
@jmccoy555 // @jjgg
Did anyone of you get your Coral USB TPU working and passthrough to a VM?
-
@Nornode hey, nope I did not. I ended up moving my infrastructure to Proxmox.
Honestly this is no fault of XCP-ng and XCP-ng suits my hardware / setup a lot better, but it was either that or I had two servers that needed to be bare metal.
-
PCI passthrough might cause problems with this device, but USB could work.
-
@olivierlambert Seems like a reasonable place to ask as any - I am currently using a USB Coral over IP (Virtualhere) but would rather load it into my VM directly - what's the current status of snapshots/backups with a vUSB?
I've been reading that XO can now support disk exclusions with
[NOBAK]
but this probably doesn't apply to a Coral. Is an offline backup still the best available method? -
For NOBAK and on 8.3 yes, but I'm not sure it will be related to USB. You should use offline, that should work. Alternatively, we have plans to detect the error, to unplug the USB device, do the snap and replug it just after.
-
@olivierlambert Thanks that's good to know. That functionality would be great down the line!
I do have a spare M.2 E-key on my XCP host running the VM Coral is needed for, but seems like I'd have trouble going by this thread. Might even have trouble with the USB Coral, it hasn't been much better so far in terms of whacky non-standard behavior...