XCP-ng

    Posts made by exime

    • RE: Google Coral TPU PCIe Passthrough Woes

      @andSmv ack - I'll wait and see if it works out for @jjgg since my Xen server is in active use

      posted in Compute
      exime
    • RE: Google Coral TPU PCIe Passthrough Woes

      @andSmv thanks!

      @jjgg glad you're providing the info, sorry for abandoning the thread

      posted in Compute
    • RE: Google Coral TPU PCIe Passthrough Woes

      @olivierlambert said in Google Coral TPU PCIe Passthrough Woes:

      Can you try an older kernel in your VM just to be sure? (e.g. a Debian 10 guest with the default bundled kernel)

      I'm just now getting back to this.

      Might the problem be related to this issue?

      "Unfortunately the device in question violates PCI specification by mapping PBA, MSI-X vector table, and other registers into same 4KB page (PBA is at 0x46068, VT at 0x46800, but there is a bunch of other registers in 0x46XXX range)."

      https://github.com/google-coral/edgetpu/issues/343#issuecomment-1287251821

      dakota created this issue in google-coral/edgetpu

      open Apex failing with error -110 (No /dev/apex_0) #343
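
To make the quoted claim concrete, here is a small Python check (a sketch, not taken from the linked issue) showing that the two register offsets cited in the comment fall within the same 4 KiB page. The offsets come from the quote; the names PAGE_SIZE, PBA_OFFSET, VT_OFFSET, and page_index are only illustrative.

```python
# Offsets quoted in the GitHub comment (relative to the start of the BAR).
PAGE_SIZE = 4096          # 4 KiB page, as stated in the quote
PBA_OFFSET = 0x46068      # MSI-X Pending Bit Array
VT_OFFSET = 0x46800       # MSI-X vector table


def page_index(offset: int) -> int:
    """Return the index of the 4 KiB page that contains `offset`."""
    return offset // PAGE_SIZE


same_page = page_index(PBA_OFFSET) == page_index(VT_OFFSET)
print(hex(page_index(PBA_OFFSET)), hex(page_index(VT_OFFSET)), same_page)
```

Both offsets land in page 0x46, so a hypervisor that has to intercept accesses to the MSI-X table at page granularity ends up covering the PBA and any other registers in the 0x46XXX range as well.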

      posted in Compute
    • RE: Google Coral TPU PCIe Passthrough Woes

      @olivierlambert will do!

      posted in Compute
    • Google Coral TPU PCIe Passthrough Woes

      I recently moved to XCP-ng because I had been unable to get the Google Coral TPU to pass through properly to VMs in ESXi. Unfortunately, passing the Coral through in XCP-ng fails in a different way: the guest VM crashes as soon as I install the Google drivers, which work fine on bare-metal installs of the same Ubuntu 20.04 guest.

      The TPU does show up with lspci in the guest, and I've successfully passed through a GPU and a USB controller to different guests.

      I'm not sure what the most relevant logs are, but this is what I see in hypervisor.log:

      [2022-09-03 15:25:35] (XEN) [ 1373.790117] memory_map: error -22 removing dom11 access to [3fff2100,3fff2103]
      [2022-09-03 15:25:35] (XEN) [ 1373.889785] memory_map: error -22 removing dom11 access to [3fff2100,3fff2103]
      [2022-09-03 15:25:35] (XEN) [ 1373.988658] memory_map: error -22 removing dom11 access to [3fff2100,3fff2103]
      [2022-09-03 15:25:35] (XEN) [ 1374.083144] memory_map: error -22 removing dom11 access to [3fff2100,3fff2103]
      [2022-09-03 15:25:35] (XEN) [ 1374.182721] memory_map: error -22 removing dom11 access to [3fff2100,3fff2103]
      [2022-09-03 15:25:41] (XEN) [ 1380.209079] d11v0 EPT violation 0x1aa (-w-/r-x) gpa 0x000000f184680c mfn 0x3fff2046 type 5
      [2022-09-03 15:25:41] (XEN) [ 1380.209083] d11v0 Walking EPT tables for GFN f1846:
      [2022-09-03 15:25:41] (XEN) [ 1380.209086] d11v0  epte 9c000015d1df8107
      [2022-09-03 15:25:41] (XEN) [ 1380.209089] d11v0  epte 9c00000e34a20107
      [2022-09-03 15:25:41] (XEN) [ 1380.209092] d11v0  epte 9c00000ef3120107
      [2022-09-03 15:25:41] (XEN) [ 1380.209094] d11v0  epte 9c5003fff2046945
      [2022-09-03 15:25:41] (XEN) [ 1380.209097] d11v0  --- GLA 0xffffb91f001a180c
      [2022-09-03 15:25:41] (XEN) [ 1380.209107] domain_crash called from vmx_vmexit_handler+0xf55/0x19c0
      [2022-09-03 15:25:41] (XEN) [ 1380.209110] Domain 11 (vcpu#0) crashed on cpu#39:
      [2022-09-03 15:25:41] (XEN) [ 1380.209116] ----[ Xen-4.13.4-9.24.1  x86_64  debug=n   Not tainted ]----
      [2022-09-03 15:25:41] (XEN) [ 1380.209119] CPU:    39
      [2022-09-03 15:25:41] (XEN) [ 1380.209122] RIP:    0010:[<ffffffff9ff8ccdd>]
      [2022-09-03 15:25:41] (XEN) [ 1380.209124] RFLAGS: 0000000000010246   CONTEXT: hvm guest (d11v0)
      [2022-09-03 15:25:41] (XEN) [ 1380.209129] rax: 0000000000000000   rbx: ffffb91f001a1800   rcx: 0000000000000080
      [2022-09-03 15:25:41] (XEN) [ 1380.209132] rdx: ffffb91f001a180c   rsi: 0000000000000001   rdi: 0000000000000000
      [2022-09-03 15:25:41] (XEN) [ 1380.209135] rbp: ffffb91f0059f968   rsp: ffffb91f0059f8f0   r8:  0000000000000000
      [2022-09-03 15:25:41] (XEN) [ 1380.209139] r9:  ffffb91f0059f7a8   r10: ffffb91f00000000   r11: ffffa059148c2f40
      [2022-09-03 15:25:41] (XEN) [ 1380.209142] r12: 0000000000000000   r13: ffffa05916b35000   r14: ffffa0590c723080
      [2022-09-03 15:25:41] (XEN) [ 1380.209145] r15: 000000000000000d   cr0: 0000000080050033   cr4: 00000000001606f0
      [2022-09-03 15:25:41] (XEN) [ 1380.209147] cr3: 00000001945b2001   cr2: 000055c9503250c0
      [2022-09-03 15:25:41] (XEN) [ 1380.209150] fsb: 00007feabbd8d880   gsb: ffffa05918400000   gss: 0000000000000000
      [2022-09-03 15:25:41] (XEN) [ 1380.209153] ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0018   cs: 0010
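
For reference, the EPT violation line above can be decoded with a few lines of Python. This is a sketch under stated assumptions: the regular expression is written only for the line format shown in this log, the bit meanings follow my reading of the Intel SDM's EPT-violation exit qualification (bits 0-2 are the access type, bits 3-5 the permissions of the page), and decode_ept_violation is a hypothetical helper name.

```python
import re

# The EPT violation line from hypervisor.log, without the timestamp prefix.
LINE = ("d11v0 EPT violation 0x1aa (-w-/r-x) "
        "gpa 0x000000f184680c mfn 0x3fff2046 type 5")

# Captures the exit qualification, guest-physical address, and machine frame.
PATTERN = re.compile(
    r"EPT violation (0x[0-9a-f]+) \S+ gpa (0x[0-9a-f]+) mfn (0x[0-9a-f]+)"
)


def decode_ept_violation(line: str) -> dict:
    """Split one Xen 'EPT violation' line into its numeric fields."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError("not an EPT violation line")
    qual, gpa, mfn = (int(group, 16) for group in match.groups())
    return {
        "gfn": gpa >> 12,            # guest frame number (4 KiB pages)
        "page_offset": gpa & 0xFFF,  # byte offset inside that page
        "mfn": mfn,
        # Assumed layout: bits 0-2 describe the attempted access.
        "access": {"read": bool(qual & 1), "write": bool(qual & 2),
                   "fetch": bool(qual & 4)},
        # Assumed layout: bits 3-5 describe what the EPT entry allowed.
        "allowed": {"read": bool(qual & 8), "write": bool(qual & 16),
                    "exec": bool(qual & 32)},
    }


info = decode_ept_violation(LINE)
print(hex(info["gfn"]), hex(info["page_offset"]), info["access"], info["allowed"])
```

On this line it reports a write to guest frame 0xf1846 at page offset 0x80c, on a page that is mapped read/execute only, which agrees with the GFN and the (-w-/r-x) summary that Xen prints. The frame number 0xf1846 ends in 0x46, which would put the access in the 0x46XXX register range if the guest BAR starts on a 1 MiB boundary, but that alignment is an assumption and is not shown in the log.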
      
      posted in Compute