Hailo-8L AI accelerator PCI passthrough causes XCP-ng hypervisor infinite boot loop
- 
 Hello, this is my first post on this forum, so I want to thank you for your work on XCP-ng.
 Failed PCI passthrough attempt:
 I have a problem passing through a PCI device. When I follow the guide at https://docs.xcp-ng.org/compute/ and reboot the server after hiding the PCI device, the hypervisor can't boot and gets stuck in an infinite boot loop. I had to boot into safe mode and remove the PCI hide option. Then everything went back to normal.
 Successful PCI passthrough:
 There is another way to pass through a PCI device without rebooting the hypervisor. This method is described on the Xen wiki: https://wiki.xenproject.org/wiki/Xen_PCI_Passthrough. It is called "Dynamic assignment with xl".
 When I followed the Xen documentation, I was able to pass my device through to a VM, and I can confirm that everything works correctly. I successfully connected the AI coprocessor to a Frigate VM.
 It would be great to fix PCI passthrough when the device is hidden from Dom0. Then I would be able to configure my VM to autostart after a server reset. My XCP-ng version is 8.3 with all patches applied as of the time of writing.
 My server is an HP DL380 Gen9.
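 For reference, the dynamic assignment described above can be sketched roughly as follows. This is a dry-run sketch based on the Xen wiki page, not a tested procedure for XCP-ng: `xl_passthrough_cmds` is a hypothetical helper that only prints the `xl` commands, and the domain name `frigate` in the usage note is an assumption.

```shell
#!/bin/sh
# Dry-run sketch of "Dynamic assignment with xl".
# xl_passthrough_cmds is a hypothetical helper: it prints the commands
# instead of running them, so they can be reviewed first.
xl_passthrough_cmds() {
    bdf="$1"      # PCI address of the device, e.g. 0000:08:00.0
    domain="$2"   # domain name or ID as shown by `xl list`

    # Make the device assignable (binds it to pciback in Dom0)
    printf 'xl pci-assignable-add %s\n' "$bdf"
    # Confirm the device shows up as assignable
    printf 'xl pci-assignable-list\n'
    # Hot-plug the device into the running guest
    printf 'xl pci-attach %s %s\n' "$domain" "$bdf"
}
```

 Calling `xl_passthrough_cmds 0000:08:00.0 frigate` prints the three commands in order. An assignment made this way is not persistent, which is why it does not help with autostart after a server reset.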
- 
 Hello and welcome here! That's weird that just hiding the device from the Dom0 is causing an issue. Do you have any logs from the crash that we can check?
- 
 No, but I can reproduce the issue and collect the logs. Where can I find these logs? What I can tell you is that this issue was also present on XCP-ng 8.2. I thought that upgrading to 8.3 might fix it.
- 
 First, let's collect the exact commands you are using to hide it from the Dom0, in case there's a typo  
- 
 It wasn't my first time doing this. Previously I successfully passed through a Fibre Channel HBA to a VM.
 But I understand your point. This is the output from the history command. I copied only the interesting part:
    18 lspci | grep hailo
    19 lspci
    20 /opt/xensource/libexec/xen-cmdline --set-dom0 "xen-pciback.hide=(0000:08:00.0)"
    21 /opt/xensource/libexec/xen-cmdline --get-dom0 xen-pciback.hide
    22 reboot
 and this is the output from lspci -vn:
    08:00.0 0b40: 1e60:2864 (rev 01)
        Subsystem: 1e60:2864
        Physical Slot: 3
        Flags: bus master, fast devsel, latency 0, IRQ 16
        Memory at 39ff0604000 (64-bit, prefetchable) [size=16K]
        Memory at 39ff0608000 (64-bit, prefetchable) [size=4K]
        Memory at 39ff0600000 (64-bit, prefetchable) [size=16K]
        Capabilities: [80] Express Endpoint, MSI 00
        Capabilities: [e0] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Capabilities: [f8] Power Management version 3
        Capabilities: [100] Vendor Specific Information: ID=1556 Rev=1 Len=008 <?>
        Capabilities: [108] Latency Tolerance Reporting
        Capabilities: [110] L1 PM Substates
        Capabilities: [128] Alternative Routing-ID Interpretation (ARI)
        Capabilities: [200] Advanced Error Reporting
        Capabilities: [300] #19
        Kernel driver in use: pciback
        Kernel modules: hailo_pci
 As you can see, there is a hailo_pci kernel module (currently not in use). But during my first attempts it was not present, so the boot loop also happened without this driver. I only compiled it later while debugging.
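 To rule out a typo in the hide option, the argument can be built from the PCI address instead of typed by hand. A minimal sketch: `build_hide_arg` is a hypothetical helper (not part of XCP-ng), and the `--delete-dom0` line in the comments is how I understand the XCP-ng documentation describes undoing the change, so check it against the docs before relying on it.

```shell
#!/bin/sh
# Build the value for the Dom0 xen-pciback.hide option from one or more
# PCI addresses. build_hide_arg is a hypothetical helper, not an XCP-ng tool.
build_hide_arg() {
    arg="xen-pciback.hide="
    for bdf in "$@"; do
        # Each device is wrapped in its own parentheses, e.g. (0000:08:00.0)
        arg="${arg}(${bdf})"
    done
    printf '%s\n' "$arg"
}

# Usage on the host, as root (same commands as in the history above):
#   /opt/xensource/libexec/xen-cmdline --set-dom0 "$(build_hide_arg 0000:08:00.0)"
#   /opt/xensource/libexec/xen-cmdline --get-dom0 xen-pciback.hide
# Assumed way to undo it after booting in safe mode:
#   /opt/xensource/libexec/xen-cmdline --delete-dom0 xen-pciback.hide
```

 For the device above, `build_hide_arg 0000:08:00.0` produces the same string that was passed to `--set-dom0`, so the option itself appears to be well-formed.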
- 
 Hmm, could the module be causing the crash if the device isn't accessible? @TeddyAstie any opinion?
- 
 @olivierlambert said in Hailo-8L AI accelerator PCI passthrough causes XCP-ng hypervisor infinite boot loop:
 "Hmm, could the module be causing the crash if the device isn't accessible?"
 A quick Google search found this thread on the Proxmox forum. According to that user, the card causes a hotplug event when it initializes.
 https://forum.proxmox.com/threads/hailo-8-ai-m-2-card-crashes-server-when-using-passthrough.166428/
 It seems these AI cards are a bit of a pain; I remember the continuing issues with Google Coral PCIe cards.
- 
 I've seen cases where a hard reset is forced when some devices can't DMA. Maybe it's related.
 If that's the case, something should show up in the IPMI, and the crash is usually instantaneous; otherwise, there is some delay (~5 seconds) between Xen/Dom0 crash and actual reboot.
