NVIDIA GPU passthrough on XCP-ng 8.3 fails after reboot — UUID/PCI ID changes
-
I’m trying to use passthrough for an NVIDIA GPU on XCP-ng 8.3.
The host detects the GPU (lspci shows VGA + Audio), and IOMMU is enabled.
However, whenever I apply xen-pciback.hide and reboot the host, XCP-ng generates a new internal UUID and new PCI ID for the GPU.
As a result, I cannot even assign the GPU to a VM, because the device “changes” its reference after each reboot.
Additional important context:
I previously had a different GPU doing passthrough in a VM.
That GPU was removed.
The new GPU was installed in a different slot.
I suspect the issues may be related to leftover metadata from the previous GPU, which prevents the new GPU from being recognized consistently for passthrough.
Question:
Has anyone encountered this problem? Is there a way to completely clear the old passthrough state on the host or ensure that the new GPU is recognized consistently for passthrough?
-
Hi,
Clear the boot parameter entirely with
/opt/xensource/libexec/xen-cmdline --delete-dom0 xen-pciback.hide
then reboot and re-assign the device.
-
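The clear-and-reapply sequence suggested above can be sketched as follows. This is only a sketch under stated assumptions: `build_hide_param` is a hypothetical helper (not part of XCP-ng), the two addresses are the GPU and audio functions that appear in the output pasted later in this thread, and hiding both functions is an assumption rather than something confirmed here. The `xen-cmdline` calls are left as comments because they only make sense on the host itself.

```shell
#!/bin/sh
# build_hide_param (hypothetical helper): turn one or more PCI addresses
# into the "xen-pciback.hide=(addr)(addr)..." form used on the dom0 command line.
build_hide_param() {
    value=""
    for bdf in "$@"; do
        value="${value}(${bdf})"
    done
    printf 'xen-pciback.hide=%s\n' "$value"
}

# Example with the addresses from this thread (assumption: both functions are hidden).
build_hide_param 0000:03:00.0 0000:03:00.1
# prints: xen-pciback.hide=(0000:03:00.0)(0000:03:00.1)

# On the host, the sequence would then be roughly (not run here):
#   /opt/xensource/libexec/xen-cmdline --delete-dom0 xen-pciback.hide
#   reboot
#   /opt/xensource/libexec/xen-cmdline --set-dom0 "$(build_hide_param 0000:03:00.0 0000:03:00.1)"
#   reboot
#   /opt/xensource/libexec/xen-cmdline --get-dom0 xen-pciback.hide
```

Building the value in one place makes it harder to end up with a stale or partial entry when the parameter is deleted and set again.
-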
Thank you for your reply.
I followed your instructions; however, when I pass the NVIDIA GPU through in Xen Orchestra, the same error occurs after I reboot.
The PCI ID and UUID of the graphics card are different from what they were before the reboot.
-
Can you tell us on what platform you are doing this? It looks like a buggy PCI reset.
-
I have XCP-ng 8.3 installed.
Xen Orchestra, commit bcee5.
When I add a device manually (the audio function, for example), it asks for a reboot, but after I restart the server, the device has a different PCI ID and a different UUID.

-
Hmm, I'm not sure I follow. But let me add @Team-Hypervisor-Kernel in case that rings a bell.
-
Hi,
@samuelolavo can you run the following command on the dom0 and paste its output, please:
grep -A5 "'XCP-ng'" /etc/grub.cfg
Then also run
lspci -vvv
xe pci-list
xl pci-assignable-list
-
@yannsionneau Hi,
Sorry for the delay...

menuentry 'XCP-ng' {
    search --label --set root root-zrxcsq
    multiboot2 /boot/xen.gz dom0_mem=8192M,max:8192M watchdog ucode=scan dom0_max_vcpus=1-16 crashkernel=256M,below=4G console=vga vga=mode-0x0311
    module2 /boot/vmlinuz-4.19-xen root=LABEL=root-zrxcsq ro nolvm hpet=disable console=hvc0 console=tty0 quiet vga=785 splash plymouth.ignore-serial-consoles xen-pciback.hide=(0000:03:00.0)
    module2 /boot/initrd-4.19-xen.img
}

03:00.0 VGA compatible controller: NVIDIA Corporation GB202GL [RTX PRO 6000 Blackwell Max-Q Workstation Edition] (rev a1) (prog-if 00 [VGA controller])
    Subsystem: NVIDIA Corporation Device 204c
    Physical Slot: 10
    Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Interrupt: pin A routed to IRQ 89
    Region 0: Memory at f4000000 (32-bit, non-prefetchable) [disabled] [size=64M]
    Region 1: Memory at 70060000000 (64-bit, prefetchable) [disabled] [size=256M]
    Region 3: Memory at 70070000000 (64-bit, prefetchable) [disabled] [size=32M]
    Region 5: I/O ports at 1000 [disabled] [size=128]
    Expansion ROM at f8000000 [disabled] [size=512K]
    Capabilities: [40] Power Management version 3
        Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold-)
        Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [48] MSI: Enable- Count=1/16 Maskable+ 64bit+
        Address: 0000000000000000 Data: 0000
        Masking: 00000000 Pending: 00000000
    Capabilities: [60] Express (v2) Legacy Endpoint, MSI 00
        DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <64ns, L1 unlimited ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+
        DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported- RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset- MaxPayload 256 bytes, MaxReadReq 512 bytes
        DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
        LnkCap: Port #0, Speed unknown, Width x16, ASPM L1, Exit Latency L0s unlimited, L1 unlimited ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
        LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+ ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
        LnkSta: Speed unknown, Width x4, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        DevCap2: Completion Timeout: Range AB, TimeoutDis+, LTR+, OBFF Via message
        DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+, OBFF Via WAKE#
        LnkCtl2: Target Link Speed: Unknown, EnterCompliance- SpeedDis- Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS- Compliance De-emphasis: -6dB
        LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+, EqualizationPhase1+ EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
    Capabilities: [9c] Vendor Specific Information: Len=14 <?>
    Capabilities: [100 v1] #19
    Capabilities: [12c v1] Latency Tolerance Reporting
        Max snoop latency: 1048576ns
        Max no snoop latency: 1048576ns
    Capabilities: [134 v1] #15
    Capabilities: [14c v1] #25
    Capabilities: [158 v1] #26
    Capabilities: [188 v1] #2a
    Capabilities: [1b8 v2] Advanced Error Reporting
        UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
        UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO+ CmpltAbrt- UnxCmplt+ RxOF+ MalfTLP+ ECRC+ UnsupReq- ACSViol-
        CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
        CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
        AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
    Capabilities: [200 v1] #27
    Capabilities: [248 v1] Alternative Routing-ID Interpretation (ARI)
        ARICap: MFVC- ACS-, Next Function: 1
        ARICtl: MFVC- ACS-, Function Group: 0
    Capabilities: [2a4 v1] Vendor Specific Information: ID=0001 Rev=1 Len=014 <?>
    Capabilities: [2bc v1] Power Budgeting <?>
    Capabilities: [2f4 v1] Device Serial Number 18-a6-fe-7f-8f-2d-b0-48
    Kernel driver in use: pciback

03:00.1 Audio device: NVIDIA Corporation Device 22e8 (rev a1)
    Subsystem: NVIDIA Corporation Device 0000
    Physical Slot: 10
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0, Cache Line Size: 64 bytes
    Interrupt: pin B routed to IRQ 10
    Region 0: Memory at f8080000 (32-bit, non-prefetchable) [size=16K]
    Capabilities: [40] Power Management version 3
        Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
        Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [48] MSI: Enable- Count=1/1 Maskable+ 64bit+
        Address: 0000000000000000 Data: 0000
        Masking: 00000000 Pending: 00000000
    Capabilities: [60] Express (v2) Endpoint, MSI 00
        DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <64ns, L1 unlimited ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 75.000W
        DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported- RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ MaxPayload 256 bytes, MaxReadReq 512 bytes
        DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
        LnkCap: Port #0, Speed unknown, Width x16, ASPM L1, Exit Latency L0s unlimited, L1 unlimited ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
        LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+ ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
        LnkSta: Speed unknown, Width x4, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        DevCap2: Completion Timeout: Range AB, TimeoutDis+, LTR+, OBFF Via message
        DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
        LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1- EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
    Capabilities: [9c] Vendor Specific Information: Len=14 <?>
    Capabilities: [100 v1] #25
    Capabilities: [10c v2] Advanced Error Reporting
        UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
        UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
        UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO+ CmpltAbrt- UnxCmplt+ RxOF+ MalfTLP+ ECRC+ UnsupReq- ACSViol-
        CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
        CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
        AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
    Capabilities: [154 v1] Alternative Routing-ID Interpretation (ARI)
        ARICap: MFVC- ACS-, Next Function: 0
        ARICtl: MFVC- ACS-, Function Group: 0

09:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Genoa/Bergamo Dummy Function (rev 01)
    Subsystem: Advanced Micro Devices, Inc. [AMD] Genoa/Bergamo Dummy Function
    Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Capabilities: [48] Vendor Specific Information: Len=08 <?>
    Capabilities: [50] Power Management version 3
        Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
        Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [64] Express (v2) Endpoint, MSI 00
        DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W
        DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported- RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset- MaxPayload 128 bytes, MaxReadReq 512 bytes
        DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
        LnkCap: Port #0, Speed unknown, Width x16, ASPM L0s L1, Exit Latency L0s <64ns, L1 <1us ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
        LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+ ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
        LnkSta: Speed unknown, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR-, OBFF Not Supported
        DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
        LnkCtl2: Target Link Speed: Unknown, EnterCompliance- SpeedDis- Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS- Compliance De-emphasis: -6dB
        LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1- EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
    Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
    Capabilities: [270 v1] #19
    Capabilities: [328 v1] Alternative Routing-ID Interpretation (ARI)
        ARICap: MFVC- ACS-, Next Function: 1
        ARICtl: MFVC- ACS-, Function Group: 0
    Capabilities: [410 v1] #26
    Capabilities: [450 v1] #27
    Capabilities: [500 v1] #2a

xe pci-list
uuid ( RO) : 73708288-55ec-b17f-ba73-6d2c116b3bbc
    vendor-name ( RO): NVIDIA Corporation
    device-name ( RO): GB202GL [RTX PRO 6000 Blackwell Max-Q Workstation Edition]
    pci-id ( RO): 0000:03:00.0

uuid ( RO) : c94f0327-8c86-3aa8-dd7c-9389ae1123f5
    vendor-name ( RO): Intel Corporation
    device-name ( RO): Ethernet Controller X550
    pci-id ( RO): 0000:81:00.1

uuid ( RO) : 09e0f3b1-18bb-8a6e-97d8-0209a8e4a97c
    vendor-name ( RO): Advanced Micro Devices, Inc. [AMD]
    device-name ( RO): FCH SATA Controller [AHCI mode]
    pci-id ( RO): 0000:0a:00.1

uuid ( RO) : cc5feb5c-b8aa-a975-de76-513d309f8e73
    vendor-name ( RO): Intel Corporation
    device-name ( RO): Ethernet Controller X550
    pci-id ( RO): 0000:41:00.0

uuid ( RO) : 72baa4c2-13b3-22fb-cc87-b10033ccb025
    vendor-name ( RO): Broadcom / LSI
    device-name ( RO): MegaRAID 12GSAS/PCIe Secure SAS39xx
    pci-id ( RO): 0000:c1:00.0

uuid ( RO) : d2d40d25-4b69-7f6a-a8e2-3101ad80fcb6
    vendor-name ( RO): Intel Corporation
    device-name ( RO): Ethernet Controller X550
    pci-id ( RO): 0000:41:00.1

uuid ( RO) : 630bdaee-0e03-b8a1-c726-4f34230e89f7
    vendor-name ( RO): Intel Corporation
    device-name ( RO): Ethernet Controller X550
    pci-id ( RO): 0000:81:00.0

uuid ( RO) : d4023077-83fe-a0b7-5f3f-516204c2c1d1
    vendor-name ( RO): Advanced Micro Devices, Inc. [AMD]
    device-name ( RO): FCH SATA Controller [AHCI mode]
    pci-id ( RO): 0000:ce:00.1

uuid ( RO) : c67855cd-4908-0711-7424-a6db2eb011f0
    vendor-name ( RO): Advanced Micro Devices, Inc. [AMD]
    device-name ( RO): FCH SATA Controller [AHCI mode]
    pci-id ( RO): 0000:0a:00.0

uuid ( RO) : ef1d93e4-e82b-c33c-a488-8f7a9129eb8a
    vendor-name ( RO): NVIDIA Corporation
    device-name ( RO): Device 22e8
    pci-id ( RO): 0000:03:00.1

uuid ( RO) : b6235c56-4070-dc4d-9db2-5e361f38d2b2
    vendor-name ( RO): Advanced Micro Devices, Inc. [AMD]
    device-name ( RO): FCH SATA Controller [AHCI mode]
    pci-id ( RO): 0000:ce:00.0

uuid ( RO) : 5c7258ae-504b-9ac4-8b16-7129b8d8455d
    vendor-name ( RO): ASPEED Technology, Inc.
    device-name ( RO): ASPEED Graphics Family
    pci-id ( RO): 0000:cc:00.0

xl pci-assignable-list
0000:03:00.0

Model: Supermicro AS-2015CS-TNR
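-
One way to cross-check output like the above is to confirm that every address listed in xen-pciback.hide also shows up in xl pci-assignable-list. The following is a minimal sketch, not an official tool: `hidden_bdfs` and `check_hidden` are hypothetical helper names, and the example command line is abbreviated from the grub entry pasted in this thread.

```shell
#!/bin/sh
# hidden_bdfs (hypothetical helper): read a kernel command line on stdin and
# print each address from xen-pciback.hide=(...)(...) on its own line.
hidden_bdfs() {
    tr ' ' '\n' | sed -n 's/^xen-pciback\.hide=//p' | tr -d '(' | tr ')' '\n' | sed '/^$/d'
}

# check_hidden CMDLINE ASSIGNABLE (hypothetical helper): for each hidden
# address, report whether it appears as a whole line in ASSIGNABLE, which is
# expected to hold the text printed by `xl pci-assignable-list`.
check_hidden() {
    printf '%s\n' "$1" | hidden_bdfs | while read -r bdf; do
        if printf '%s\n' "$2" | grep -qxF "$bdf"; then
            echo "$bdf: hidden and assignable"
        else
            echo "$bdf: hidden but NOT assignable"
        fi
    done
}

# Values abbreviated from the output pasted in this thread.
check_hidden 'root=LABEL=root-zrxcsq ro nolvm xen-pciback.hide=(0000:03:00.0)' '0000:03:00.0'
# prints: 0000:03:00.0: hidden and assignable

# On the host itself this would be roughly (not run here):
#   check_hidden "$(cat /proc/cmdline)" "$(xl pci-assignable-list)"
```

For the values in this thread the two lists agree, which matches the observation that the pasted output looks consistent.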
-
@samuelolavo Thanks for your answer.
It's very odd because, judging from the command outputs you pasted, everything appears to be behaving as it should.
Even the PCI ID (segment:bus:device:function) seems to stay correct (0000:03:00.0).
I'll ask others internally.
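-
To check whether the identifiers really change across a reboot, the pci-id to UUID mapping from xe pci-list can be saved before and after and then compared. This is a minimal sketch, assuming the default xe pci-list field layout shown earlier in this thread; `pci_uuid_map` and the file paths in the comments are hypothetical.

```shell
#!/bin/sh
# pci_uuid_map (hypothetical helper): read `xe pci-list` output on stdin and
# print one "<pci-id> <uuid>" pair per device, sorted by pci-id.
pci_uuid_map() {
    awk '
        $1 == "uuid"   { uuid = $NF }        # remember the UUID of the current record
        $1 == "pci-id" { print $NF, uuid }   # emit the pair once the address is seen
    ' | sort
}

# Example using one record copied from this thread.
sample='uuid ( RO) : 73708288-55ec-b17f-ba73-6d2c116b3bbc
    vendor-name ( RO): NVIDIA Corporation
    device-name ( RO): GB202GL [RTX PRO 6000 Blackwell Max-Q Workstation Edition]
    pci-id ( RO): 0000:03:00.0'
printf '%s\n' "$sample" | pci_uuid_map
# prints: 0000:03:00.0 73708288-55ec-b17f-ba73-6d2c116b3bbc

# On the host (not run here; paths are placeholders):
#   xe pci-list | pci_uuid_map > /root/pci-before.txt    # before the reboot
#   xe pci-list | pci_uuid_map > /root/pci-after.txt     # after the reboot
#   diff /root/pci-before.txt /root/pci-after.txt && echo "no change"
```

An empty diff would suggest the host-side identifiers are stable and that the change is being introduced somewhere else.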