XCP-ng

    PCI Passthrough of Nvidia GPU and USB add-on card

    • jevan223 @olivierlambert

      @olivierlambert Looking at the daemon.log, when the VM is started up there are a few lines pertaining to the video card:

      Mar 13 11:13:23 xcp-ng qemu-dm-5[9111]: 9111@1678724003.619415:xen_platform_log xen platform: xen|ModuleAdd: FFFFF8022F400000 - FFFFF8022F4E0FFF [dxgmms2.sys]
      Mar 13 11:13:23 xcp-ng qemu-dm-5[9111]: [00:0b.0] Write-back to unknown field 0x0c (partially) inhibited (0x00000000)
      Mar 13 11:13:23 xcp-ng qemu-dm-5[9111]: [00:0b.0] If the device doesn't work, try enabling permissive mode
      Mar 13 11:13:23 xcp-ng qemu-dm-5[9111]: [00:0b.0] (unsafe) and if it helps report the problem to xen-devel
      Mar 13 11:13:24 xcp-ng qemu-dm-5[9111]: 9111@1678724004.067025:xen_platform_log xen platform: xen|ModuleAdd: FFFFF8022F510000 - FFFFF8022F552FFF [hidclass.sys]
      

      I removed the devices from the VM using vm-param-remove, rebooted the machine, confirmed the devices were in the PCI assignable list, then ran the following 4 commands:

      echo 0000:0b:00.0 > /sys/bus/pci/drivers/pciback/permissive
      echo 0000:0b:00.1 > /sys/bus/pci/drivers/pciback/permissive
      echo 0000:0b:00.2 > /sys/bus/pci/drivers/pciback/permissive
      echo 0000:0b:00.3 > /sys/bus/pci/drivers/pciback/permissive
      

      I re-assigned the PCI devices to the VM and rebooted. There was no change to the video out issue, and daemon.log still produces the same lines as above.

      I've also hit a few BSODs stating Video TDR failure, with nvlddmkm.sys failing.

      Is this the proper way to set permissive mode?
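For reference, the four echo commands above can be written as a small script. This is only a sketch, not an official procedure: it assumes it runs as root in dom0, that the pciback driver exposes the permissive attribute at the path used above, and that reading the attribute back lists the devices currently in permissive mode (an assumption worth checking on your host). The helper names are made up for illustration. Since these are runtime sysfs writes, they would not be expected to survive a host reboot.

```python
from pathlib import Path

# Path taken from the echo commands above; only exists in dom0 with pciback loaded.
PERMISSIVE = Path("/sys/bus/pci/drivers/pciback/permissive")


def function_bdfs(slot: str, functions: int = 4) -> list[str]:
    """Expand a 'domain:bus:device' slot into one BDF per function (hypothetical helper)."""
    return [f"{slot}.{fn}" for fn in range(functions)]


def set_permissive(bdfs: list[str], path: Path = PERMISSIVE) -> None:
    """Write each BDF to the attribute, one write per device, like `echo BDF > path`."""
    for bdf in bdfs:
        path.write_text(bdf + "\n")


if __name__ == "__main__":
    targets = function_bdfs("0000:0b:00")  # the GPU's four functions from this thread
    if PERMISSIVE.exists():
        set_permissive(targets)
        # Assumption: reading the attribute shows which devices are permissive.
        print(PERMISSIVE.read_text())
    else:
        print("permissive attribute not found; would write:", targets)
```

Reading the attribute back after writing is a cheap way to confirm the setting actually took effect before re-assigning the devices to the VM.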

      • olivierlambert Vates 🪐 Co-Founder CEO

        @andSmv does it ring any bell?

        • andSmv Vates 🪐 XCP-ng Team Xen Guru

          Hmm, at first glance this looks to me like a real use case for Q35 chipset emulation support on Xen?

          • andSmv Vates 🪐 XCP-ng Team Xen Guru

            @jevan223 Can you please provide the lspci -vvv output (in dom0)?

            • jevan223 @andSmv

              @andSmv Sure thing. I can't post the whole output as it's too long, but here's the output pertaining to the GPU and USB card:

              0b:00.0 VGA compatible controller: NVIDIA Corporation TU116 [GeForce GTX 1660 SUPER] (rev a1) (prog-if 00 [VGA controller])
                      Subsystem: Micro-Star International Co., Ltd. [MSI] Device c75a
                      Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
                      Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
                      Latency: 0, Cache Line Size: 64 bytes
                      Interrupt: pin A routed to IRQ 54
                      Region 0: Memory at fb000000 (32-bit, non-prefetchable) [size=16M]
                      Region 1: Memory at b0000000 (64-bit, prefetchable) [size=256M]
                      Region 3: Memory at c0000000 (64-bit, prefetchable) [size=32M]
                      Region 5: I/O ports at f000 [size=128]
                      [virtual] Expansion ROM at fc000000 [disabled] [size=512K]
                      Capabilities: [60] Power Management version 3
                              Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                              Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
                      Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
                              Address: 0000000000000000  Data: 0000
                      Capabilities: [78] Express (v2) Legacy Endpoint, MSI 00
                              DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 <64us
                                      ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+
                              DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                                      RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop- FLReset-
                                      MaxPayload 128 bytes, MaxReadReq 512 bytes
                              DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
                              LnkCap: Port #0, Speed 8GT/s, Width x16, ASPM L0s L1, Exit Latency L0s <1us, L1 <4us
                                      ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
                              LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
                                      ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
                              LnkSta: Speed 8GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                              DevCap2: Completion Timeout: Range AB, TimeoutDis+, LTR+, OBFF Via message
                              DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
                              LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
                                       Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                                       Compliance De-emphasis: -6dB
                              LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+, EqualizationPhase1+
                                       EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
                      Capabilities: [100 v1] Virtual Channel
                              Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
                              Arb:    Fixed- WRR32- WRR64- WRR128-
                              Ctrl:   ArbSelect=Fixed
                              Status: InProgress-
                              VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
                                      Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
                                      Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=01
                                      Status: NegoPending- InProgress-
                      Capabilities: [250 v1] Latency Tolerance Reporting
                              Max snoop latency: 0ns
                              Max no snoop latency: 0ns
                      Capabilities: [258 v1] L1 PM Substates
                              L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
                                        PortCommonModeRestoreTime=255us PortTPowerOnTime=10us
                      Capabilities: [128 v1] Power Budgeting <?>
                      Capabilities: [420 v2] Advanced Error Reporting
                              UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                              UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                              UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                              CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
                              CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                              AERCap: First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
                      Capabilities: [600 v1] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
                      Capabilities: [900 v1] #19
                      Capabilities: [bb0 v1] #15
                      Kernel driver in use: pciback
              
              0b:00.1 Audio device: NVIDIA Corporation TU116 High Definition Audio Controller (rev a1)
                      Subsystem: Micro-Star International Co., Ltd. [MSI] Device c75a
                      Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
                      Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
                      Latency: 0, Cache Line Size: 64 bytes
                      Interrupt: pin B routed to IRQ 55
                      Region 0: Memory at fc080000 (32-bit, non-prefetchable) [size=16K]
                      Capabilities: [60] Power Management version 3
                              Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                              Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
                      Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
                              Address: 0000000000000000  Data: 0000
                      Capabilities: [78] Express (v2) Endpoint, MSI 00
                              DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 <64us
                                      ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 75.000W
                              DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                                      RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
                                      MaxPayload 128 bytes, MaxReadReq 512 bytes
                              DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
                              LnkCap: Port #0, Speed 8GT/s, Width x16, ASPM L0s L1, Exit Latency L0s <1us, L1 <4us
                                      ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
                              LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes Disabled- CommClk+
                                      ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
                              LnkSta: Speed 8GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                              DevCap2: Completion Timeout: Range AB, TimeoutDis+, LTR+, OBFF Via message
                              DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
                              LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
                                       EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
                      Capabilities: [100 v2] Advanced Error Reporting
                              UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                              UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                              UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                              CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
                              CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                              AERCap: First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
                      Kernel driver in use: pciback
              
              0b:00.2 USB controller: NVIDIA Corporation TU116 USB 3.1 Host Controller (rev a1) (prog-if 30 [XHCI])
                      Subsystem: Micro-Star International Co., Ltd. [MSI] Device c75a
                      Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
                      Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
                      Latency: 0, Cache Line Size: 64 bytes
                      Interrupt: pin C routed to IRQ 52
                      Region 0: Memory at c2000000 (64-bit, prefetchable) [size=256K]
                      Region 3: Memory at c2040000 (64-bit, prefetchable) [size=64K]
                      Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
                              Address: 0000000000000000  Data: 0000
                      Capabilities: [78] Express (v2) Endpoint, MSI 00
                              DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s unlimited, L1 <64us
                                      ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 75.000W
                              DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                                      RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
                                      MaxPayload 128 bytes, MaxReadReq 512 bytes
                              DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
                              LnkCap: Port #0, Speed 8GT/s, Width x16, ASPM L0s L1, Exit Latency L0s <1us, L1 <4us
                                      ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
                              LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
                                      ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                              LnkSta: Speed 8GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                              DevCap2: Completion Timeout: Range AB, TimeoutDis+, LTR+, OBFF Via message
                              DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
                              LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
                                       EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
                      Capabilities: [b4] Power Management version 3
                              Flags: PMEClk- DSI- D1- D2- AuxCurrent=55mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                              Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
                      Capabilities: [100 v2] Advanced Error Reporting
                              UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                              UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                              UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                              CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
                              CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                              AERCap: First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
                      Kernel driver in use: pciback
                      Kernel modules: xhci_pci
              
              0b:00.3 Serial bus controller [0c80]: NVIDIA Corporation TU116 USB Type-C UCSI Controller (rev a1)
                      Subsystem: Micro-Star International Co., Ltd. [MSI] Device c75a
                      Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
                      Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
                      Latency: 0, Cache Line Size: 64 bytes
                      Interrupt: pin D routed to IRQ 53
                      Region 0: Memory at fc084000 (32-bit, non-prefetchable) [size=4K]
                      Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
                              Address: 0000000000000000  Data: 0000
                      Capabilities: [78] Express (v2) Endpoint, MSI 00
                              DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s unlimited, L1 <64us
                                      ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 75.000W
                              DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                                      RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
                                      MaxPayload 128 bytes, MaxReadReq 512 bytes
                              DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
                              LnkCap: Port #0, Speed 8GT/s, Width x16, ASPM L0s L1, Exit Latency L0s <1us, L1 <4us
                                      ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
                              LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
                                      ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                              LnkSta: Speed 8GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                              DevCap2: Completion Timeout: Range AB, TimeoutDis+, LTR+, OBFF Via message
                              DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
                              LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
                                       EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
                      Capabilities: [b4] Power Management version 3
                              Flags: PMEClk- DSI- D1- D2- AuxCurrent=55mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                              Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
                      Capabilities: [100 v2] Advanced Error Reporting
                              UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                              UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                              UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                              CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
                              CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                              AERCap: First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
                      Kernel driver in use: pciback
              
              06:00.0 USB controller: NEC Corporation uPD720200 USB 3.0 Host Controller (rev 04) (prog-if 30 [XHCI])
                      Subsystem: Samsung Electronics Co Ltd Device c0a5
                      Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
                      Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
                      Latency: 0, Cache Line Size: 64 bytes
                      Interrupt: pin A routed to IRQ 29
                      Region 0: Memory at fa700000 (64-bit, non-prefetchable) [size=8K]
                      Capabilities: [50] Power Management version 3
                              Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold-)
                              Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
                      Capabilities: [70] MSI: Enable- Count=1/8 Maskable- 64bit+
                              Address: 0000000000000000  Data: 0000
                      Capabilities: [90] MSI-X: Enable- Count=8 Masked-
                              Vector table: BAR=0 offset=00001000
                              PBA: BAR=0 offset=00001080
                      Capabilities: [a0] Express (v2) Endpoint, MSI 00
                              DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
                                      ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 0.000W
                              DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                                      RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+
                                      MaxPayload 128 bytes, MaxReadReq 512 bytes
                              DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
                              LnkCap: Port #0, Speed 5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <4us, L1 unlimited
                                      ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp-
                              LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
                                      ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
                              LnkSta: Speed 5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                              DevCap2: Completion Timeout: Not Supported, TimeoutDis+, LTR+, OBFF Not Supported
                              DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+, OBFF Disabled
                              LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
                                       Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                                       Compliance De-emphasis: -6dB
                              LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
                                       EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
                      Capabilities: [100 v1] Advanced Error Reporting
                              UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                              UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                              UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                              CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
                              CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                              AERCap: First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
                      Capabilities: [140 v1] Device Serial Number ff-ff-ff-ff-ff-ff-ff-ff
                      Capabilities: [150 v1] Latency Tolerance Reporting
                              Max snoop latency: 0ns
                              Max no snoop latency: 0ns
                      Kernel driver in use: pciback
                      Kernel modules: xhci_pci
              
              
              • andSmv Vates 🪐 XCP-ng Team Xen Guru

                Yes. Some of the PCI capabilities lie beyond the "standard" PCI configuration space of 256 bytes per BDF (PCI device). Unfortunately, Xen does not yet provide the "enhanced" configuration access method to HVM guests (it's ongoing work). That would require the Xen-related part of QEMU to emulate a chipset that offers such access, such as Q35.

                Very probably, the Windows drivers for these devices are not happy when they cannot access these fields, so this is potentially the reason these devices malfunction.

                A good way to confirm this would be to pass these devices through to Linux guests, so we could possibly add some extended traces. We could also pass these devices through to a PVH Linux guest and see how they are handled (PVH guests do not use QEMU for PCI bus emulation).
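To see which capabilities fall outside the first 256 bytes, the lspci output can be sorted by capability offset. The sketch below is illustrative only: the function name and the sample text (copied from the 0b:00.0 entry in this thread) are assumptions for the example, and it only looks at the "Capabilities: [offset]" lines that lspci -vvv prints.

```python
import re

# Matches lspci lines such as "Capabilities: [100 v1] Virtual Channel".
CAP_RE = re.compile(r"Capabilities: \[([0-9a-fA-F]+)(?: v\d+)?\]\s*(.*)")

STANDARD_CONFIG_SPACE = 0x100  # 256 bytes reachable without extended access


def split_capabilities(lspci_text):
    """Return (standard, extended) lists of (offset, name) parsed from lspci -vvv text."""
    standard, extended = [], []
    for line in lspci_text.splitlines():
        match = CAP_RE.search(line)
        if not match:
            continue
        offset = int(match.group(1), 16)
        entry = (offset, match.group(2).strip())
        if offset >= STANDARD_CONFIG_SPACE:
            extended.append(entry)
        else:
            standard.append(entry)
    return standard, extended


# A few capability lines from the GPU (0b:00.0) output posted above.
SAMPLE = """\
        Capabilities: [60] Power Management version 3
        Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [78] Express (v2) Legacy Endpoint, MSI 00
        Capabilities: [100 v1] Virtual Channel
        Capabilities: [250 v1] Latency Tolerance Reporting
        Capabilities: [258 v1] L1 PM Substates
        Capabilities: [420 v2] Advanced Error Reporting
"""

if __name__ == "__main__":
    std, ext = split_capabilities(SAMPLE)
    for offset, name in ext:
        print(f"0x{offset:03x} (extended): {name}")
```

Anything listed as extended would only be reachable by a guest through the enhanced access method described above.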

                • jevan223 @andSmv

                  @andSmv OK, so if the machine model isn't Q35, a Windows VM with a modern GPU passed through won't be able to output video to a monitor?

                  I tried passing the devices through to an Ubuntu 22.04 VM and installed the Nvidia drivers, but no luck: the USB devices work but the monitor just stays off. daemon.log snippet from the Ubuntu VM:

                  Mar 13 14:23:55 xcp-ng squeezed: [debug||4 ||xenops] watch /data/updated <- Mon Mar 13 15:23:55 2023
                  Mar 13 14:24:01 xcp-ng squeezed[1338]: [195.83] watch /data/updated <- 1
                  Mar 13 14:24:01 xcp-ng squeezed: [debug||4 ||xenops] watch /data/updated <- 1
                  Mar 13 14:24:01 xcp-ng ovs-ofctl: ovs|00001|ofp_util|WARN|Negative value -1 is not a valid port number.
                  Mar 13 14:24:01 xcp-ng ovs-vsctl: ovs|00001|vsctl|INFO|Called as ovs-vsctl --timeout=30 -- --if-exists del-port tap5.0
                  Mar 13 14:24:02 xcp-ng tapback[8597]: frontend.c:216 768 front-end supports persistent grants but we don't
                  Mar 13 14:24:02 xcp-ng tapdisk[8587]: received 'sring connect' message (uuid = 5)
                  Mar 13 14:24:02 xcp-ng tapdisk[8587]: connecting VBD 5 domid=5, devid=768, pool (null), evt 33, poll duration 1000, poll idle threshold 50
                  Mar 13 14:24:02 xcp-ng tapdisk[8587]: ring 0x1b38810 connected
                  Mar 13 14:24:02 xcp-ng tapdisk[8587]: sending 'sring connect rsp' message (uuid = 5)
                  Mar 13 14:24:03 xcp-ng qemu-dm-5[8842]: [00:08.0] Write-back to unknown field 0xc4 (partially) inhibited (0x00000000)
                  Mar 13 14:24:03 xcp-ng qemu-dm-5[8842]: [00:08.0] If the device doesn't work, try enabling permissive mode
                  Mar 13 14:24:03 xcp-ng qemu-dm-5[8842]: [00:08.0] (unsafe) and if it helps report the problem to xen-devel
                  Mar 13 14:24:03 xcp-ng qemu-dm-5[8842]: [00:07.0] Write-back to unknown field 0x44 (partially) inhibited (0x00)
                  Mar 13 14:24:03 xcp-ng qemu-dm-5[8842]: [00:07.0] If the device doesn't work, try enabling permissive mode
                  Mar 13 14:24:03 xcp-ng qemu-dm-5[8842]: [00:07.0] (unsafe) and if it helps report the problem to xen-devel
                  Mar 13 14:24:05 xcp-ng squeezed[1338]: [200.14] domid 5 just started a guest agent (but has no balloon driver); calibrating memory-offset = 0 KiB
                  Mar 13 14:24:05 xcp-ng squeezed: [debug||3 ||xenops] domid 5 just started a guest agent (but has no balloon driver); calibrating memory-offset = 0 KiB
                  Mar 13 14:24:05 xcp-ng squeezed[1338]: [200.14] watch /memory/memory-offset <- 0
                  Mar 13 14:24:05 xcp-ng squeezed: [debug||4 ||xenops] watch /memory/memory-offset <- 0
                  Mar 13 14:24:05 xcp-ng squeezed[1338]: [200.14] Xenctrl.domain_setmaxmem domid=5 max=6292480 (was=6347776)
                  Mar 13 14:24:05 xcp-ng squeezed: [debug||3 ||xenops] Xenctrl.domain_setmaxmem domid=5 max=6292480 (was=6347776)
                  Mar 13 14:24:11 xcp-ng squeezed[1338]: [206.39] watch /data/updated <- Mon Mar 13 14:24:11 2023
                  Mar 13 14:24:11 xcp-ng squeezed: [debug||4 ||xenops] watch /data/updated <- Mon Mar 13 14:24:11 2023
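When comparing runs, it can help to pull just the inhibited write-back lines out of daemon.log. The following is a rough sketch based only on the line format visible in this thread; the function name is made up, and treating the number in "qemu-dm-5" as the domain ID is an assumption (it matches "domid 5" in the log above).

```python
import re

# Line format as seen in this thread:
# "... qemu-dm-5[8842]: [00:08.0] Write-back to unknown field 0xc4 (partially) inhibited ..."
LINE_RE = re.compile(
    r"qemu-dm-(\d+)\[\d+\]: \[([0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\] "
    r"Write-back to unknown field (0x[0-9a-f]+)"
)


def inhibited_writes(log_text):
    """Return sorted unique (domid, guest_bdf, field_offset) tuples found in the log text."""
    found = set()
    for match in LINE_RE.finditer(log_text):
        domid = int(match.group(1))  # assumption: the qemu-dm suffix is the domain ID
        found.add((domid, match.group(2), int(match.group(3), 16)))
    return sorted(found)


# Two lines copied from the Ubuntu VM log above.
SAMPLE_LOG = """\
Mar 13 14:24:03 xcp-ng qemu-dm-5[8842]: [00:08.0] Write-back to unknown field 0xc4 (partially) inhibited (0x00000000)
Mar 13 14:24:03 xcp-ng qemu-dm-5[8842]: [00:07.0] Write-back to unknown field 0x44 (partially) inhibited (0x00)
"""

if __name__ == "__main__":
    for domid, bdf, offset in inhibited_writes(SAMPLE_LOG):
        print(f"domid {domid}: guest device {bdf}, config offset 0x{offset:02x}")
```

Note that the BDF in these lines is the address the device has inside the guest, not the dom0 address shown by lspci.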
                  
                  • andSmv Vates 🪐 XCP-ng Team Xen Guru @jevan223

                    @jevan223 This is not about the real hardware. This is about the emulated chipset that QEMU offers to HVM guests (which is the case for a Windows VM).

                    QEMU can actually emulate 2 chipsets for its guests:

                    • i440fx: basic PCI bus with CAM access

                    • Q35: enhanced PCI bus with ECAM access (and thus access to PCI-e capabilities).

                    The problem is that Q35 is not supported by the Xen-dependent parts of the QEMU code, so only i440fx is emulated for Xen HVM guests. We are actually working to enable Q35 in Xen, but this is a work in progress.

                    Well, this is a hypothesis which needs to be confirmed, but by the look of the lspci output, there's a good chance that's the reason.
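The difference between the two access methods comes down to how many offset bits the address can carry. As a rough illustration (the function names are invented for this sketch, and the bus/device numbers are just the GPU's dom0 address from this thread), the legacy mechanism packs an 8-bit register offset into the value written to I/O port 0xCF8, while ECAM gives each function a 4 KiB memory-mapped window:

```python
def cam_config_address(bus: int, dev: int, fn: int, offset: int) -> int:
    """Value written to I/O port 0xCF8 by the legacy mechanism; offsets stop at 0xFF."""
    if not 0 <= offset < 0x100:
        raise ValueError("legacy CAM cannot address offsets beyond 0xFF")
    # Bit 31 is the enable bit; the low two offset bits are dropped (dword aligned).
    return 0x80000000 | (bus << 16) | (dev << 11) | (fn << 8) | (offset & 0xFC)


def ecam_offset(bus: int, dev: int, fn: int, offset: int) -> int:
    """Byte offset from the ECAM (MMCONFIG) base address; 4 KiB per function."""
    if not 0 <= offset < 0x1000:
        raise ValueError("config space is at most 4 KiB per function")
    return (bus << 20) | (dev << 15) | (fn << 12) | offset


if __name__ == "__main__":
    # Power Management capability at 0x60: reachable either way.
    print(hex(cam_config_address(0x0B, 0, 0, 0x60)))
    print(hex(ecam_offset(0x0B, 0, 0, 0x60)))
    # Advanced Error Reporting at 0x420: only ECAM can express this offset.
    print(hex(ecam_offset(0x0B, 0, 0, 0x420)))
```

With only 8 offset bits available, a guest on a CAM-only chipset has no way to name a register such as 0x420, regardless of what the physical card supports.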

                    • olivierlambert Vates 🪐 Co-Founder CEO

                      Maybe @andyhhp would confirm the hypothesis?

                      • jevan223

                        I've been running Unraid for a number of years, but wanted to migrate some VMs to XCP-ng to take advantage of backups, snapshots, and ZFS. Looking back at my Unraid VM template settings that used this GPU for video passthrough, it was using i440fx and working great, although there were a few hoops to jump through to get it working:

                        The XML file of the VM needed to be edited to map the GPU's virtual video and audio devices to the same slot, and enable multifunction to get both video and audio working together.

                        Because it's an Nvidia GPU, it also required a graphics ROM BIOS to be passed through to the VM with the Nvidia header removed.

                        i440fx.JPG

                        Since the Nvidia driver installed correctly on the XCP-ng VM without error code 43, I assume the card was passed to the VM correctly... That's about the extent of my knowledge; I thought I'd share in case it helps with the hypothesis.

                        • andSmv Vates 🪐 XCP-ng Team Xen Guru @jevan223

                          @jevan223 Well, if you confirm it worked well on i440fx, then the hypothesis is probably wrong. Was it KVM/QEMU virtualization?

                          • jevan223 @andSmv

                            @andSmv Yes, Unraid uses KVM. Here's the XML file for reference:

                            <?xml version='1.0' encoding='UTF-8'?>
                            <domain type='kvm'>
                              <name>Win10</name>
                              <uuid>xxxxxxxxxxxxxx</uuid>
                              <description>1TB NVME</description>
                              <metadata>
                                <vmtemplate xmlns="unraid" name="Windows 10" icon="windows.png" os="windows10"/>
                              </metadata>
                              <memory unit='KiB'>12582912</memory>
                              <currentMemory unit='KiB'>12582912</currentMemory>
                              <memoryBacking>
                                <nosharepages/>
                              </memoryBacking>
                              <vcpu placement='static'>6</vcpu>
                              <cputune>
                                <vcpupin vcpu='0' cpuset='7'/>
                                <vcpupin vcpu='1' cpuset='19'/>
                                <vcpupin vcpu='2' cpuset='8'/>
                                <vcpupin vcpu='3' cpuset='20'/>
                                <vcpupin vcpu='4' cpuset='9'/>
                                <vcpupin vcpu='5' cpuset='21'/>
                              </cputune>
                              <os>
                                <type arch='x86_64' machine='pc-i440fx-4.2'>hvm</type>
                                <loader readonly='yes' type='pflash'>/usr/share/qemu/ovmf-x64/OVMF_CODE-pure-efi-tpm.fd</loader>
                                <nvram>/etc/libvirt/qemu/nvram/xxxxxuuidxxxxxx_VARS-pure-efi-tpm.fd</nvram>
                              </os>
                              <features>
                                <acpi/>
                                <apic/>
                                <hyperv mode='custom'>
                                  <relaxed state='on'/>
                                  <vapic state='on'/>
                                  <spinlocks state='on' retries='8191'/>
                                  <vendor_id state='on' value='none'/>
                                </hyperv>
                              </features>
                              <cpu mode='host-passthrough' check='none' migratable='on'>
                                <topology sockets='1' dies='1' cores='3' threads='2'/>
                                <cache mode='passthrough'/>
                                <feature policy='require' name='topoext'/>
                              </cpu>
                              <clock offset='localtime'>
                                <timer name='hypervclock' present='yes'/>
                                <timer name='hpet' present='no'/>
                              </clock>
                              <on_poweroff>destroy</on_poweroff>
                              <on_reboot>restart</on_reboot>
                              <on_crash>restart</on_crash>
                              <devices>
                                <emulator>/usr/local/sbin/qemu</emulator>
                                <disk type='file' device='disk'>
                                  <driver name='qemu' type='raw' cache='writeback'/>
                                  <source file='/mnt/user/domains/Windows 10/vdisk1.img'/>
                                  <target dev='hdc' bus='virtio'/>
                                  <boot order='1'/>
                                  <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
                                </disk>
                                <disk type='file' device='cdrom'>
                                  <driver name='qemu' type='raw'/>
                                  <target dev='hdb' bus='ide'/>
                                  <readonly/>
                                  <address type='drive' controller='0' bus='0' target='0' unit='1'/>
                                </disk>
                                <controller type='usb' index='0' model='nec-xhci' ports='15'>
                                  <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
                                </controller>
                                <controller type='virtio-serial' index='0'>
                                  <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
                                </controller>
                                <controller type='pci' index='0' model='pci-root'/>
                                <controller type='ide' index='0'>
                                  <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
                                </controller>
                                <interface type='bridge'>
                                  <mac address='xxxxxxxmacxxxxx'/>
                                  <source bridge='br0'/>
                                  <model type='virtio-net'/>
                                  <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
                                </interface>
                                <serial type='pty'>
                                  <target type='isa-serial' port='0'>
                                    <model name='isa-serial'/>
                                  </target>
                                </serial>
                                <console type='pty'>
                                  <target type='serial' port='0'/>
                                </console>
                                <channel type='unix'>
                                  <target type='virtio' name='org.qemu.guest_agent.0'/>
                                  <address type='virtio-serial' controller='0' bus='0' port='1'/>
                                </channel>
                                <input type='tablet' bus='usb'>
                                  <address type='usb' bus='0' port='1'/>
                                </input>
                                <input type='mouse' bus='ps2'/>
                                <input type='keyboard' bus='ps2'/>
                                <tpm model='tpm-tis'>
                                  <backend type='emulator' version='2.0' persistent_state='yes'/>
                                </tpm>
                                <audio id='1' type='none'/>
                                <hostdev mode='subsystem' type='pci' managed='yes'>
                                  <driver name='vfio'/>
                                  <source>
                                    <address domain='0x0000' bus='0x07' slot='0x00' function='0x0'/>
                                  </source>
                                  <address type='pci' domain='0x0000' bus='0x00' slot='0x0a' function='0x0'/>
                                </hostdev>
                                <hostdev mode='subsystem' type='pci' managed='yes'>
                                  <driver name='vfio'/>
                                  <source>
                                    <address domain='0x0000' bus='0x0c' slot='0x00' function='0x0'/>
                                  </source>
                                  <rom file='/mnt/user/domains/Vbios/MSI.GTX1660Super.6144.191029.rom'/>
                                  <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0' multifunction='on'/>
                                </hostdev>
                                <hostdev mode='subsystem' type='pci' managed='yes'>
                                  <driver name='vfio'/>
                                  <source>
                                    <address domain='0x0000' bus='0x0c' slot='0x00' function='0x1'/>
                                  </source>
                                  <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x1'/>
                                </hostdev>
                                <memballoon model='none'/>
                              </devices>
                            </domain>
                            
                            • A Offline
                              andyhhp Xen Guru
                              last edited by

                              This is way, way outside of a normal-ish looking server use case. I'm honestly surprised you've got anything to function...

                              To start with, you're probably booting Xen with console=vga (because that's the default). The VGA console will be handed over to dom0 too, so start by going through the bootloader configuration and making sure that neither Xen nor dom0 is trying to use the display at all.

                              I suspect this is the root cause of the display periodically going back to black.
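
                              A minimal sketch of that check, assuming a shell on dom0. The helper name console_args is made up for illustration; it only lists the console= and vga= arguments it finds on each command line, and which of them need changing depends on your setup. The Xen command line is read from the xen_commandline field of xl info, and the dom0 one from /proc/cmdline.

```shell
#!/bin/sh
# console_args "<command line>": print each console= or vga= argument, one per line.
# Hypothetical helper for illustration; runs in a subshell so `set -f` stays local.
console_args() (
    set -f                      # do not glob-expand arguments while splitting
    for arg in $1; do
        case "$arg" in
            console=*|vga=*) printf '%s\n' "$arg" ;;
        esac
    done
)

# dom0 kernel command line (standard Linux location).
if [ -r /proc/cmdline ]; then
    echo "dom0:"
    console_args "$(cat /proc/cmdline)"
fi

# Xen command line, only when the xl tool is present.
if command -v xl >/dev/null 2>&1; then
    echo "Xen:"
    console_args "$(xl info 2>/dev/null | sed -n 's/^xen_commandline *: *//p')"
fi
```

                              If either list shows the display still in use, the bootloader configuration is the place to change it, followed by a host reboot.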

                              • J Offline
                                jevan223 @andyhhp
                                last edited by

                                @andyhhp I do have a 2nd GPU set as the primary output device in the BIOS, and Xen uses it to display the console. Would Xen or dom0 try to use both GPUs?

                                • J Offline
                                  jmccoy555 @jevan223
                                  last edited by

                                  Hi @jevan223, I tried to do something very similar with an AMD card and a USB card a few years ago. I got the AMD card working but couldn't get the USB card to pass through. How did you manage that?

                                  • J Offline
                                    jevan223 @jmccoy555
                                    last edited by

                                    @jmccoy555 I followed the instructions here for passing through my USB PCIe card (a 2-port USB 3.0 Vantec card). You have to use a PCIe USB card; trying to pass through an onboard USB controller just doesn't work.

                                    Did you do anything special to pass through your GPU and enable video out? Was performance close to bare metal?

                                    Sadly, not getting this to work was a show stopper for me. I had to migrate to another server 'virtual environment', which did allow me to pass both video cards through to separate VMs, with TPM support for Windows 11 VMs as a bonus. That being said, I prefer XCP-ng and the XO interface, and would switch back if I could.

                                    • J Offline
                                      jmccoy555 @jevan223
                                      last edited by jmccoy555

                                      @jevan223 No, nothing special for AMD cards. It just shows in XO as assignable. I used my WX4100 today, but I was using an RX580 before.

                                      I am using a PCIe card, but no joy. I've got NICs and SAS & SATA controllers also passed through, so I know that is all working. So maybe it's my card that doesn't want to play. Could you link your specific card, please?

                                      I have passed through individual USB devices and that works nicely, but it's a bit of a faff to do.

                                      Performance seems to be great. Just waiting for some LAN KVM devices to play with now!

                                      • J Offline
                                        jevan223 @jmccoy555
                                        last edited by

                                        @jmccoy555 The card I used was this one here, but it also worked with a no-name brand one I bought off Newegg years ago.

                                        Thanks, I may have to try again with an AMD card down the road!

                                        • J Offline
                                          jmccoy555 @jevan223
                                          last edited by jmccoy555

                                          @jevan223 Thanks. So it looks like yours is a Renesas uPD720201 and mine is an ASMedia ASM1142. Might try another card then, though I don't really want a collection lying around gathering dust 😆 Oh well.

                                          • J Offline
                                            jmccoy555 @jmccoy555
                                            last edited by jmccoy555

                                            So no luck 😞

                                            lspci | grep USB
                                            00:14.0 USB controller: Intel Corporation C610/X99 series chipset USB xHCI Host Controller (rev 05)
                                            00:1a.0 USB controller: Intel Corporation C610/X99 series chipset USB Enhanced Host Controller #2 (rev 05)
                                            00:1d.0 USB controller: Intel Corporation C610/X99 series chipset USB Enhanced Host Controller #1 (rev 05)
                                            04:00.0 USB controller: Renesas Technology Corp. uPD720201 USB 3.0 Host Controller (rev 03)
                                            
                                            
                                            xl pci-assignable-list
                                            0000:00:11.4
                                            0000:04:00.0
                                            
                                            
                                            vm.start
                                            {
                                              "id": "da3eae3e-a67a-f0e2-f113-dd67f65baef1",
                                              "bypassMacAddressesCheck": false,
                                              "force": false
                                            }
                                            {
                                              "code": "INTERNAL_ERROR",
                                              "params": [
                                                "xenopsd internal error: Cannot_add(0000:04:00.0, Xenctrlext.Unix_error(30, \"1: Operation not permitted\"))"
                                              ],
                                              "call": {
                                                "method": "VM.start",
                                                "params": [
                                                  "OpaqueRef:63acb07f-292f-4062-820f-98ea4934b653",
                                                  false,
                                                  false
                                                ]
                                              },
                                              "message": "INTERNAL_ERROR(xenopsd internal error: Cannot_add(0000:04:00.0, Xenctrlext.Unix_error(30, \"1: Operation not permitted\")))",
                                              "name": "XapiError",
                                              "stack": "XapiError: INTERNAL_ERROR(xenopsd internal error: Cannot_add(0000:04:00.0, Xenctrlext.Unix_error(30, \"1: Operation not permitted\")))
                                                at Function.wrap (/home/node/xen-orchestra/packages/xen-api/src/_XapiError.js:16:12)
                                                at /home/node/xen-orchestra/packages/xen-api/src/transports/json-rpc.js:35:27
                                                at AsyncResource.runInAsyncScope (async_hooks.js:197:9)
                                                at cb (/home/node/xen-orchestra/node_modules/bluebird/js/release/util.js:355:42)
                                                at tryCatcher (/home/node/xen-orchestra/node_modules/bluebird/js/release/util.js:16:23)
                                                at Promise._settlePromiseFromHandler (/home/node/xen-orchestra/node_modules/bluebird/js/release/promise.js:547:31)
                                                at Promise._settlePromise (/home/node/xen-orchestra/node_modules/bluebird/js/release/promise.js:604:18)
                                                at Promise._settlePromise0 (/home/node/xen-orchestra/node_modules/bluebird/js/release/promise.js:649:10)
                                                at Promise._settlePromises (/home/node/xen-orchestra/node_modules/bluebird/js/release/promise.js:729:18)
                                                at _drainQueueStep (/home/node/xen-orchestra/node_modules/bluebird/js/release/async.js:93:12)
                                                at _drainQueue (/home/node/xen-orchestra/node_modules/bluebird/js/release/async.js:86:9)
                                                at Async._drainQueues (/home/node/xen-orchestra/node_modules/bluebird/js/release/async.js:102:5)
                                                at Immediate.Async.drainQueues [as _onImmediate] (/home/node/xen-orchestra/node_modules/bluebird/js/release/async.js:15:14)
                                                at processImmediate (internal/timers.js:464:21)
                                                at process.topLevelDomainCallback (domain.js:147:15)
                                                at process.callbackTrampoline (internal/async_hooks.js:129:24)"
                                            } 
                                            

                                            Any ideas, anyone, before I give up and send the card back?

                                            EDIT

                                            OK, computer says NO!!!! - This indicates that your device is using RMRR. Intel IOMMU does not allow DMA to these devices and therefore PCI passthrough is not supported.
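
                                            A rough way to look for this on the host, as a sketch only: filter the hypervisor log for RMRR mentions. The helper rmrr_lines is a made-up name, and whether Xen logs anything about RMRR for a given device, and in what wording, is an assumption that depends on the Xen version and log level. An empty result proves nothing either way.

```shell
#!/bin/sh
# rmrr_lines [BDF]: read log text on stdin and print lines that mention RMRR.
# With a BDF argument (e.g. 04:00.0), keep only lines that also mention it.
# Hypothetical helper for illustration; the log wording is not assumed.
rmrr_lines() {
    if [ -n "${1:-}" ]; then
        grep -i 'rmrr' | grep -iF -- "$1" || true
    else
        grep -i 'rmrr' || true
    fi
}

# Search the Xen message buffer, only when the xl tool is present.
if command -v xl >/dev/null 2>&1; then
    xl dmesg 2>/dev/null | rmrr_lines 04:00.0
fi
```

                                            Here 04:00.0 is the Renesas controller from the lspci output above; substitute the address of whichever device fails to attach.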

                                            EDIT 2

                                            Could we patch the kernel too?

                                            https://github.com/kiler129/relax-intel-rmrr/blob/master/README.md#other-distros
