@TeddyAstie Thanks. Unfortunately, my machine doesn't have IPMI. Can I just connect a serial cable between this machine and another one, say a Windows machine running PuTTY, and monitor the serial output there? Is there anything special to consider? I've never done this before, but I'm happy to read up if you have any pointers.
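For the serial-console idea, here is a minimal sketch of the Xen side. It assumes the XCP-NG host has a usable COM1 port (for example, an onboard COM header with a bracket) and the other machine has a serial port or a USB-to-serial adapter. Two PCs' serial ports are normally connected with a null-modem cable rather than a straight-through one. `com1=` and `console=` are standard Xen boot options. The port name and the 115200 8N1 settings are assumptions, and they must match the PuTTY session on the Windows machine.

```shell
#!/bin/sh
# Run as root in dom0 on the XCP-NG host; takes effect after the next reboot.
# Assumption: the serial port is COM1 and both ends use 115200 baud, 8N1.
# com1=... configures the UART; console=com1,vga sends Xen's console output
# to both the serial port and the screen.
/opt/xensource/libexec/xen-cmdline --set-xen com1=115200,8n1 console=com1,vga
```

On the Windows side, select PuTTY's "Serial" connection type with the matching COM port (whichever port Windows assigns to the adapter) and a speed of 115200. Enabling session logging (Session > Logging, "All session output") keeps a copy of the last messages on the Windows machine even if the host resets.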
-
RE: XCP-NG server crashes/reboots unexpectedly
@nvs The machine crashed/restarted itself again this morning. I didn't even have all of the usual VMs running this time. Nothing was logged in kern.log when it crashed. In the hours before the crash I checked xl dmesg a few times, but nothing stood out to me (same log as I posted above). Any suggestions are highly welcome, as I'm not sure how to proceed with troubleshooting this. My next step would be replacing the PSU to see if anything changes, but it's a long shot.
-
RE: XCP-NG server crashes/reboots unexpectedly
@TeddyAstie I've run
/opt/xensource/libexec/xen-cmdline --set-xen "dom0-iommu=strict"
and rebooted after.
Then I ran xl dmesg (with almost no VMs running yet), and that resulted in the following output:
(XEN) [000000176300f22c] Xen version 4.13.5-9.38 (mockbuild@[unknown]) (gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-28)) debug=n Wed Jan 31 16:13:42 CET 2024
(XEN) [000000176300faf0] Latest ChangeSet: 708e83f0e7d1, pq 491e2c4891d2
(XEN) [000000176301074a] build-id: **redacted to be sure**
(XEN) [0000001763010c12] Bootloader: GRUB 2.02
(XEN) [0000001763011140] Command line: dom0_mem=7584M,max:7584M watchdog ucode=scan dom0_max_vcpus=1-16 crashkernel=256M,below=4G console=vga vga=mode-0x0311 dom0-iommu=strict
(XEN) [00000017630118d2] Xen image load base address: 0xc3c00000
(XEN) [0000001763011c24] Video information:
(XEN) [00000017630123b6] VGA is graphics mode 1920x1080, 32 bpp
(XEN) [0000001763012708] Disc information:
(XEN) [0000001763012a7c] Found 0 MBR signatures
(XEN) [0000001763012e12] Found 6 EDD information structures
(XEN) [00000017630166c8] EFI RAM map:
(XEN) [0000001763016e16] 0000000000000000 - 00000000000a0000 (usable)
(XEN) [0000001763017432] 00000000000a0000 - 0000000000100000 (reserved)
(XEN) [0000001763017982] 0000000000100000 - 0000000009d1f000 (usable)
(XEN) [0000001763017e06] 0000000009d1f000 - 000000000a000000 (reserved)
(XEN) [0000001763018268] 000000000a000000 - 000000000a200000 (usable)
(XEN) [00000017630186a8] 000000000a200000 - 000000000a20e000 (ACPI NVS)
(XEN) [0000001763018b92] 000000000a20e000 - 00000000c3275000 (usable)
(XEN) [000000176301907c] 00000000c3275000 - 00000000c3276000 (reserved)
(XEN) [000000176301949a] 00000000c3276000 - 00000000c9748000 (usable)
(XEN) [00000017630198b8] 00000000c9748000 - 00000000c9b4f000 (reserved)
(XEN) [0000001763019d1a] 00000000c9b4f000 - 00000000c9cb3000 (ACPI data)
(XEN) [000000176301a1e2] 00000000c9cb3000 - 00000000ca438000 (ACPI NVS)
(XEN) [000000176301a666] 00000000ca438000 - 00000000cb9ff000 (reserved)
(XEN) [000000176301aac8] 00000000cb9ff000 - 00000000cd000000 (usable)
(XEN) [000000176301af4c] 00000000cd000000 - 00000000d0000000 (reserved)
(XEN) [000000176301b3ae] 00000000f0000000 - 00000000f8000000 (reserved)
(XEN) [000000176301b7cc] 00000000fd200000 - 00000000fd300000 (reserved)
(XEN) [000000176301bc0c] 00000000fd400000 - 00000000fd600000 (reserved)
(XEN) [000000176301c02a] 00000000fea00000 - 00000000fea10000 (reserved)
(XEN) [000000176301c46a] 00000000feb80000 - 00000000fec02000 (reserved)
(XEN) [000000176301c8aa] 00000000fec10000 - 00000000fec11000 (reserved)
(XEN) [000000176301ccc8] 00000000fed00000 - 00000000fed01000 (reserved)
(XEN) [000000176301d108] 00000000fed40000 - 00000000fed45000 (reserved)
(XEN) [000000176301d56a] 00000000fed80000 - 00000000fed90000 (reserved)
(XEN) [000000176301d9aa] 00000000fedc2000 - 00000000fedd0000 (reserved)
(XEN) [000000176301ddea] 00000000fedd4000 - 00000000fedd6000 (reserved)
(XEN) [000000176301e290] 00000000ff000000 - 0000000100000000 (reserved)
(XEN) [000000176301e824] 0000000100000000 - 000000202f300000 (usable)
(XEN) [000000176301ecec] 000000202f300000 - 0000002030000000 (reserved)
(XEN) [00000017643db5c4] Kdump: 256MB (262144kB) at 0xb3200000
(XEN) [00000017643f3cc4] ACPI: RSDP C9CB2014, 0024 (r2 ALASKA)
(XEN) [00000017643f4fa0] ACPI: XSDT C9CB1728, 00C4 (r1 ALASKA A M I 1072009 AMI 1000013)
(XEN) [00000017643f62e2] ACPI: FACP C9CA2000, 0114 (r6 ALASKA A M I 1072009 AMI 10013)
(XEN) [00000017643f7a86] ACPI: DSDT C9C93000, E26B (r2 ALASKA A M I 1072009 INTL 20120913)
(XEN) [00000017643f858c] ACPI: FACS CA41B000, 0040
(XEN) [00000017643f8c96] ACPI: SSDT C9CA8000, 8CE9 (r2 AMD AmdTable 2 MSFT 4000000)
(XEN) [00000017643f9802] ACPI: SSDT C9CA4000, 3B8E (r2 AMD AMD AOD 1 INTL 20120913)
(XEN) [00000017643fa1d6] ACPI: SSDT C9CA3000, 01CC (r2 ALASKA CPUSSDT 1072009 AMI 1072009)
(XEN) [00000017643fabee] ACPI: FIDT C9C92000, 009C (r1 ALASKA A M I 1072009 AMI 10013)
(XEN) [00000017643fb5a0] ACPI: FPDT C9B97000, 0044 (r1 ALASKA A M I 1072009 AMI 1000013)
(XEN) [00000017643fc4c4] ACPI: MCFG C9C90000, 003C (r1 ALASKA A M I 1072009 MSFT 10013)
(XEN) [00000017643fcdee] ACPI: HPET C9C8F000, 0038 (r1 ALASKA A M I 1072009 AMI 5)
(XEN) [00000017643fd6d4] ACPI: SSDT C9C8E000, 0024 (r1 AMD BIXBY 1000 INTL 20120913)
(XEN) [00000017643fe042] ACPI: IVRS C9C8C000, 00D0 (r2 AMD AmdTable 1 AMD 1)
(XEN) [00000017643fe9d2] ACPI: WPBT C9BAC000, 003C (r1 ALASKA A M I 1 ASUS 1)
(XEN) [00000017643ff890] ACPI: PCCT C9BAB000, 006E (r2 AMD AmdTable 1 AMD 1)
(XEN) [0000001764400154] ACPI: SSDT C9BA2000, 8033 (r2 AMD AmdTable 1 AMD 1)
(XEN) [0000001764400a3a] ACPI: CRAT C9BA0000, 1710 (r1 AMD AmdTable 1 AMD 1)
(XEN) [00000017644012dc] ACPI: CDIT C9B9F000, 0029 (r1 AMD AmdTable 1 AMD 1)
(XEN) [0000001764401b7e] ACPI: BGRT C9C8B000, 0038 (r1 ALASKA A M I 1072009 AMI 10013)
(XEN) [0000001764402486] ACPI: SSDT C9B9E000, 0259 (r2 AMD QOGIRDGP 1 INTL 20120913)
(XEN) [0000001764402db0] ACPI: SSDT C9B9A000, 3E6E (r2 AMD QOGIRN 1 INTL 20120913)
(XEN) [00000017644036b8] ACPI: WSMT C9B99000, 0028 (r1 ALASKA A M I 1072009 AMI 10013)
(XEN) [0000001764404026] ACPI: APIC C9B98000, 0264 (r4 ALASKA A M I 1072009 AMI 10013)
(XEN) [00000017644211ec] System RAM: 130972MB (134116324kB)
(XEN) [000000176cc32890] No NUMA configuration found
(XEN) [000000176cc32e68] Faking a node at 0000000000000000-000000202f300000
(XEN) [0000001832b1538e] Domain heap initialised
(XEN) [0000001852a85ab4] vesafb: framebuffer at 0x00000000e1000000, mapped to 0xffff82c000201000, using 8128k, total 8128k
(XEN) [0000001852a86a60] vesafb: mode is 1920x1080x32, linelength=7680, font 8x16
(XEN) [0000001852a8727a] vesafb: Truecolor: size=8:8:8:8, shift=24:16:8:0
(XEN) [0000001852a98ab6] CPU Vendor: AMD, Family 25 (0x19), Model 33 (0x21), Stepping 2 (raw 00a20f12)
(XEN) [000000185b400954] SMBIOS 3.3 present.
(XEN) [000000185b52baf6] x2APIC mode is already enabled by BIOS.
(XEN) [000000185b6ac848] Using APIC driver x2apic_phys
(XEN) [000000185b7ffd42] XSM Framework v1.0.1 initialized
(XEN) [000000185b962afe] Initialising XSM SILO mode
(XEN) [000000185bac3676] ACPI: PM-Timer IO Port: 0x808 (32 bits)
(XEN) [000000185bc45c16] ACPI: v5 SLEEP INFO: control[0:0], status[0:0]
(XEN) [000000185bdb4956] ACPI: SLEEP INFO: pm1x_cnt[1:804,1:0], pm1x_evt[1:800,1:0]
(XEN) [000000185bf458fa] ACPI: 32/64X FACS address mismatch in FADT - ca41b000/0000000000000000, using 32
(XEN) [000000185c132cd4] ACPI: wakeup_vec[ca41b00c], vec_size[20]
(XEN) [000000185c383276] ACPI: X2APIC (apic_id[0x00] uid[0x00] enabled)
(XEN) [000000185c4e6e02] ACPI: X2APIC (apic_id[0x02] uid[0x02] enabled)
(XEN) [000000185c64a9d2] ACPI: X2APIC (apic_id[0x04] uid[0x04] enabled)
(XEN) [000000185c7ae2fa] ACPI: X2APIC (apic_id[0x06] uid[0x06] enabled)
(XEN) [000000185c91144c] ACPI: X2APIC (apic_id[0x08] uid[0x08] enabled)
(XEN) [000000185ca74714] ACPI: X2APIC (apic_id[0x0a] uid[0x0a] enabled)
(XEN) [000000185cbd7976] ACPI: X2APIC (apic_id[0x0c] uid[0x0c] enabled)
(XEN) [000000185cd3ab94] ACPI: X2APIC (apic_id[0x0e] uid[0x0e] enabled)
(XEN) [000000185ce9de18] ACPI: X2APIC (apic_id[0x10] uid[0x10] enabled)
(XEN) [000000185d001630] ACPI: X2APIC (apic_id[0x12] uid[0x12] enabled)
(XEN) [000000185d164ed0] ACPI: X2APIC (apic_id[0x14] uid[0x14] enabled)
(XEN) [000000185d2c8088] ACPI: X2APIC (apic_id[0x16] uid[0x16] enabled)
(XEN) [000000185d42b196] ACPI: X2APIC (apic_id[0x18] uid[0x18] enabled)
(XEN) [000000185d58e508] ACPI: X2APIC (apic_id[0x1a] uid[0x1a] enabled)
(XEN) [000000185d6f1814] ACPI: X2APIC (apic_id[0x1c] uid[0x1c] enabled)
(XEN) [000000185d854aba] ACPI: X2APIC (apic_id[0x1e] uid[0x1e] enabled)
(XEN) [000000185d9b7e2c] ACPI: X2APIC (apic_id[0x01] uid[0x01] enabled)
(XEN) [000000185db1c414] ACPI: X2APIC (apic_id[0x03] uid[0x03] enabled)
(XEN) [000000185dc80732] ACPI: X2APIC (apic_id[0x05] uid[0x05] enabled)
(XEN) [000000185dde4456] ACPI: X2APIC (apic_id[0x07] uid[0x07] enabled)
(XEN) [000000185df47d18] ACPI: X2APIC (apic_id[0x09] uid[0x09] enabled)
(XEN) [000000185e0abfae] ACPI: X2APIC (apic_id[0x0b] uid[0x0b] enabled)
(XEN) [000000185e210992] ACPI: X2APIC (apic_id[0x0d] uid[0x0d] enabled)
(XEN) [000000185e375134] ACPI: X2APIC (apic_id[0x0f] uid[0x0f] enabled)
(XEN) [000000185e4d8ae4] ACPI: X2APIC (apic_id[0x11] uid[0x11] enabled)
(XEN) [000000185e63c5a4] ACPI: X2APIC (apic_id[0x13] uid[0x13] enabled)
(XEN) [000000185e7a0d68] ACPI: X2APIC (apic_id[0x15] uid[0x15] enabled)
(XEN) [000000185e905570] ACPI: X2APIC (apic_id[0x17] uid[0x17] enabled)
(XEN) [000000185ea698f4] ACPI: X2APIC (apic_id[0x19] uid[0x19] enabled)
(XEN) [000000185ebcdd66] ACPI: X2APIC (apic_id[0x1b] uid[0x1b] enabled)
(XEN) [000000185ed325b2] ACPI: X2APIC (apic_id[0x1d] uid[0x1d] enabled)
(XEN) [000000185ee95c76] ACPI: X2APIC (apic_id[0x1f] uid[0x1f] enabled)
(XEN) [000000185f06ca1e] ACPI: X2APIC_NMI (uid[0xffffffff] high edge lint[0x1])
(XEN) [000000185f228778] ACPI: IOAPIC (id[0x21] address[0xfec00000] gsi_base[0])
(XEN) [000000185f3ae700] IOAPIC[0]: apic_id 33, version 33, address 0xfec00000, GSI 0-23
(XEN) [000000185f557ed6] ACPI: IOAPIC (id[0x22] address[0xfec01000] gsi_base[24])
(XEN) [000000185f6e39e0] IOAPIC[1]: apic_id 34, version 33, address 0xfec01000, GSI 24-55
(XEN) [000000185f8cbe4e] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
(XEN) [000000185fa55fb6] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level)
(XEN) [000000185fc279f6] ACPI: HPET id: 0x10228201 base: 0xfed00000
(XEN) [000000185fd83668] PCI: MCFG configuration 0: base f0000000 segment 0000 buses 00 - 7f
(XEN) [000000185ff352a8] PCI: MCFG area at f0000000 reserved in E820
(XEN) [000000186008aeae] PCI: Using MCFG for segment 0000 bus 00-7f
(XEN) [000000186021208a] ACPI: BGRT: invalidating v1 image at 0xc4573018
(XEN) [000000186037792c] Using ACPI (MADT) for SMP configuration information
(XEN) [00000018604eba3a] SMP: Allowing 32 CPUs (0 hotplug CPUs)
(XEN) [0000001860636bda] IRQ limits: 56 GSI, 6104 MSI/MSI-X
(XEN) [0000001861047e04] AMD-Vi: IOMMU Extended Features:
(XEN) [0000001861173cee] - Peripheral Page Service Request
(XEN) [00000018612a12d2] - x2APIC
(XEN) [000000186136f0f8] - NX bit
(XEN) [000000186143ce74] - Invalidate All Command
(XEN) [000000186154913a] - Guest APIC
(XEN) [00000018616269fe] - Performance Counters
(XEN) [000000186172af86] - Host Address Translation Size: 0x2
(XEN) [0000001861865e08] - Guest Address Translation Size: 0
(XEN) [0000001863de352c] - Guest CR3 Root Table Level: 0x1
(XEN) [0000001866341934] - Maximum PASID: 0xf
(XEN) [0000001868863f42] - SMI Filter Register: 0x1
(XEN) [000000186ad8dbc8] - SMI Filter Register Count: 0x2
(XEN) [000000186d2c35d4] - Guest Virtual APIC Modes: 0x1
(XEN) [000000186f7fe8fe] - Dual PPR Log: 0x2
(XEN) [0000001871cf6464] - Dual Event Log: 0x2
(XEN) [00000018741c39a6] - User / Supervisor Page Protection
(XEN) [000000187668e248] - Device Table Segmentation: 0x3
(XEN) [0000001878af206c] - PPR Log Overflow Early Warning
(XEN) [000000187af12f90] - PPR Automatic Response
(XEN) [000000187d2f5fec] - Memory Access Routing and Control: 0x1
(XEN) [000000187f6f5482] - Block StopMark Message
(XEN) [0000001881ab0e46] - Performance Optimization
(XEN) [0000001883e49d8c] - MSI Capability MMIO Access
(XEN) [00000018861c3950] - Guest I/O Protection
(XEN) [0000001888505256] - Host Access
(XEN) [000000188a7f06e8] - Enhanced PPR Handling
(XEN) [000000188caca66e] - Attribute Forward
(XEN) [000000188ed6b07c] - Virtualized IOMMU
(XEN) [0000001890fd688a] - VMGuard I/O Support
(XEN) [0000001893216f2a] - VM Table Size: 0x2
(XEN) [0000001896c38926] AMD-Vi: IOMMU 0 Enabled.
(XEN) [0000001898fa8502] xstate: size: 0x988 and states: 0x207
(XEN) [000000189b1c8bc6] CPU0: AMD Fam19h machine check reporting enabled
(XEN) [000000189d3f7f14] Speculative mitigation facilities:
(XEN) [000000189f5edaca] Hardware hints: STIBP_ALWAYS IBRS_FAST IBRS_SAME_MODE
(XEN) [00000018a1833626] Hardware features: IBPB IBRS STIBP SSBD PSFD
(XEN) [00000018a3a6723c] Compiled-in support: INDIRECT_THUNK SHADOW_PAGING
(XEN) [00000018a5cb3310] Xen settings: BTI-Thunk RETPOLINE, SPEC_CTRL: IBRS- STIBP+ SSBD- PSFD-, Other: IBPB-ctxt BRANCH_HARDEN
(XEN) [00000018a80132fa] Support for HVM VMs: MSR_SPEC_CTRL RSB IBPB-entry
(XEN) [00000018aa311a58] Support for PV VMs: None
(XEN) [00000018ac5b6888] XPTI (64-bit PV only): Dom0 disabled, DomU disabled (without PCID)
(XEN) [00000018ae909128] PV L1TF shadowing: Dom0 disabled, DomU disabled
(XEN) [00000018b0c38d4c] Using scheduler: SMP Credit Scheduler (credit)
(XEN) [00000018bd1884c0] Platform timer is 14.318MHz HPET
(XEN) [ 1.712732] Detected 3400.019 MHz processor.
(XEN) [ 1.720226] EFI memory map:
(XEN) [ 1.725112] 0000000000000-0000000007fff type=3 attr=000000000000000f
(XEN) [ 1.730061] 0000000008000-000000000bfff type=2 attr=000000000000000f
(XEN) [ 1.735018] 000000000c000-000000002dfff type=7 attr=000000000000000f
(XEN) [ 1.739981] 000000002e000-000000003dfff type=2 attr=000000000000000f
(XEN) [ 1.744946] 000000003e000-000000003efff type=4 attr=000000000000000f
(XEN) [ 1.749918] 000000003f000-000000009efff type=3 attr=000000000000000f
(XEN) [ 1.754887] 000000009f000-000000009ffff type=4 attr=000000000000000f
(XEN) [ 1.759846] 0000000100000-0000000795fff type=2 attr=000000000000000f
(XEN) [ 1.764800] 0000000796000-0000000ffffff type=7 attr=000000000000000f
(XEN) [ 1.769743] 0000001000000-000000107ffff type=4 attr=000000000000000f
(XEN) [ 1.774681] 0000001080000-00000023bdfff type=2 attr=000000000000000f
(XEN) [ 1.779614] 00000023be000-0000009d1efff type=7 attr=000000000000000f
(XEN) [ 1.784559] 0000009d1f000-0000009ffffff type=0 attr=000000000000000f
(XEN) [ 1.789491] 000000a000000-000000a1fffff type=7 attr=000000000000000f
(XEN) [ 1.794431] 000000a200000-000000a20dfff type=10 attr=000000000000000f
(XEN) [ 1.799388] 000000a20e000-000008fcbffff type=7 attr=000000000000000f
(XEN) [ 1.804350] 000008fcc0000-00000c0928fff type=2 attr=000000000000000f
(XEN) [ 1.809314] 00000c0929000-00000c092efff type=7 attr=000000000000000f
(XEN) [ 1.814295] 00000c092f000-00000c0a28fff type=1 attr=000000000000000f
(XEN) [ 1.819298] 00000c0a29000-00000c0d1efff type=3 attr=000000000000000f
(XEN) [ 1.824326] 00000c0d1f000-00000c0d1ffff type=2 attr=000000000000000f
(XEN) [ 1.829376] 00000c0d20000-00000c1747fff type=3 attr=000000000000000f
(XEN) [ 1.834477] 00000c1748000-00000c3274fff type=7 attr=000000000000000f
(XEN) [ 1.839631] 00000c3275000-00000c3275fff type=0 attr=000000000000000f
(XEN) [ 1.844818] 00000c3276000-00000c3dfffff type=7 attr=000000000000000f
(XEN) [ 1.850053] 00000c3e00000-00000c41defff type=2 attr=000000000000000f
(XEN) [ 1.855323] 00000c41df000-00000c4244fff type=7 attr=000000000000000f
(XEN) [ 1.860613] 00000c4245000-00000c42c5fff type=4 attr=000000000000000f
(XEN) [ 1.865923] 00000c42c6000-00000c42cbfff type=7 attr=000000000000000f
(XEN) [ 1.871256] 00000c42cc000-00000c4328fff type=4 attr=000000000000000f
(XEN) [ 1.876633] 00000c4329000-00000c432dfff type=7 attr=000000000000000f
(XEN) [ 1.882035] 00000c432e000-00000c4349fff type=4 attr=000000000000000f
(XEN) [ 1.887462] 00000c434a000-00000c434afff type=7 attr=000000000000000f
(XEN) [ 1.892914] 00000c434b000-00000c4365fff type=4 attr=000000000000000f
(XEN) [ 1.898405] 00000c4366000-00000c4366fff type=7 attr=000000000000000f
(XEN) [ 1.903934] 00000c4367000-00000c4561fff type=4 attr=000000000000000f
(XEN) [ 1.909483] 00000c4562000-00000c4562fff type=7 attr=000000000000000f
(XEN) [ 1.915056] 00000c4563000-00000c4973fff type=4 attr=000000000000000f
(XEN) [ 1.920653] 00000c4974000-00000c4977fff type=7 attr=000000000000000f
(XEN) [ 1.926283] 00000c4978000-00000c4b89fff type=4 attr=000000000000000f
(XEN) [ 1.931929] 00000c4b8a000-00000c4b8bfff type=7 attr=000000000000000f
(XEN) [ 1.937608] 00000c4b8c000-00000c9747fff type=4 attr=000000000000000f
(XEN) [ 1.943318] 00000c9748000-00000c9b4efff type=0 attr=000000000000000f
(XEN) [ 1.949062] 00000c9b4f000-00000c9cb2fff type=9 attr=000000000000000f
(XEN) [ 1.954837] 00000c9cb3000-00000ca437fff type=10 attr=000000000000000f
(XEN) [ 1.960665] 00000ca438000-00000cb951fff type=6 attr=800000000000000f
(XEN) [ 1.966533] 00000cb952000-00000cb9fefff type=5 attr=800000000000000f
(XEN) [ 1.972455] 00000cb9ff000-00000cbdfffff type=4 attr=000000000000000f
(XEN) [ 1.978391] 00000cbe00000-00000cbfc6fff type=7 attr=000000000000000f
(XEN) [ 1.984363] 00000cbfc7000-00000cc0c6fff type=4 attr=000000000000000f
(XEN) [ 1.990372] 00000cc0c7000-00000cc0fffff type=3 attr=000000000000000f
(XEN) [ 1.996416] 00000cc100000-00000cc148fff type=4 attr=000000000000000f
(XEN) [ 2.002477] 00000cc149000-00000cc160fff type=3 attr=000000000000000f
(XEN) [ 2.008544] 00000cc161000-00000cc17afff type=4 attr=000000000000000f
(XEN) [ 2.014643] 00000cc17b000-00000cc181fff type=3 attr=000000000000000f
(XEN) [ 2.020730] 00000cc182000-00000cc191fff type=4 attr=000000000000000f
(XEN) [ 2.026825] 00000cc192000-00000cc19efff type=3 attr=000000000000000f
(XEN) [ 2.032927] 00000cc19f000-00000ccf95fff type=4 attr=000000000000000f
(XEN) [ 2.038962] 00000ccf96000-00000ccf98fff type=3 attr=000000000000000f
(XEN) [ 2.044999] 00000ccf99000-00000ccfabfff type=4 attr=000000000000000f
(XEN) [ 2.051069] 00000ccfac000-00000ccfadfff type=3 attr=000000000000000f
(XEN) [ 2.057121] 00000ccfae000-00000ccfbefff type=4 attr=000000000000000f
(XEN) [ 2.063176] 00000ccfbf000-00000ccfc2fff type=3 attr=000000000000000f
(XEN) [ 2.069238] 00000ccfc3000-00000ccfd6fff type=4 attr=000000000000000f
(XEN) [ 2.075324] 00000ccfd7000-00000ccfdafff type=3 attr=000000000000000f
(XEN) [ 2.081440] 00000ccfdb000-00000ccfeefff type=4 attr=000000000000000f
(XEN) [ 2.087603] 00000ccfef000-00000ccffffff type=3 attr=000000000000000f
(XEN) [ 2.093767] 0000100000000-000202f2fffff type=7 attr=000000000000000f
(XEN) [ 2.099930] 00000000a0000-00000000fffff type=0 attr=000000000000000f
(XEN) [ 2.107447] 00000cd000000-00000cfffffff type=0 attr=000000000000000f
(XEN) [ 2.113615] 00000f0000000-00000f7ffffff type=11 attr=800000000000100d
(XEN) [ 2.119782] 00000fd200000-00000fd2fffff type=11 attr=8000000000000001
(XEN) [ 2.125953] 00000fd400000-00000fd5fffff type=11 attr=8000000000000001
(XEN) [ 2.132127] 00000fea00000-00000fea0ffff type=11 attr=8000000000000001
(XEN) [ 2.138297] 00000feb80000-00000fec01fff type=11 attr=8000000000000001
(XEN) [ 2.144470] 00000fec10000-00000fec10fff type=11 attr=8000000000000001
(XEN) [ 2.150642] 00000fed00000-00000fed00fff type=11 attr=8000000000000001
(XEN) [ 2.156826] 00000fed40000-00000fed44fff type=11 attr=8000000000000001
(XEN) [ 2.163000] 00000fed80000-00000fed8ffff type=11 attr=8000000000000001
(XEN) [ 2.169176] 00000fedc2000-00000fedcffff type=11 attr=8000000000000001
(XEN) [ 2.175353] 00000fedd4000-00000fedd5fff type=11 attr=8000000000000001
(XEN) [ 2.181535] 00000ff000000-00000ffffffff type=11 attr=8000000000000001
(XEN) [ 2.187752] 000202f300000-000202fffffff type=0 attr=000000000000000f
(XEN) [ 2.193935] alt table ffff82d080456fd8 -> ffff82d080465f0c
(XEN) [ 2.215796] I/O virtualisation enabled
(XEN) [ 2.221891] - Dom0 mode: Strict
(XEN) [ 2.227932] Interrupt remapping enabled
(XEN) [ 2.234562] ENABLING IO-APIC IRQs
(XEN) [ 2.240529] -> Using new ACK method
(XEN) [ 2.246661] ..TIMER: vector=0xF0 apic1=0 pin1=2 apic2=-1 pin2=-1
(XEN) [ 2.978010] Defaulting to alternative key handling; send 'A' to switch to normal mode.
(XEN) [ 2.983954] Allocated console ring of 128 KiB.
(XEN) [ 2.989864] HVM: ASIDs enabled.
(XEN) [ 2.995703] SVM: Supported advanced features:
(XEN) [ 3.001551] - Nested Page Tables (NPT)
(XEN) [ 3.007365] - Last Branch Record (LBR) Virtualisation
(XEN) [ 3.013188] - Next-RIP Saved on #VMEXIT
(XEN) [ 3.018964] - VMCB Clean Bits
(XEN) [ 3.024681] - DecodeAssists
(XEN) [ 3.030349] - Virtual VMLOAD/VMSAVE
(XEN) [ 3.035988] - Virtual GIF
(XEN) [ 3.041565] - Pause-Intercept Filter
(XEN) [ 3.047120] - Pause-Intercept Filter Threshold
(XEN) [ 3.052662] - TSC Rate MSR
(XEN) [ 3.058130] - MSR_SPEC_CTRL virtualisation
(XEN) [ 3.063589] HVM: SVM enabled
(XEN) [ 3.068973] HVM: Hardware Assisted Paging (HAP) detected
(XEN) [ 3.074378] HVM: HAP page sizes: 4kB, 2MB, 1GB
(XEN) [ 3.079924] alt table ffff82d080456fd8 -> ffff82d080465f0c
(XEN) [ 3.089890] Brought up 32 CPUs
(XEN) [ 3.096521] Testing NMI watchdog on all CPUs: ok
(XEN) [ 3.119654] Scheduling granularity: cpu, 1 CPU per sched-resource
(XEN) [ 3.146498] mcheck_poll: Machine check polling timer started.
(XEN) [ 3.151998] xenoprof: Initialization failed. AMD processor family 25 is not supported
(XEN) [ 3.157567] Dom0 has maximum 1208 PIRQs
(XEN) [ 3.163090] csched_alloc_domdata: setting dom 0 as the privileged domain
(XEN) [ 3.168663] NX (Execute Disable) protection active
(XEN) [ 3.174188] *** Building a PV Dom0 ***
(XEN) [ 3.321828] Xen kernel: 64-bit, lsb, compat32
(XEN) [ 3.327303] Dom0 kernel: 64-bit, PAE, lsb, paddr 0x1000000 -> 0x302c000
(XEN) [ 3.333051] PHYSICAL MEMORY ARRANGEMENT:
(XEN) [ 3.338477] Dom0 alloc.: 0000001fcc000000->0000001fd0000000 (1920194 pages to be allocated)
(XEN) [ 3.344218] Init. ramdisk: 000000202dec2000->000000202f1ff04f
(XEN) [ 3.349783] VIRTUAL MEMORY ARRANGEMENT:
(XEN) [ 3.355183] Loaded kernel: ffffffff81000000->ffffffff8302c000
(XEN) [ 3.360621] Init. ramdisk: 0000000000000000->0000000000000000
(XEN) [ 3.366036] Phys-Mach map: 0000008000000000->0000008000ed0000
(XEN) [ 3.371460] Start info: ffffffff8302c000->ffffffff8302c4b8
(XEN) [ 3.376864] Xenstore ring: 0000000000000000->0000000000000000
(XEN) [ 3.382264] Console ring: 0000000000000000->0000000000000000
(XEN) [ 3.387639] Page tables: ffffffff8302d000->ffffffff8304a000
(XEN) [ 3.393028] Boot stack: ffffffff8304a000->ffffffff8304b000
(XEN) [ 3.398388] TOTAL: ffffffff80000000->ffffffff83400000
(XEN) [ 3.403767] ENTRY ADDRESS: ffffffff8242b180
(XEN) [ 3.410120] Dom0 has maximum 16 VCPUs
(XEN) [ 4.355937] Initial low memory virq threshold set at 0x4000 pages.
(XEN) [ 4.361096] Scrubbing Free RAM in background
(XEN) [ 4.366204] Std. Loglevel: Errors, warnings and info
(XEN) [ 4.371356] Guest Loglevel: Nothing (Rate-limited: Errors and warnings)
(XEN) [ 4.376612] Xen is relinquishing VGA console.
(XEN) [ 4.398905] *** Serial input to DOM0 (type 'CTRL-a' three times to switch input)
(XEN) [ 4.399014] Freed 608kB init memory
Is there anything in this log that could be a hint? I guess the next step is to start all my usual VMs again and try to catch any error messages just before the machine crashes. I still couldn't figure out how to monitor new xl dmesg messages continuously in real time. My idea was to do that and keep an SSH terminal open, so that when the machine crashes or reboots unexpectedly, we can see any last error messages that popped up. Any pointers on next steps or what else to check would be much appreciated. Thanks.
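One way to approximate `tail -f` for the hypervisor log is to poll it in a loop. This sketch relies on `xl dmesg -c`, which prints the Xen console ring and then clears it. Each poll therefore shows only the messages that arrived since the previous one. The function name `follow_xl_dmesg` and its parameters are hypothetical. Because the ring is cleared, a later plain `xl dmesg` will no longer show the older boot messages.

```shell
#!/bin/sh
# Poll the Xen console ring and print only messages logged since the last poll.
# Assumption: runs as root in dom0, where the xl tool is available.
# Usage: follow_xl_dmesg [INTERVAL_SECONDS] [MAX_POLLS]  (MAX_POLLS=0: forever)
follow_xl_dmesg() {
    interval="${1:-1}"  # seconds between polls
    max="${2:-0}"       # number of polls; 0 means run until interrupted
    n=0
    while [ "$max" -eq 0 ] || [ "$n" -lt "$max" ]; do
        xl dmesg -c     # print the ring, then clear it
        n=$((n + 1))
        sleep "$interval"
    done
}
```

You can also run the same loop over SSH from the other machine and save the output there, e.g. `ssh root@xcp-host 'while :; do xl dmesg -c; sleep 1; done' | tee xl-dmesg.log` (the host name is a placeholder). That way, the last lines survive even if the host resets before it can flush anything to its own disk. Messages from the final instant before a hard reset can still be lost this way, so a serial console remains the more reliable option.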
-
RE: XCP-NG server crashes/reboots unexpectedly
@TeddyAstie Thanks, I've run that command. Should I reboot afterwards? Do I need to run this command again after each reboot, or does it stick? I can post the output here, or is there anything in particular I should look out for?
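On the persistence question: as far as I understand, xen-cmdline edits the boot loader configuration. The option should therefore stick across reboots, but it only takes effect after the next boot. Below is a minimal sketch for confirming that the running hypervisor actually picked it up. It uses the `xen_commandline` field that `xl info` reports. The helper name `xen_cmdline_has` is hypothetical.

```shell
#!/bin/sh
# Succeed if the given option (e.g. dom0-iommu=strict) appears as a whole
# token in the command line of the *running* Xen, as reported by xl info.
xen_cmdline_has() {
    # Strip the "xen_commandline   : " prefix and keep only the options.
    line=$(xl info | sed -n 's/^xen_commandline[[:space:]]*:[[:space:]]*//p')
    # Pad with spaces so the option only matches as a complete token.
    case " $line " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, `xen_cmdline_has dom0-iommu=strict && echo active || echo "not active yet (reboot pending?)"`.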
Side question: is there any way to do something like tail -f on xl dmesg to see its output in real time? That could let me see the last messages before it crashes.
-
RE: XCP-NG server crashes/reboots unexpectedly
It's been a while, but these reboots keep happening. Last week I had two within just two days. As you suggested, @olivierlambert, I ran Memtest86+ on that machine, and it passed without errors. The CPU temperature during this test seems to have hit 82 °C. Is that something to worry about in normal server operation, i.e. could it cause a reboot?
Any other ideas what this could be? It's pretty tricky to troubleshoot without any error logs.
-
RE: XCP-NG server crashes/reboots unexpectedly
Regarding the memory test: just running the normal memtest from the GRUB menu should do, I guess?
-
RE: XCP-NG server crashes/reboots unexpectedly
@stormi Thanks. I've gone through kern.log.1/2/3 etc., and I can see when the server seems to have rebooted and come back up, but there doesn't seem to be anything logged just before it goes down.
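A minimal sketch for pulling out exactly what was logged right before each reboot, rather than scrolling through the files. It assumes that every dom0 boot writes the kernel's "Linux version" banner to kern.log, so that line marks the start of each boot. The helper name `lines_before_boot` is hypothetical. It prints the few lines preceding every boot banner, i.e. the last entries written before the machine went down.

```shell
#!/bin/sh
# Print the last N lines logged before each boot banner in the given files.
# Assumption: each dom0 boot writes a "Linux version" line to kern.log.
# Usage: lines_before_boot N FILE...   (pass rotated files oldest first)
lines_before_boot() {
    n="$1"
    shift
    # -h: omit file names; -B: show N lines of leading context per match.
    grep -h -B "$n" 'Linux version' "$@"
}
```

For example, `lines_before_boot 5 /var/log/kern.log.3 /var/log/kern.log.2 /var/log/kern.log.1 /var/log/kern.log`.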
-
RE: XCP-NG server crashes/reboots unexpectedly
@olivierlambert I looked through the xensource.log.1/2/3 etc. files.
What sticks out to me is that there is a gap here:
xensource.log.2's last four lines:
Nov 19 16:43:27 xcp-ng xapi: [debug||3064510 /var/lib/xcp/xapi||dummytaskhelper] task dispatch:SR.scan D:*** created by task D:***
Nov 19 16:43:27 xcp-ng xapi: [ info||3064512 /var/lib/xcp/xapi||taskhelper] task SR.scan R:*** (uuid:***) created (trackid=***) by task D:***
Nov 19 16:43:27 xcp-ng xapi: [debug||3064512 /var/lib/xcp/xapi|SR.scan R:***|message_forwarding] SR.scan: SR = '*** (20TB HDD)'
Nov 19 16:51:30 xcp-ng xapi: [debug||3069218 /var/lib/xcp/xapi|session.slave_local_login_with_password D:***|xapi_session] Add session to local storage
The next four lines are in a new xensource.log.1 file. Note the 49-minute gap before them:
xensource.log.1's first four lines:
Nov 19 17:40:08 xcp-ng xenopsd-xc: [debug||5 ||xenops_server] Received an event on managed VM ***
Nov 19 17:40:08 xcp-ng xcp-rrdd: [ info||9 ||rrdd_main] memfree has changed to 4191660 in domain 9
Nov 19 17:40:08 xcp-ng xenopsd-xc: [debug||5 |queue|xenops_server] Queue.push ["VM_check_state","***"] onto ***:[ ]
Nov 19 17:40:08 xcp-ng xenopsd-xc: [debug||40 ||xenops_server] Queue.pop returned ["VM_check_state","***"]
I've redacted some UUIDs with ***. It probably wasn't needed, but just in case.
Based on the earlier graphs, I expected to see no log entries here (assuming the machine was off or similar), but it seems it was actually running most of the time. The 49-minute gap above seems to be the only longer gap I can spot at first glance in the last few days' logs. That's strange, because in XOA the graph looks as if the host was down for about 23 hours. Any thoughts?
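To scan all rotated logs for silences instead of eyeballing them, here is a rough awk sketch. It assumes lines start with a classic syslog timestamp such as "Nov 19 16:43:27", that files are passed oldest first, and that all entries fall within one year (leap days are ignored). The helper name `find_log_gaps` is hypothetical. If the rotated files are compressed, they can be piped in via zcat and `-` as the file name.

```shell
#!/bin/sh
# Report silences of at least MIN minutes between consecutive syslog-style
# lines ("Nov 19 16:43:27 host prog: ..."), across the given files in order.
# Usage: find_log_gaps MIN FILE...   (pass rotated files oldest first)
find_log_gaps() {
    min_gap=$(( $1 * 60 ))
    shift
    awk -v min="$min_gap" '
    BEGIN {
        # Day-of-year offset of the first day of each month (non-leap year).
        n = split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", names, " ")
        split("0 31 59 90 120 151 181 212 243 273 304 334", days, " ")
        for (i = 1; i <= n; i++) off[names[i]] = days[i]
    }
    # Only consider lines that start with "Mon DD HH:MM:SS".
    ($1 in off) && split($3, t, ":") == 3 {
        ts = ((off[$1] + $2) * 24 + t[1]) * 3600 + t[2] * 60 + t[3]
        stamp = $1 " " $2 " " $3
        if (seen && ts - prev >= min)
            printf "%d min gap: %s -> %s\n", (ts - prev) / 60, prev_stamp, stamp
        prev = ts; prev_stamp = stamp; seen = 1
    }' "$@"
}
```

For example, `find_log_gaps 10 /var/log/xensource.log.2 /var/log/xensource.log.1 /var/log/xensource.log`. On the excerpt above, this would report the 16:51:30 to 17:40:08 silence as "48 min gap" (48 min 38 s, truncated to whole minutes).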
-
RE: XCP-NG server crashes/reboots unexpectedly
@olivierlambert Hi, do you have any pointers on which exact files to check for those?
I've looked at:
- /var/log/xensource.log, but that log seems to have started earlier today. I don't see entries back to when the reboots happened.
- Regarding IPMI logs: this machine is a Ryzen 9 5950X on an Asus Prime X570 Pro motherboard. Unfortunately, it doesn't have IPMI.
I will look into doing a memtest as you suggested.
-
RE: XCP-NG server crashes/reboots unexpectedly
I took a look at the performance graphs in XOA, and the two reboots can clearly be seen. What looks interesting to me is that the server seems to have stayed offline for quite a while (the periods with no data points in the graph) and only then came back up. Also, after the 2nd reboot there seems to be a high load average, even though only one VM is set to auto-start (Xen Orchestra) and no other VMs had been started yet.
Can anyone make something of this? It seems weird to me that whatever triggers the reboot would not bring the system straight back up. Instead, according to the graphs, it takes varying amounts of time until XCP-NG + XOA is back up. Based on this, it seems that after the 1st reboot it was down for ~3 h, and after the 2nd reboot for ~23 h.
Also note:
- This server has been running stably in this exact configuration for almost 2 years.
- I have two other pretty much identical servers that do not have this issue (same rack, same power source)
-
XCP-NG server crashes/reboots unexpectedly
Hi,
In the last 1.5 weeks, my server seems to have rebooted itself at least twice. I noticed because my VMs weren't running anymore; it looked like a fresh reboot of the server. I want to figure out the cause and fix the issue. I started by looking at logs, i.e. /var/log/kern.log and the /var/crash folder, but both the file and the folder are completely empty.
Since kern.log and the crash folder are empty, I conclude that it probably wasn't an XCP-NG crash that caused the reboot.
My guess is that it's likely power related (PSU or motherboard). Are there any other logs I should check, or do you have any other comments on my conclusions above?
Thanks!
-
RE: Questions about backup features
@CJ said in Questions about backup features:
I have one schedule that runs daily to perform a delta backup and then a second schedule with force full backup enabled that only runs once a week. Make sure you remove the weekly full backup day from the nightly schedule or you'll have both backups happening.
@CJ just curious: why would you do a full backup once a week? In principle, your remote delta backup already stays "up to date" with just the delta backups, right? Or am I missing something?
-
RE: PCIe USB card (and PCIe bridge) disappear after host reboot
Yeah... this was definitely a nightmare. I'm taking a few days off after this.
-
RE: PCIe USB card (and PCIe bridge) disappear after host reboot
After another full day of troubleshooting, it looks like I found the issue.
I installed Ubuntu Server and tested the plugged-in USB cards that were detected, to figure out which one was dropping out. It turns out that if that card is in any of the PCIe slots, it causes the issues described. If it's not installed in the server, no cards disappear.
I took an identical, known-working PCIe USB card from my 2nd machine and used it to replace the faulty one. Everything seems to be working fine again. It's quite interesting how one faulty card caused this rollercoaster of symptoms... at least some good lessons learned for the future.
-
RE: PCIe USB card (and PCIe bridge) disappear after host reboot
Tried some more things but nothing resolved the issue:
- Changed the RAM speed from DDR4-3200 to AUTO -> Same issue
- Installed a different GPU (replacing the Nvidia K2200), but it still breaks, e.g. when going from 0 plugged-in SATA devices to plugging in the 1st SATA HDD -> Same issue
- Reseated the CPU, checked for bent pins (all looked OK) and re-pasted it -> Same issue
- Tried a different output on the K2200 GPU (output 2 (DP) instead of the usual output 3 (DP)) -> Same issue
- Tried without any GPU at all (not even an onboard GPU, as this CPU doesn't have integrated graphics) -> Same issue
- Took out the PCIe USB cards one by one (with no GPU installed at all while testing this, the 10gig card in the top PCIe slot for a change, and 1x HDD attached via SATA). Every time I remove one and boot, the correct number of PCIe USB cards shows up the first time. After a reboot, there is always one PCIe USB card fewer, and that number then seems to stay the same across reboots. However, when only one PCIe USB card is left, that card stays recognized and does not disappear after a reboot!
- Reset the BIOS settings (still on the latest BIOS version) by removing the battery and shorting the RTC reset pins. Left the BIOS at untouched defaults and booted into XCP-NG -> Same issue
- Removed all RAM modules and installed just one RAM stick -> Same issue
- Downgraded the BIOS to version 4408 and left it at BIOS defaults -> Same issue
It looks like the system likes eating PCIe USB cards. I will try ASUS customer support tomorrow, but I'm not expecting much from that.
Could this be an IRQ conflict? What still baffles me is that the issue isn't resolved if the machine is switched off for, say, 30 seconds, but it is after the machine has been off for 10 minutes. It then usually boots up with all cards recognized again. In the back of my mind, I'm imagining some hardware fault involving a slowly discharging capacitance that could explain such time-delayed behaviour. Any thoughts or other ideas?
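To track the disappearing cards across reboots without counting by hand, here is a minimal sketch. It logs how many PCI functions lspci reports with the class "USB controller" at each boot. You would run it once per boot, for example from a boot-time hook of your choice, and compare the numbers. The helper names and the log path are hypothetical. The count also includes the onboard chipset USB controllers and every controller function on each card, so only changes between boots are meaningful.

```shell
#!/bin/sh
# Count PCI functions whose class is "USB controller" in lspci's output.
count_usb_controllers() {
    lspci | grep -c 'USB controller'
}

# Append "YYYY-MM-DD HH:MM:SS <count>" to a log file (example default path).
log_usb_controllers() {
    echo "$(date '+%F %T') $(count_usb_controllers)" >> "${1:-/root/usb-controllers.log}"
}
```

A drop in the logged count right after a reboot, with no hardware change in between, would match the "one card fewer after reboot" pattern described above.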
-
RE: PCIe USB card (and PCIe bridge) disappear after host reboot
Okay, story continues:
- Had the DVD drive's SATA cable unplugged from the mainboard.
- The system booted fine and the PCIe devices were listed correctly across reboots.
- Plugged the DVD drive in to verify that it breaks things (as explained above).
- Disconnected the DVD drive's SATA cable from the mainboard again, and the PCIe card indeed still stayed unlisted across reboots (as explained before).
- I shut down the machine and left it without power for 21 minutes.
- I power up the machine, but it doesn't boot; the screen stays black. The "CPU" error LED on the mainboard lights up, indicating some boot issue with the CPU. Interesting.
- I cut the power and start the machine again. It starts up in "safe mode" and makes me press F1 to enter the BIOS. I exit the BIOS without changes.
- The machine restarts and then boots fine into XCP-ng as usual. All PCIe cards are detected normally again, also across reboots.
And another finding: it's not specific to the DVD drive. Going from 0 connected SATA devices to plugging in one HDD, the same issue occurs. So it seems more like adding or changing a system device causes the issue.
Curious if anyone here can make anything of this? I am happy to replace the motherboard if that fixes the issue, but can we be sure which component is really faulty here (motherboard/CPU/RAM/PCIe card)?
-
RE: PCIe USB card (and PCIe bridge) disappear after host reboot
Spent some more time troubleshooting and made some interesting discoveries!
First I tried changing various settings in the BIOS, like:
- Advanced->AMD CBS->Global C-state control: was "auto", tried "enabled" and "disabled"
- Advanced->AMD CBS->CPU common options->Local APIC mode: was "x2APIC", tried "compatibility", "xAPIC" and "auto"
These changes didn't help, though; the issue remained.
I also installed a fresh XCP-ng on the 2nd M.2 SSD in that server, so I could test the "normal" XCP-ng install against a fresh one. -> Same issue
Set the power supply from "multi rail" to "single rail" -> Same issue
Today I made the discovery: plugging the DVD drive into a SATA port on the mainboard causes one of the PCIe USB cards to no longer be recognized, as explained in the original post above. Interestingly, if I unplug the DVD drive's SATA cable from the mainboard after it caused the issue, that PCIe USB card still remains unlisted. If I leave the server powered down for about 1h and then start it up (with the DVD drive not connected to any SATA port), the system appears to keep recognizing all PCIe cards. Very interesting.
So the working configuration in this server is:
- 6 PCIe cards (as mentioned in the first post above)
- 2x M.2 SSDs (1x 2 TB and 1x 4 TB) on the motherboard
- 4x SATA devices attached (3x HDDs: 18 TB, 18 TB and 20 TB; 1x 500 GB SSD)
If I plug in the DVD drive (which would be the 5th SATA device), it breaks things.
Any more ideas what may be going on or be behind this?
-
RE: PCIe USB card (and PCIe bridge) disappear after host reboot
@olivierlambert Hi, yes, I'm running the latest BIOS version, 5003 (released just on 2023/10/31).
Any suggestions on which BIOS settings to look at in particular? I have a nearly identical system (as far as motherboard, CPU, RAM, SSDs, PCIe USB cards and PCIe NIC go), and that one has not shown this behaviour so far. I checked the BIOS version and settings in both systems, and I think they should be pretty much identical.
-
PCIe USB card (and PCIe bridge) disappear after host reboot
Hi,
I have an Asus X570-Pro with a Ryzen 9 5950X CPU. After a reboot of the machine, one of my PCIe devices (always one of the PCIe USB cards) is no longer detected. Once gone, it stays gone across reboots. To fix it, I usually seem to need to remove a card, power up the server, power it down again and plug the card back in. Even then, it will be gone again after the next reboot.
I have the following cards installed in its 6 PCIe slots:
- 4 port USB card (running x1)
- 7 port USB card (running x1)
- 10Gbit network card (should be running x8)
- 7 port USB card (running x1)
- 7 port USB card (running x1)
- Nvidia Quadro K2200 GPU (should be running x8)
This is the lspci output when all cards are detected correctly:
[23:10 localhost ~]# lspci
00:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Root Complex
00:00.2 IOMMU: Advanced Micro Devices, Inc. [AMD] Starship/Matisse IOMMU
00:01.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
00:01.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge
00:01.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge
00:02.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
00:03.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
00:03.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge
00:03.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge
00:04.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
00:05.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
00:07.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
00:07.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Internal PCIe GPP Bridge 0 to bus[E:B]
00:08.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
00:08.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Internal PCIe GPP Bridge 0 to bus[E:B]
00:14.0 SMBus: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller (rev 61)
00:14.3 ISA bridge: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge (rev 51)
00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 0
00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 1
00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 2
00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 3
00:18.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 4
00:18.5 Host bridge: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 5
00:18.6 Host bridge: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 6
00:18.7 Host bridge: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 7
01:00.0 Non-Volatile memory controller: Kingston Technology Company, Inc. Device 5013 (rev 01)
02:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Matisse Switch Upstream
03:01.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Matisse PCIe GPP Bridge
03:02.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Matisse PCIe GPP Bridge
03:03.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Matisse PCIe GPP Bridge
03:04.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Matisse PCIe GPP Bridge
03:05.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Matisse PCIe GPP Bridge
03:06.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Matisse PCIe GPP Bridge
03:08.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Matisse PCIe GPP Bridge
03:09.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Matisse PCIe GPP Bridge
03:0a.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Matisse PCIe GPP Bridge
04:00.0 Non-Volatile memory controller: Kingston Technology Company, Inc. Device 5013 (rev 01)
05:00.0 USB controller: Renesas Technology Corp. uPD720201 USB 3.0 Host Controller (rev 03)
06:00.0 USB controller: Renesas Technology Corp. uPD720201 USB 3.0 Host Controller (rev 03)
07:00.0 USB controller: Renesas Technology Corp. uPD720201 USB 3.0 Host Controller (rev 03)
08:00.0 Ethernet controller: Intel Corporation I211 Gigabit Network Connection (rev 03)
09:00.0 USB controller: Renesas Technology Corp. uPD720201 USB 3.0 Host Controller (rev 03)
0a:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Reserved SPP
0a:00.1 USB controller: Advanced Micro Devices, Inc. [AMD] Matisse USB 3.0 Host Controller
0a:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Matisse USB 3.0 Host Controller
0b:00.0 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] (rev 51)
0c:00.0 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] (rev 51)
0d:00.0 VGA compatible controller: NVIDIA Corporation GM107GL [Quadro K2200] (rev a2)
0d:00.1 Audio device: NVIDIA Corporation GM107 High Definition Audio Controller [GeForce 940MX] (rev a1)
0e:00.0 Ethernet controller: Intel Corporation 82599 10 Gigabit Network Connection (rev 01)
10:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Function
11:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Reserved SPP
11:00.1 Encryption controller: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Cryptographic Coprocessor PSPCPP
11:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Matisse USB 3.0 Host Controller
11:00.4 Audio device: Advanced Micro Devices, Inc. [AMD] Starship/Matisse HD Audio Controller
And this is the lspci output after the reboot when one of the PCIe USB cards disappears.
[23:17 localhost ~]# lspci
00:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Root Complex
00:00.2 IOMMU: Advanced Micro Devices, Inc. [AMD] Starship/Matisse IOMMU
00:01.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
00:01.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge
00:01.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge
00:02.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
00:03.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
00:03.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge
00:03.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge
00:04.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
00:05.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
00:07.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
00:07.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Internal PCIe GPP Bridge 0 to bus[E:B]
00:08.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
00:08.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Internal PCIe GPP Bridge 0 to bus[E:B]
00:14.0 SMBus: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller (rev 61)
00:14.3 ISA bridge: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge (rev 51)
00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 0
00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 1
00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 2
00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 3
00:18.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 4
00:18.5 Host bridge: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 5
00:18.6 Host bridge: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 6
00:18.7 Host bridge: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 7
01:00.0 Non-Volatile memory controller: Kingston Technology Company, Inc. Device 5013 (rev 01)
02:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Matisse Switch Upstream
03:01.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Matisse PCIe GPP Bridge
03:02.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Matisse PCIe GPP Bridge
03:03.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Matisse PCIe GPP Bridge
03:05.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Matisse PCIe GPP Bridge
03:06.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Matisse PCIe GPP Bridge
03:08.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Matisse PCIe GPP Bridge
03:09.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Matisse PCIe GPP Bridge
03:0a.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Matisse PCIe GPP Bridge
04:00.0 Non-Volatile memory controller: Kingston Technology Company, Inc. Device 5013 (rev 01)
05:00.0 USB controller: Renesas Technology Corp. uPD720201 USB 3.0 Host Controller (rev 03)
06:00.0 USB controller: Renesas Technology Corp. uPD720201 USB 3.0 Host Controller (rev 03)
07:00.0 Ethernet controller: Intel Corporation I211 Gigabit Network Connection (rev 03)
08:00.0 USB controller: Renesas Technology Corp. uPD720201 USB 3.0 Host Controller (rev 03)
09:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Reserved SPP
09:00.1 USB controller: Advanced Micro Devices, Inc. [AMD] Matisse USB 3.0 Host Controller
09:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Matisse USB 3.0 Host Controller
0a:00.0 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] (rev 51)
0b:00.0 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] (rev 51)
0c:00.0 VGA compatible controller: NVIDIA Corporation GM107GL [Quadro K2200] (rev a2)
0c:00.1 Audio device: NVIDIA Corporation GM107 High Definition Audio Controller [GeForce 940MX] (rev a1)
0d:00.0 Ethernet controller: Intel Corporation 82599 10 Gigabit Network Connection (rev 01)
0f:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Function
10:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Reserved SPP
10:00.1 Encryption controller: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Cryptographic Coprocessor PSPCPP
10:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Matisse USB 3.0 Host Controller
10:00.4 Audio device: Advanced Micro Devices, Inc. [AMD] Starship/Matisse HD Audio Controller
In the broken state, the following two devices are missing (so not just the PCIe USB card, but also a device called "PCIe GPP Bridge"):
03:04.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Matisse PCIe GPP Bridge
07:00.0 USB controller: Renesas Technology Corp. uPD720201 USB 3.0 Host Controller (rev 03)
I am no expert, but maybe it is a useful clue that this PCIe bridge disappears at the same time as the PCIe USB card. I've spent the whole day trying all kinds of slot combinations and removing/adding/reseating cards, but unfortunately to no avail.
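The manual comparison of the two `lspci` listings can be scripted. Because bus numbers shift once a device disappears (e.g. the I211 moves from 08:00.0 to 07:00.0), a plain `diff` would flag many renumbered devices; this minimal sketch instead strips the bus address from each line and compares only the descriptions, as a multiset, with `sort` and `comm`. The function name `missing_devices` and the file names `good.txt`/`bad.txt` are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print devices present in a "good" lspci capture but absent from a
# "bad" one. The leading bus address (e.g. "07:00.0") is stripped because bus
# numbers shift when a device disappears, so only descriptions are compared.
# The function name is hypothetical.
set -euo pipefail

missing_devices() {
  local good=$1 bad=$2
  # comm -23 prints lines found only in the first input. Both inputs must be
  # sorted with the same collation, hence LC_ALL=C for sort and comm alike.
  # Duplicate lines are matched one-to-one, so 9 identical bridge lines vs 8
  # still reports the one missing bridge.
  LC_ALL=C comm -23 \
    <(cut -d' ' -f2- "$good" | LC_ALL=C sort) \
    <(cut -d' ' -f2- "$bad" | LC_ALL=C sort)
}
```

Usage: run `lspci > good.txt` while all cards are present and `lspci > bad.txt` after the failing reboot, then source the script and call `missing_devices good.txt bad.txt`. For the two listings in this post it should print the `Matisse PCIe GPP Bridge` line and one `uPD720201` line. `lspci -tv` (tree view) can additionally show which bridge each controller sits behind.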
Any clues/help would be much appreciated!
-
RE: PCI Passthrough with both GPU and USB
Hi,
I stumbled across exactly the same issue: GPU and USB PCIe cards would crash the VM if passed through together. An earlier reply already mentioned that updating would fix the issue, and I just wanted to confirm that it works for me as well. After running the following commands on my XCP-ng 8.2:
yum update
yum upgrade
everything works nicely now! Thanks!