Possible kernel bug or memory errors?
-
Posting this here in case anyone has seen a kernel bug report like this. I'm not sure whether this is caused by bad memory on the host or by a kernel bug; the error "BUG: unable to handle page fault for address: ffff8bed005acef0" hints at a kernel bug.
I have two VMs, one on each XCP-ng host (8.2.1), so I am swapping the VMs onto the opposite hosts to see if the issue moves with the VM. Both VMs are running Ubuntu 22.04 LTS, and only one VM gets this dmesg output, at random times. Heck, it may not even be an issue, but I have noticed it a couple of times now on the VM console output.
Both of these VMs are on shared NFS storage and are configured identically. Since the issue only crops up on one VM, it makes me think I may have a hardware issue on one of the hosts.
Here is the full output of dmesg:
[43221.422334] audit: type=1400 audit(1681126980.459:66): apparmor="STATUS" operation="profile_load" profile="unconfined" name="lxd_archive-var-snap-lxd-common-lxd-storage-pools-pool1-images-cfe60e44dc78e532d0caa58247949b2838029de5fb77d687d89371acf22b7d31" pid=19675 comm="apparmor_parser"
[43221.551436] audit: type=1400 audit(1681126980.591:67): apparmor="STATUS" operation="profile_remove" profile="unconfined" name="lxd_archive-var-snap-lxd-common-lxd-storage-pools-pool1-images-cfe60e44dc78e532d0caa58247949b2838029de5fb77d687d89371acf22b7d31" pid=19680 comm="apparmor_parser"
[43221.664895] audit: type=1400 audit(1681126980.703:68): apparmor="STATUS" operation="profile_load" profile="unconfined" name="lxd_archive-var-snap-lxd-common-lxd-storage-pools-pool1-images-cfe60e44dc78e532d0caa58247949b2838029de5fb77d687d89371acf22b7d31-rootfs" pid=19684 comm="apparmor_parser"
[43223.430838] audit: type=1400 audit(1681126982.467:69): apparmor="STATUS" operation="profile_remove" profile="unconfined" name="lxd_archive-var-snap-lxd-common-lxd-storage-pools-pool1-images-cfe60e44dc78e532d0caa58247949b2838029de5fb77d687d89371acf22b7d31-rootfs" pid=19706 comm="apparmor_parser"
[129621.800451] audit: type=1400 audit(1681213381.944:70): apparmor="STATUS" operation="profile_load" profile="unconfined" name="lxd_archive-var-snap-lxd-common-lxd-storage-pools-pool1-images-b89dd4e241ca07a7f8cdef6109aaa2a02458a19ec80baaa2e3e1974d74eb7852" pid=49675 comm="apparmor_parser"
[129621.874920] audit: type=1400 audit(1681213382.020:71): apparmor="STATUS" operation="profile_remove" profile="unconfined" name="lxd_archive-var-snap-lxd-common-lxd-storage-pools-pool1-images-b89dd4e241ca07a7f8cdef6109aaa2a02458a19ec80baaa2e3e1974d74eb7852" pid=49680 comm="apparmor_parser"
[129621.952373] audit: type=1400 audit(1681213382.096:72): apparmor="STATUS" operation="profile_load" profile="unconfined" name="lxd_archive-var-snap-lxd-common-lxd-storage-pools-pool1-images-b89dd4e241ca07a7f8cdef6109aaa2a02458a19ec80baaa2e3e1974d74eb7852-rootfs" pid=49684 comm="apparmor_parser"
[129623.813366] audit: type=1400 audit(1681213383.956:73): apparmor="STATUS" operation="profile_remove" profile="unconfined" name="lxd_archive-var-snap-lxd-common-lxd-storage-pools-pool1-images-b89dd4e241ca07a7f8cdef6109aaa2a02458a19ec80baaa2e3e1974d74eb7852-rootfs" pid=49700 comm="apparmor_parser"
[129623.965681] BUG: unable to handle page fault for address: ffff8bed005acef0
[129623.975064] #PF: supervisor write access in kernel mode
[129623.981634] #PF: error_code(0x0002) - not-present page
[129623.987448] PGD 0 P4D 0
[129623.990986] Oops: 0002 [#1] SMP PTI
[129623.995061] CPU: 1 PID: 49707 Comm: btrfs Tainted: P O 5.15.0-69-generic #76-Ubuntu
[129624.004045] Hardware name: Xen HVM domU, BIOS 4.13 03/21/2023
[129624.010586] RIP: 0010:dentry_unlink_inode+0x50/0x130
[129624.016233] Code: ff ff 8f fe 89 17 a9 00 00 08 00 74 08 65 48 ff 05 4d ee 87 76 49 8b 84 24 b8 00 00 00 48 85 c0 74 2c 49 8b 94 24 b0 00 00 00 <48> 89 10 48 85 d2 74 04 48 89 42 08 49 c7 84 24 b0 00 00 00 00 00
[129624.034534] RSP: 0018:ffff9dbd421e7c10 EFLAGS: 00010286
[129624.040161] RAX: ffff8bed005acef0 RBX: ffff8fec0a668c00 RCX: 0000000000000000
[129624.047475] RDX: 0000000000000000 RSI: ffff8fed442d6100 RDI: ffff8fec0a668e40
[129624.054969] RBP: ffff9dbd421e7c20 R08: 0000000000000001 R09: 0000000000000000
[129624.062487] R10: ffff8fec097f4410 R11: ffff8febc6832000 R12: ffff8fec0a668e40
[129624.070518] R13: ffff8fed005acdb8 R14: ffff8fec0a668e40 R15: ffff8febc6832440
[129624.077917] FS: 00007f8898c208c0(0000) GS:ffff8fed44a40000(0000) knlGS:0000000000000000
[129624.086420] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[129624.093075] CR2: ffff8bed005acef0 CR3: 0000000048122004 CR4: 00000000003706e0
[129624.100365] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[129624.108491] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[129624.116430] Call Trace:
[129624.119822] <TASK>
[129624.122903] __dentry_kill+0xeb/0x190
[129624.127399] shrink_dentry_list+0x86/0x150
[129624.132202] shrink_dcache_parent+0xcc/0x120
[129624.137232] d_invalidate+0x6f/0xf0
[129624.141512] btrfs_delete_subvolume+0x281/0x510 [btrfs]
[129624.147381] btrfs_ioctl_snap_destroy+0x615/0x730 [btrfs]
[129624.153444] btrfs_ioctl+0x13f/0x1160 [btrfs]
[129624.158776] ? handle_mm_fault+0xd8/0x2c0
[129624.163809] __x64_sys_ioctl+0x92/0xd0
[129624.168552] do_syscall_64+0x59/0xc0
[129624.173235] ? irqentry_exit+0x1d/0x30
[129624.177930] ? exc_page_fault+0x89/0x170
[129624.183007] entry_SYSCALL_64_after_hwframe+0x61/0xcb
[129624.189050] RIP: 0033:0x7f8898d373ab
[129624.193621] Code: 0f 1e fa 48 8b 05 e5 7a 0d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d b5 7a 0d 00 f7 d8 64 89 01 48
[129624.213015] RSP: 002b:00007ffdd1bb59e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[129624.221570] RAX: ffffffffffffffda RBX: 0000000000000040 RCX: 00007f8898d373ab
[129624.229797] RDX: 00007ffdd1bb5a20 RSI: 000000005000940f RDI: 0000000000000003
[129624.237972] RBP: 0000000000000003 R08: 0000564feae05364 R09: 0000000000000095
[129624.246112] R10: 0000564feab1efbf R11: 0000000000000246 R12: 0000000000000000
[129624.254349] R13: 0000564feae05364 R14: 0000000000000003 R15: 00007ffdd1bb88c5
[129624.262563] </TASK>
[129624.265884] Modules linked in: tls veth unix_diag nft_masq nft_chain_nat zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) ebtable_filter ebtables ip6table_raw ip6table_mangle ip6table_nat ip6table_filter ip6_tables iptable_raw iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter bpfilter nf_tables nfnetlink vhost_vsock vmw_vsock_virtio_transport_common vsock vhost vhost_iotlb xenfs xen_privcmd bridge stp llc binfmt_misc nls_iso8859_1 ppdev joydev input_leds serio_raw parport_pc parport sch_fq_codel dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua efi_pstore drm ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear hid_generic crct10dif_pclmul crc32_pclmul ghash_clmulni_intel usbhid aesni_intel crypto_simd hid cryptd psmouse floppy
[129624.346933] CR2: ffff8bed005acef0
[129624.351431] ---[ end trace dd21c84404389223 ]---
[129624.356996] RIP: 0010:dentry_unlink_inode+0x50/0x130
[129624.362912] Code: ff ff 8f fe 89 17 a9 00 00 08 00 74 08 65 48 ff 05 4d ee 87 76 49 8b 84 24 b8 00 00 00 48 85 c0 74 2c 49 8b 94 24 b0 00 00 00 <48> 89 10 48 85 d2 74 04 48 89 42 08 49 c7 84 24 b0 00 00 00 00 00
[129624.381901] RSP: 0018:ffff9dbd421e7c10 EFLAGS: 00010286
[129624.387982] RAX: ffff8bed005acef0 RBX: ffff8fec0a668c00 RCX: 0000000000000000
[129624.396069] RDX: 0000000000000000 RSI: ffff8fed442d6100 RDI: ffff8fec0a668e40
[129624.404315] RBP: ffff9dbd421e7c20 R08: 0000000000000001 R09: 0000000000000000
[129624.412508] R10: ffff8fec097f4410 R11: ffff8febc6832000 R12: ffff8fec0a668e40
[129624.420699] R13: ffff8fed005acdb8 R14: ffff8fec0a668e40 R15: ffff8febc6832440
[129624.428879] FS: 00007f8898c208c0(0000) GS:ffff8fed44a40000(0000) knlGS:0000000000000000
[129624.437965] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[129624.444887] CR2: ffff8bed005acef0 CR3: 0000000048122004 CR4: 00000000003706e0
[129624.453708] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[129624.462942] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
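One rough way to probe the "bad memory or kernel bug" question from the oops itself is to check whether the faulting address is a single bit flip away from the other kernel pointers in the register dump. The sketch below is only a heuristic and an assumption on my part, not an established diagnostic: the function name is made up, and the choice of which registers count as neighbouring pointers is a judgment call. A match does not prove a hardware fault, since a software bug can corrupt a pointer in the same way.

```python
def single_bit_candidates(fault_addr, neighbours):
    """Return the bit positions whose flip moves fault_addr into the
    address range spanned by the neighbouring pointers (hypothetical helper)."""
    lo, hi = min(neighbours), max(neighbours)
    return [bit for bit in range(64) if lo <= (fault_addr ^ (1 << bit)) <= hi]


# CR2 / RAX from the oops above.
fault = 0xFFFF8BED005ACEF0

# Other kernel pointers from the same register dump
# (RBX, RSI, RDI, R10, R11, R13, R15); the selection is an assumption.
heap_pointers = [
    0xFFFF8FEC0A668C00,
    0xFFFF8FED442D6100,
    0xFFFF8FEC0A668E40,
    0xFFFF8FEC097F4410,
    0xFFFF8FEBC6832000,
    0xFFFF8FED005ACDB8,
    0xFFFF8FEBC6832440,
]

bits = single_bit_candidates(fault, heap_pointers)
print("candidate flipped bits:", bits)
for bit in bits:
    candidate = fault ^ (1 << bit)
    # Distance from R13, to see how close the repaired pointer would be.
    print(hex(candidate), "offset from R13:", hex(candidate - 0xFFFF8FED005ACDB8))
```

With these values the only candidate is bit 42, which turns the address into ffff8fed005acef0, 0x138 bytes above R13. That is consistent with a single-bit corruption, so a memory test on the host seems worth running, but it is not conclusive on its own.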
-
Do you have anything in Dom0 dmesg and xl dmesg?
-
Some of the ugly dmesg output from the host with the troubled VM:
[242392.838917] vif vif-7-0 vif7.0: Guest Rx ready
[242421.920556] block tdd: sector-size: 512/512 capacity: 251658240
[242427.246697] block tde: sector-size: 512/512 capacity: 251658240
[242428.379056] block tdf: sector-size: 512/512 capacity: 251658240
[243384.664175] block tde: sector-size: 512/512 capacity: 251658240
[243385.760070] block tdf: sector-size: 512/512 capacity: 251658240
[244310.448680] block tde: sector-size: 512/512 capacity: 251658240
[244311.556701] block tdf: sector-size: 512/512 capacity: 251658240
[244312.702292] block tdg: sector-size: 512/512 capacity: 251658240
[244321.783435] device vif8.0 entered promiscuous mode
[244336.272665] block tda: sector-size: 512/512 capacity: 178257920
[244337.035240] device vif7.0 left promiscuous mode
[244337.251070] vif vif-8-0 vif8.0: Guest Rx ready
[244362.015783] block tdc: sector-size: 512/512 capacity: 178257920
[244367.775702] block tde: sector-size: 512/512 capacity: 178257920
[244368.875340] block tdf: sector-size: 512/512 capacity: 178257920
[244984.216622] block tde: sector-size: 512/512 capacity: 178257920
[244985.375864] block tdf: sector-size: 512/512 capacity: 178257920
[245602.874372] block tde: sector-size: 512/512 capacity: 178257920
[245604.076988] block tdf: sector-size: 512/512 capacity: 178257920
[245605.204176] block tdg: sector-size: 512/512 capacity: 178257920
[245627.262457] device vif9.0 entered promiscuous mode
[245640.879281] block tda: sector-size: 512/512 capacity: 251658240
[245641.649783] device vif8.0 left promiscuous mode
[245641.898617] vif vif-9-0 vif9.0: Guest Rx ready
[246215.439267] block tdd: sector-size: 512/512 capacity: 178257920
[246215.479136] block tde: sector-size: 512/512 capacity: 251658240
[247387.052767] print_req_error: I/O error, dev tde, sector 75145216
[247387.052820] print_req_error: I/O error, dev tde, sector 75145304
[247387.052865] print_req_error: I/O error, dev tde, sector 75145392
[247387.052930] print_req_error: I/O error, dev tde, sector 75145480
[247387.053039] print_req_error: I/O error, dev tde, sector 75145568
[247387.053087] print_req_error: I/O error, dev tde, sector 75145656
[247387.053137] print_req_error: I/O error, dev tde, sector 75145744
[247387.053183] print_req_error: I/O error, dev tde, sector 75145832
[247387.053233] print_req_error: I/O error, dev tde, sector 75145920
[247387.053282] print_req_error: I/O error, dev tde, sector 75146008
[248010.875068] block tdd: sector-size: 512/512 capacity: 251658240
[248010.913350] block tde: sector-size: 512/512 capacity: 178257920
[249059.521257] print_req_error: 86 callbacks suppressed
[249059.521258] print_req_error: I/O error, dev tde, sector 74633216
[249059.521309] print_req_error: I/O error, dev tde, sector 74633304
[249059.521354] print_req_error: I/O error, dev tde, sector 74633392
[249059.521399] print_req_error: I/O error, dev tde, sector 74633480
[249059.521443] print_req_error: I/O error, dev tde, sector 74633568
[249059.521647] print_req_error: I/O error, dev tde, sector 74633656
[249059.521713] print_req_error: I/O error, dev tde, sector 74633744
[249059.521737] print_req_error: I/O error, dev tde, sector 74633832
[249059.521737] print_req_error: I/O error, dev tde, sector 74633920
[249059.521737] print_req_error: I/O error, dev tde, sector 74634008
[249811.475374] block tdd: sector-size: 512/512 capacity: 251658240
[249811.519873] block tde: sector-size: 512/512 capacity: 178257920
[250899.872276] print_req_error: 86 callbacks suppressed
[250899.872277] print_req_error: I/O error, dev tde, sector 74674176
[250899.872329] print_req_error: I/O error, dev tde, sector 74674264
[250899.872379] print_req_error: I/O error, dev tde, sector 74674352
[250899.872424] print_req_error: I/O error, dev tde, sector 74674440
[250899.872472] print_req_error: I/O error, dev tde, sector 74674528
[250899.872516] print_req_error: I/O error, dev tde, sector 74674616
[250899.872560] print_req_error: I/O error, dev tde, sector 74674704
[250899.872603] print_req_error: I/O error, dev tde, sector 74674792
[250899.872646] print_req_error: I/O error, dev tde, sector 74674880
[250899.872689] print_req_error: I/O error, dev tde, sector 74674968
[251616.965255] block tdd: sector-size: 512/512 capacity: 251658240
[251617.009656] block tde: sector-size: 512/512 capacity: 178257920
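To see at a glance which devices the I/O errors land on, the Dom0 dmesg text can be summarised per device. This is a minimal sketch, assuming the log has been saved as text; the function name and the sample string are illustrative, and the regular expression only matches the "print_req_error: I/O error" wording seen in the output above.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   print_req_error: I/O error, dev tde, sector 75145216
IO_ERROR = re.compile(r"print_req_error: I/O error, dev (\w+), sector (\d+)")


def summarize_io_errors(log_text):
    """Return {device: (error_count, lowest_sector, highest_sector)}."""
    sectors = defaultdict(list)
    for dev, sector in IO_ERROR.findall(log_text):
        sectors[dev].append(int(sector))
    return {dev: (len(s), min(s), max(s)) for dev, s in sectors.items()}


# A few lines copied from the dmesg output above.
SAMPLE_LOG = """\
[247387.052767] print_req_error: I/O error, dev tde, sector 75145216
[247387.052820] print_req_error: I/O error, dev tde, sector 75145304
[248010.875068] block tdd: sector-size: 512/512 capacity: 251658240
[249059.521257] print_req_error: 86 callbacks suppressed
[249059.521258] print_req_error: I/O error, dev tde, sector 74633216
"""

for dev, (count, low, high) in summarize_io_errors(SAMPLE_LOG).items():
    print(f"{dev}: {count} errors, sectors {low}..{high}")
```

Note that the kernel rate-limits these messages ("86 callbacks suppressed"), so the counts are a lower bound rather than the real number of failed requests.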
Output of xl dmesg on the same host:
\ \/ /___ _ __ | || | / |___ / | ___| \ // _ \ '_ \ | || |_ | | |_ \ |___ \ / \ __/ | | | |__ _|| |___) | ___) | /_/\_\___|_| |_| |_|(_)_|____(_)____/ (XEN) [000000299f9da931] Xen version 4.13.5-9.30 (mockbuild@[unknown]) (gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-28)) debug=n Tue Mar 21 13:25:35 CET 2023 (XEN) [000000299f9db8ab] Latest ChangeSet: 708e83f0e7d1, pq c81a2963b08d (XEN) [000000299f9dcc95] build-id: f5d6abf4ba12c46c496f6d910114c239298148bf (XEN) [000000299f9dd548] Bootloader: GRUB 2.02 (XEN) [000000299f9dde9e] Command line: dom0_mem=1440M,max:1440M watchdog ucode=scan dom0_max_vcpus=1-4 crashkernel=256M,below=4G console=vga vga=mode-0x0311 (XEN) [000000299f9decb1] Xen image load base address: 0xda800000 (XEN) [000000299f9df44e] Video information: (XEN) [000000299f9dff64] VGA is graphics mode 640x480, 16 bpp (XEN) [000000299f9e0a94] VBE/DDC methods: V2; EDID transfer time: 1 seconds (XEN) [000000299f9e150b] Disc information: (XEN) [000000299f9e1cc6] Found 1 MBR signatures (XEN) [000000299f9e2493] Found 2 EDD information structures (XEN) [000000299f9eb3b6] Xen-e820 RAM map: (XEN) [000000299f9ec0b1] 0000000000000000 - 000000000009dc00 (usable) (XEN) [000000299f9ecdde] 000000000009dc00 - 00000000000a0000 (reserved) (XEN) [000000299f9edb15] 00000000000e0000 - 0000000000100000 (reserved) (XEN) [000000299f9ee817] 0000000000100000 - 00000000d4c60000 (usable) (XEN) [000000299f9ef4b1] 00000000d4c60000 - 00000000d4c61000 (ACPI NVS) (XEN) [000000299f9f010e] 00000000d4c61000 - 00000000d4c62000 (reserved) (XEN) [000000299f9f0c4e] 00000000d4c62000 - 00000000daf48000 (usable) (XEN) [000000299f9f174e] 00000000daf48000 - 00000000dc1ba000 (reserved) (XEN) [000000299f9f2236] 00000000dc1ba000 - 00000000dc244000 (ACPI data) (XEN) [000000299f9f2e85] 00000000dc244000 - 00000000dca1e000 (ACPI NVS) (XEN) [000000299f9f3981] 00000000dca1e000 - 00000000dcf44000 (reserved) (XEN) [000000299f9f4439] 00000000dcf44000 - 00000000dd000000 (usable) (XEN) [000000299f9f4f02] 00000000dd000000 - 
00000000e0000000 (reserved) (XEN) [000000299f9f59e8] 00000000f8000000 - 00000000fc000000 (reserved) (XEN) [000000299f9f6483] 00000000fe000000 - 00000000fe011000 (reserved) (XEN) [000000299f9f6fcc] 00000000fec00000 - 00000000fec01000 (reserved) (XEN) [000000299f9f7b07] 00000000fed00000 - 00000000fed01000 (reserved) (XEN) [000000299f9f85bc] 00000000fee00000 - 00000000fee01000 (reserved) (XEN) [000000299f9f9174] 00000000ff000000 - 0000000100000000 (reserved) (XEN) [000000299f9f9d92] 0000000100000000 - 000000081e000000 (usable) (XEN) [00000029a1ebb587] Kdump: 256MB (262144kB) at 0xc4c00000 (XEN) [00000029a1f2bb4c] ACPI: RSDP 000F05B0, 0024 (r2 LENOVO) (XEN) [00000029a1f2d902] ACPI: XSDT DC1D50B8, 00EC (r1 LENOVO TC-M1A 1300 AMI 10013) (XEN) [00000029a1f2fc14] ACPI: FACP DC203650, 0114 (r6 LENOVO TC-M1A 1300 AMI 10013) (XEN) [00000029a1f31df4] ACPI: DSDT DC1D5230, 2E41B (r2 LENOVO TC-M1A 1300 INTL 20160422) (XEN) [00000029a1f334e7] ACPI: FACS DC9EDC40, 0040 (XEN) [00000029a1f343bf] ACPI: APIC DC203768, 00BC (r3 LENOVO TC-M1A 1300 AMI 10013) (XEN) [00000029a1f356df] ACPI: FPDT DC203828, 0044 (r1 LENOVO TC-M1A 1300 AMI 10013) (XEN) [00000029a1f36965] ACPI: MCFG DC203870, 003C (r1 LENOVO TC-M1A 1300 MSFT 97) (XEN) [00000029a1f37bbe] ACPI: SSDT DC2038B0, 03BC (r1 LENOVO TC-M1A 1300 INTL 20160422) (XEN) [00000029a1f38fda] ACPI: FIDT DC203C70, 009C (r1 LENOVO TC-M1A 1300 AMI 10013) (XEN) [00000029a1f3a224] ACPI: SLIC DC203D10, 0176 (r1 LENOVO TC-M1A 1300 AMI 10013) (XEN) [00000029a1f3b537] ACPI: MSDM DC203E88, 0055 (r3 LENOVO TC-M1A 1300 AMI 10013) (XEN) [00000029a1f3c74b] ACPI: SSDT DC203EE0, 3159 (r2 LENOVO TC-M1A 1300 INTL 20160422) (XEN) [00000029a1f3db45] ACPI: SSDT DC207040, 26E8 (r2 LENOVO TC-M1A 1300 INTL 20160422) (XEN) [00000029a1f3f1bc] ACPI: HPET DC209728, 0038 (r1 LENOVO TC-M1A 1300 MSFT 5F) (XEN) [00000029a1f403ab] ACPI: SSDT DC209760, 0C59 (r2 LENOVO TC-M1A 1300 INTL 20160422) (XEN) [00000029a1f417fb] ACPI: UEFI DC20A3C0, 0042 (r1 LENOVO TC-M1A 1300 1000013) 
(XEN) [00000029a1f42b40] ACPI: SSDT DC20A408, 0EDE (r2 LENOVO TC-M1A 1300 INTL 20160422) (XEN) [00000029a1f43f15] ACPI: LPIT DC20B2E8, 0094 (r1 LENOVO TC-M1A 1300 MSFT 5F) (XEN) [00000029a1f45106] ACPI: WSMT DC20B380, 0028 (r1 LENOVO TC-M1A 1300 MSFT 5F) (XEN) [00000029a1f462cb] ACPI: SSDT DC20B3A8, 029F (r2 LENOVO TC-M1A 1300 INTL 20160422) (XEN) [00000029a1f4762a] ACPI: SSDT DC20B648, 3002 (r2 LENOVO TC-M1A 1300 INTL 20160422) (XEN) [00000029a1f489ca] ACPI: DBGP DC20E650, 0034 (r1 LENOVO TC-M1A 1300 MSFT 5F) (XEN) [00000029a1f49b9d] ACPI: DBG2 DC20E688, 0054 (r0 LENOVO TC-M1A 1300 MSFT 5F) (XEN) [00000029a1f4ae1c] ACPI: DMAR DC20E6E0, 00A8 (r1 LENOVO TC-M1A 1300 INTL 1) (XEN) [00000029a1f4c086] ACPI: TPM2 DC20E788, 0034 (r3 LENOVO TC-M1A 1300 AMI 0) (XEN) [00000029a1f4d249] ACPI: LUFT DC20E7C0, 349E2 (r1 LENOVO TC-M1A 1300 AMI 10013) (XEN) [00000029a1f4e755] ACPI: ASF! DC2431A8, 00A0 (r32 LENOVO TC-M1A 1300 TFSM F4240) (XEN) [00000029a1f4fb29] ACPI: BGRT DC243248, 0038 (r1 LENOVO TC-M1A 1300 AMI 10013) (XEN) [00000029a1f97d56] System RAM: 32655MB (33439356kB) (XEN) [00000029a3ce9023] No NUMA configuration found (XEN) [00000029a3cea2c2] Faking a node at 0000000000000000-000000081e000000 (XEN) [0000002a14f9c9cd] Domain heap initialised (XEN) [0000002a31acb2f6] vesafb: framebuffer at 0x00000000e0000000, mapped to 0xffff82c000201000, using 2048k, total 32704k (XEN) [0000002a31acc32d] vesafb: mode is 640x480x16, linelength=1280, font 8x8 (XEN) [0000002a31acd3a7] vesafb: Truecolor: size=0:5:6:5, shift=0:11:5:0 (XEN) [0000002a31ae02b7] CPU Vendor: Intel, Family 6 (0x6), Model 94 (0x5e), Stepping 3 (raw 000506e3) (XEN) [0000002a33af5975] found SMP MP-table at 000fccc0 (XEN) [0000002a33c42e12] SMBIOS 3.0 present. 
(XEN) [0000002a33d604d3] Using APIC driver default (XEN) [0000002a33e92765] XSM Framework v1.0.1 initialized (XEN) [0000002a33fe47d9] Initialising XSM SILO mode (XEN) [0000002a3412e2d5] ACPI: PM-Timer IO Port: 0x1808 (24 bits) (XEN) [0000002a342a651f] ACPI: v5 SLEEP INFO: control[0:0], status[0:0] (XEN) [0000002a3443b343] ACPI: SLEEP INFO: pm1x_cnt[1:1804,1:0], pm1x_evt[1:1800,1:0] (XEN) [0000002a34556c75] ACPI: 32/64X FACS address mismatch in FADT - dc9edc40/0000000000000000, using 32 (XEN) [0000002a346cf1b8] ACPI: wakeup_vec[dc9edc4c], vec_size[20] (XEN) [0000002a3489d86a] ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled) (XEN) [0000002a34a43d2c] ACPI: LAPIC (acpi_id[0x02] lapic_id[0x02] enabled) (XEN) [0000002a34bea309] ACPI: LAPIC (acpi_id[0x03] lapic_id[0x04] enabled) (XEN) [0000002a34d9044d] ACPI: LAPIC (acpi_id[0x04] lapic_id[0x06] enabled) (XEN) [0000002a34f36537] ACPI: LAPIC (acpi_id[0x05] lapic_id[0x01] enabled) (XEN) [0000002a35154603] ACPI: LAPIC (acpi_id[0x06] lapic_id[0x03] enabled) (XEN) [0000002a352fabd5] ACPI: LAPIC (acpi_id[0x07] lapic_id[0x05] enabled) (XEN) [0000002a354a1211] ACPI: LAPIC (acpi_id[0x08] lapic_id[0x07] enabled) (XEN) [0000002a3565df9a] ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1]) (XEN) [0000002a35808be6] ACPI: LAPIC_NMI (acpi_id[0x02] high edge lint[0x1]) (XEN) [0000002a359b39cb] ACPI: LAPIC_NMI (acpi_id[0x03] high edge lint[0x1]) (XEN) [0000002a35b5e1e2] ACPI: LAPIC_NMI (acpi_id[0x04] high edge lint[0x1]) (XEN) [0000002a35d08c67] ACPI: LAPIC_NMI (acpi_id[0x05] high edge lint[0x1]) (XEN) [0000002a35eb33c2] ACPI: LAPIC_NMI (acpi_id[0x06] high edge lint[0x1]) (XEN) [0000002a3605d964] ACPI: LAPIC_NMI (acpi_id[0x07] high edge lint[0x1]) (XEN) [0000002a3620849c] ACPI: LAPIC_NMI (acpi_id[0x08] high edge lint[0x1]) (XEN) [0000002a363cb3e8] ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0]) (XEN) [0000002a36595b80] IOAPIC[0]: apic_id 2, version 32, address 0xfec00000, GSI 0-119 (XEN) [0000002a366cb8b1] ACPI: INT_SRC_OVR (bus 
0 bus_irq 0 global_irq 2 dfl dfl) (XEN) [0000002a367d4db3] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level) (XEN) [0000002a368f8431] Enabling APIC mode: Flat. Using 1 I/O APICs (XEN) [0000002a36a8ada3] ACPI: HPET id: 0x8086a201 base: 0xfed00000 (XEN) [0000002a36c1dc02] PCI: MCFG configuration 0: base f8000000 segment 0000 buses 00 - 3f (XEN) [0000002a36d59445] PCI: MCFG area at f8000000 reserved in E820 (XEN) [0000002a36edfe08] PCI: Using MCFG for segment 0000 bus 00-3f (XEN) [0000002a37096b04] ACPI: BGRT: invalidating v1 image at 0xd7746018 (XEN) [0000002a3722e1f6] Using ACPI (MADT) for SMP configuration information (XEN) [0000002a373d810e] SMP: Allowing 8 CPUs (0 hotplug CPUs) (XEN) [0000002a375583e1] IRQ limits: 120 GSI, 1432 MSI/MSI-X (XEN) [0000002a386c7828] Switched to APIC driver x2apic_phys (XEN) [0000002a39621600] microcode: CPU0 updated from revision 0xc2 to 0xf0, date = 2021-11-12 (XEN) [0000002a3976c4a9] xstate: size: 0x440 and states: 0x1f (XEN) [0000002a398e1990] CPU0: Intel machine check reporting enabled (XEN) [0000002a39a6c574] Speculative mitigation facilities: (XEN) [0000002a39bc5c79] Hardware hints: RSBA (XEN) [0000002a39ce7e07] Hardware features: IBPB IBRS STIBP SSBD L1D_FLUSH MD_CLEAR SRBDS_CTRL (XEN) [0000002a39e33c78] Compiled-in support: INDIRECT_THUNK SHADOW_PAGING (XEN) [0000002a39fdb7c2] Xen settings: BTI-Thunk JMP, SPEC_CTRL: IBRS+ STIBP+ SSBD- PSFD-, Other: SRB_LOCK+ IBPB-ctxt L1D_FLUSH VERW BRANCH_HARDEN (XEN) [0000002a3a216953] L1TF: believed vulnerable, maxphysaddr L1D 46, CPUID 39, Safe address 8000000000 (XEN) [0000002a3ea3b3f4] Support for HVM VMs: MSR_SPEC_CTRL RSB EAGER_FPU MD_CLEAR (XEN) [0000002a432636ba] Support for PV VMs: MSR_SPEC_CTRL EAGER_FPU MD_CLEAR (XEN) [0000002a4576a629] XPTI (64-bit PV only): Dom0 enabled, DomU enabled (with PCID) (XEN) [0000002a4a097425] PV L1TF shadowing: Dom0 disabled, DomU enabled (XEN) [0000002a4c610011] Using scheduler: SMP Credit Scheduler (credit) (XEN) [0000002a57199776] 
Platform timer is 23.999MHz HPET (XEN) [ 7.506401] Detected 2808.027 MHz processor. (XEN) [ 7.517967] alt table ffff82d080456ed0 -> ffff82d080465d40 (XEN) [ 7.558340] Intel VT-d iommu 0 supported page sizes: 4kB, 2MB, 1GB (XEN) [ 7.566526] Intel VT-d iommu 1 supported page sizes: 4kB, 2MB, 1GB (XEN) [ 7.574524] Intel VT-d Snoop Control not enabled. (XEN) [ 7.582603] Intel VT-d Dom0 DMA Passthrough not enabled. (XEN) [ 7.590514] Intel VT-d Queued Invalidation enabled. (XEN) [ 7.598541] Intel VT-d Interrupt Remapping enabled. (XEN) [ 7.606462] Intel VT-d Posted Interrupt not enabled. (XEN) [ 7.614484] Intel VT-d Shared EPT tables enabled. (XEN) [ 7.622943] I/O virtualisation enabled (XEN) [ 7.630834] - Dom0 mode: Relaxed (XEN) [ 7.638361] Interrupt remapping enabled (XEN) [ 7.646072] Enabled directed EOI with ioapic_ack_old on! (XEN) [ 7.664154] ENABLING IO-APIC IRQs (XEN) [ 7.671572] -> Using old ACK method (XEN) [ 7.679731] ..TIMER: vector=0xF0 apic1=0 pin1=2 apic2=-1 pin2=-1 (XEN) [ 8.731232] Allocated console ring of 64 KiB. (XEN) [ 8.738655] VMX: Supported advanced features: (XEN) [ 8.746157] - APIC MMIO access virtualisation (XEN) [ 8.753474] - APIC TPR shadow (XEN) [ 8.760670] - Extended Page Tables (EPT) (XEN) [ 8.767984] - Virtual-Processor Identifiers (VPID) (XEN) [ 8.775270] - Virtual NMI (XEN) [ 8.782564] - MSR direct-access bitmap (XEN) [ 8.789819] - Unrestricted Guest (XEN) [ 8.797141] - VMCS shadowing (XEN) [ 8.804377] - VM Functions (XEN) [ 8.811691] - Virtualisation Exceptions (XEN) [ 8.818731] - Page Modification Logging (XEN) [ 8.825709] HVM: ASIDs enabled. 
(XEN) [ 8.832982] HVM: VMX enabled (XEN) [ 8.840043] HVM: Hardware Assisted Paging (HAP) detected (XEN) [ 8.847293] HVM: HAP page sizes: 4kB, 2MB, 1GB (XEN) [ 8.854603] alt table ffff82d080456ed0 -> ffff82d080465d40 (XEN) [0000002b3bc37dd3] microcode: CPU2 updated from revision 0xc2 to 0xf0, date = 2021-11-12 (XEN) [0000002b3e73fd42] microcode: CPU4 updated from revision 0xc2 to 0xf0, date = 2021-11-12 (XEN) [0000002b411f0901] microcode: CPU6 updated from revision 0xc2 to 0xf0, date = 2021-11-12 (XEN) [ 8.910909] Brought up 8 CPUs (XEN) [ 8.920420] Testing NMI watchdog on all CPUs: ok (XEN) [ 8.960783] Scheduling granularity: cpu, 1 CPU per sched-resource (XEN) [ 8.990587] mcheck_poll: Machine check polling timer started. (XEN) [ 9.000212] Dom0 has maximum 888 PIRQs (XEN) [ 9.009747] csched_alloc_domdata: setting dom 0 as the privileged domain (XEN) [ 9.028512] NX (Execute Disable) protection active (XEN) [ 9.037803] *** Building a PV Dom0 *** (XEN) [ 9.360826] Xen kernel: 64-bit, lsb, compat32 (XEN) [ 9.370255] Dom0 kernel: 64-bit, PAE, lsb, paddr 0x1000000 -> 0x302c000 (XEN) [ 9.389229] PHYSICAL MEMORY ARRANGEMENT: (XEN) [ 9.398543] Dom0 alloc.: 0000000800000000->0000000804000000 (347002 pages to be allocated) (XEN) [ 9.417022] Init. ramdisk: 000000081cb7a000->000000081dfff214 (XEN) [ 9.426481] VIRTUAL MEMORY ARRANGEMENT: (XEN) [ 9.435756] Loaded kernel: ffffffff81000000->ffffffff8302c000 (XEN) [ 9.445126] Init. 
ramdisk: 0000000000000000->0000000000000000 (XEN) [ 9.454402] Phys-Mach map: 0000008000000000->00000080002d0000 (XEN) [ 9.463851] Start info: ffffffff8302c000->ffffffff8302c4b8 (XEN) [ 9.473127] Xenstore ring: 0000000000000000->0000000000000000 (XEN) [ 9.482765] Console ring: 0000000000000000->0000000000000000 (XEN) [ 9.492060] Page tables: ffffffff8302d000->ffffffff8304a000 (XEN) [ 9.501535] Boot stack: ffffffff8304a000->ffffffff8304b000 (XEN) [ 9.510888] TOTAL: ffffffff80000000->ffffffff83400000 (XEN) [ 9.520426] ENTRY ADDRESS: ffffffff8242b180 (XEN) [ 9.531941] Dom0 has maximum 4 VCPUs (XEN) [ 9.563237] Bogus DMIBAR 0xfed18001 on 0000:00:00.0 (XEN) [ 11.555503] Initial low memory virq threshold set at 0x4000 pages. (XEN) [ 11.562838] Scrubbing Free RAM in background (XEN) [ 11.569731] Std. Loglevel: Errors, warnings and info (XEN) [ 11.576733] Guest Loglevel: Nothing (Rate-limited: Errors and warnings) (XEN) [ 11.584183] *************************************************** (XEN) [ 11.591265] Booted on L1TF-vulnerable hardware with SMT/Hyperthreading (XEN) [ 11.598532] enabled. Please assess your configuration and choose an (XEN) [ 11.605681] explicit 'smt=<bool>' setting. See XSA-273. (XEN) [ 11.612972] *************************************************** (XEN) [ 11.620097] Booted on MLPDS/MFBDS-vulnerable hardware with SMT/Hyperthreading (XEN) [ 11.634350] enabled. Mitigations will not be fully effective. Please (XEN) [ 11.641605] choose an explicit smt=<bool> setting. See XSA-297. (XEN) [ 11.649048] *************************************************** (XEN) [ 11.656345] 3... 2... 1... (XEN) [ 14.665205] Xen is relinquishing VGA console. (XEN) [ 14.674950] *** Serial input to DOM0 (type 'CTRL-a' three times to switch input) (XEN) [ 14.675033] Freed 608kB init memory (XEN) [ 16.595699] Bogus DMIBAR 0xfed18001 on 0000:00:00.0
Could this be the culprit?
[250899.872516] print_req_error: I/O error, dev tde, sector 74674616
Thanks
-
After a bit more digging, tde is "xvdb" within the VM. This block device was on the iSCSI share, so I moved it over to the NFS share, since that seems to be the only difference with that device. All other drives are on the NFS share and are having no issues. It may be an issue with the iSCSI SR. I will create a test VM on there to troubleshoot further.
Both VMs have an xvda (OS drive) and an xvdb with btrfs volumes on them that LXD consumes for container instances. This "tde" device was the only block device stored on the iSCSI SR, so I think that may be where the I/O issue stems from.
I set no authentication on the iSCSI share, so I can rule out authentication issues. Or I may just avoid iSCSI, since it is thick provisioned. These are only test VMs, so no data is at risk.
Here is the current SR list on this host:
uuid ( RO)                : 3bf92a0f-26ea-cd46-e45c-0a896f267c3f
          name-label ( RW): XCP-ng Tools
    name-description ( RW): XCP-ng Tools ISOs
                host ( RO): <shared>
                type ( RO): iso
        content-type ( RO): iso

uuid ( RO)                : e506e70d-7351-647a-bd4a-8053b14c4c1f
          name-label ( RW): Local Storage
    name-description ( RW):
                host ( RO): X2-P320
                type ( RO): ext
        content-type ( RO): user

uuid ( RO)                : df5a8a4c-50f1-799c-7523-b32064607937
          name-label ( RW): XEN-NFS
    name-description ( RW): VM Shared Storage
                host ( RO): <shared>
                type ( RO): nfs
        content-type ( RO): user

uuid ( RO)                : 93dcfee9-1d23-e585-ca78-13cc781421fe
          name-label ( RW): ISO-NAS
    name-description ( RW): ISO Share
                host ( RO): <shared>
                type ( RO): iso
        content-type ( RO): iso

uuid ( RO)                : fc21f3a9-3651-404c-0756-155660acf953
          name-label ( RW): LXD-Pool
    name-description ( RW): LXD Pools
                host ( RO): <shared>
                type ( RO): lvmoiscsi
        content-type ( RO): user

uuid ( RO)                : d3dabb64-cf3d-bb00-70aa-baf2240ac95e
          name-label ( RW): Local Storage
    name-description ( RW):
                host ( RO): X1-M920
                type ( RO): ext
        content-type ( RO): user
LXD-Pool is the suspect SR, since "tde", the only device on this SR, showed I/O errors.
Let me know if my thinking is correct on that.
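For anyone who wants to pick the iSCSI-backed SR out of a longer SR list programmatically, the records can be parsed into dictionaries and filtered by type. This is a rough sketch that assumes the "key ( RO/RW): value" layout shown in the SR list above, with one record per "uuid" line; the function name and the shortened sample are illustrative only.

```python
import re

# One field per line, e.g. "          name-label ( RW): LXD-Pool"
FIELD = re.compile(r"^\s*([\w-]+)\s*\(\s*R[OW]\)\s*:\s*(.*)$")


def parse_sr_list(text):
    """Parse SR records into a list of dicts; a 'uuid' line starts a record."""
    records, current = [], None
    for line in text.splitlines():
        match = FIELD.match(line)
        if not match:
            continue  # blank separators and anything unrecognised
        key, value = match.group(1), match.group(2).strip()
        if key == "uuid":
            current = {}
            records.append(current)
        if current is not None:
            current[key] = value
    return records


# Two records copied from the SR list above.
SAMPLE_SR_LIST = """\
uuid ( RO)                : df5a8a4c-50f1-799c-7523-b32064607937
          name-label ( RW): XEN-NFS
    name-description ( RW): VM Shared Storage
                host ( RO): <shared>
                type ( RO): nfs
        content-type ( RO): user

uuid ( RO)                : fc21f3a9-3651-404c-0756-155660acf953
          name-label ( RW): LXD-Pool
    name-description ( RW): LXD Pools
                host ( RO): <shared>
                type ( RO): lvmoiscsi
        content-type ( RO): user
"""

iscsi_srs = [sr["name-label"] for sr in parse_sr_list(SAMPLE_SR_LIST)
             if sr["type"] == "lvmoiscsi"]
print(iscsi_srs)
```

On this sample the only lvmoiscsi SR is LXD-Pool, which matches the suspect named above.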
-
@dj423
Second occurrence of kernel BUG error:
[237621.042604] BUG: unable to handle page fault for address: ffff9bd9bc6c9840
[237621.049589] #PF: supervisor read access in kernel mode
[237621.055044] #PF: error_code(0x0000) - not-present page
I am seeing some strange errors on the host (X1) for storage devices that no longer exist on the host, e.g.:
[331863.198883] print_req_error: I/O error, dev tde, sector 74842904
[332612.663953] block tdd: sector-size: 512/512 capacity: 178257920
[332612.711329] block tde: sector-size: 512/512 capacity: 251658240
[333679.019750] print_req_error: 86 callbacks suppressed
[333679.019750] print_req_error: I/O error, dev tdd, sector 74924032
The storage devs on host:
tdc 254:2 0 85G 0 disk
tda 254:0 0 120G 0 disk
tdf 254:5 0 32G 0 disk
Any idea why I would see I/O errors for storage devices that are no longer on the host? I only see this on one VM on this host; all others are fine.
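One thing that may help here is converting the "capacity" values in the host dmesg into sizes and grouping them by td device name. The sketch below assumes that capacity is a sector count and that the first number in "sector-size: 512/512" is the sector size in bytes; the function name and sample are illustrative, with the lines copied from the host dmesg posted earlier in the thread.

```python
import re

# Matches lines such as:
#   block tdd: sector-size: 512/512 capacity: 251658240
BLOCK_LINE = re.compile(r"block (td\w+): sector-size: (\d+)/\d+ capacity: (\d+)")


def capacities_by_device(log_text):
    """Return {device: sorted list of distinct sizes in GiB seen in the log}."""
    seen = {}
    for dev, sector_size, capacity in BLOCK_LINE.findall(log_text):
        # Assumption: capacity is counted in sectors of sector_size bytes.
        gib = int(capacity) * int(sector_size) / 2**30
        seen.setdefault(dev, set()).add(gib)
    return {dev: sorted(sizes) for dev, sizes in seen.items()}


SAMPLE_LOG = """\
[246215.439267] block tdd: sector-size: 512/512 capacity: 178257920
[246215.479136] block tde: sector-size: 512/512 capacity: 251658240
[248010.875068] block tdd: sector-size: 512/512 capacity: 251658240
[248010.913350] block tde: sector-size: 512/512 capacity: 178257920
"""

for dev, sizes in capacities_by_device(SAMPLE_LOG).items():
    print(dev, sizes)
```

Under those assumptions, 251658240 sectors works out to 120 GiB and 178257920 to 85 GiB, the same sizes as tda and tdc in the device list above. The names tdd and tde each show up with both sizes at different times, so a td name may not refer to the same disk from one attach to the next, and a td device that appears only briefly would not be in a later device listing.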