    Possible kernel bug or memory errors?

    • dj423

      I'm posting this here in case anyone has seen a kernel bug report like this. I'm not sure whether it is caused by bad memory on the host or by a kernel bug; the error "BUG: unable to handle page fault for address: ffff8bed005acef0" hints at a kernel bug.

      I have two VMs, one on each XCP-ng host (8.2.1), so I am swapping the VMs onto the opposite hosts to see if the issue moves with the VM. Both VMs are running Ubuntu 22.04 LTS, and only one VM gets this dmesg output, at random times. It may not even be an issue, but I have noticed it a couple of times now on the VM console output.

      Both of these VMs are on shared NFS storage and configured identically. Since the issue only crops up on one VM, it makes me think I may have a hardware issue on one of the hosts.
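For reading the Oops in the dmesg output that follows: its `#PF: error_code(0x0002)` line is a bitmask. Here is a minimal Python sketch of decoding it, assuming the standard x86 page-fault layout (bit 0 present, bit 1 write, bit 2 user mode, bit 3 reserved bit, bit 4 instruction fetch). The function name `decode_pf_error_code` is only illustrative.

```python
def decode_pf_error_code(code):
    """Decode the low bits of an x86 page-fault error code into readable flags."""
    return {
        "cause": "protection violation" if code & 0x01 else "not-present page",
        "access": "write" if code & 0x02 else "read",
        "mode": "user" if code & 0x04 else "supervisor (kernel)",
        "reserved_bit_set": bool(code & 0x08),
        "instruction_fetch": bool(code & 0x10),
    }


print(decode_pf_error_code(0x0002))
```

For 0x0002 this gives a kernel-mode write to a page that is not mapped, which is what the two `#PF:` lines say in words. On its own it does not distinguish a software bug from corrupted memory; it only says the kernel dereferenced a bad pointer.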

      Here is the full output of dmesg:

      [43221.422334] audit: type=1400 audit(1681126980.459:66): apparmor="STATUS" operation="profile_load" profile="unconfined" name="lxd_archive-var-snap-lxd-common-lxd-storage-pools-pool1-images-cfe60e44dc78e532d0caa58247949b2838029de5fb77d687d89371acf22b7d31" pid=19675 comm="apparmor_parser"
      [43221.551436] audit: type=1400 audit(1681126980.591:67): apparmor="STATUS" operation="profile_remove" profile="unconfined" name="lxd_archive-var-snap-lxd-common-lxd-storage-pools-pool1-images-cfe60e44dc78e532d0caa58247949b2838029de5fb77d687d89371acf22b7d31" pid=19680 comm="apparmor_parser"
      [43221.664895] audit: type=1400 audit(1681126980.703:68): apparmor="STATUS" operation="profile_load" profile="unconfined" name="lxd_archive-var-snap-lxd-common-lxd-storage-pools-pool1-images-cfe60e44dc78e532d0caa58247949b2838029de5fb77d687d89371acf22b7d31-rootfs" pid=19684 comm="apparmor_parser"
      [43223.430838] audit: type=1400 audit(1681126982.467:69): apparmor="STATUS" operation="profile_remove" profile="unconfined" name="lxd_archive-var-snap-lxd-common-lxd-storage-pools-pool1-images-cfe60e44dc78e532d0caa58247949b2838029de5fb77d687d89371acf22b7d31-rootfs" pid=19706 comm="apparmor_parser"
      [129621.800451] audit: type=1400 audit(1681213381.944:70): apparmor="STATUS" operation="profile_load" profile="unconfined" name="lxd_archive-var-snap-lxd-common-lxd-storage-pools-pool1-images-b89dd4e241ca07a7f8cdef6109aaa2a02458a19ec80baaa2e3e1974d74eb7852" pid=49675 comm="apparmor_parser"
      [129621.874920] audit: type=1400 audit(1681213382.020:71): apparmor="STATUS" operation="profile_remove" profile="unconfined" name="lxd_archive-var-snap-lxd-common-lxd-storage-pools-pool1-images-b89dd4e241ca07a7f8cdef6109aaa2a02458a19ec80baaa2e3e1974d74eb7852" pid=49680 comm="apparmor_parser"
      [129621.952373] audit: type=1400 audit(1681213382.096:72): apparmor="STATUS" operation="profile_load" profile="unconfined" name="lxd_archive-var-snap-lxd-common-lxd-storage-pools-pool1-images-b89dd4e241ca07a7f8cdef6109aaa2a02458a19ec80baaa2e3e1974d74eb7852-rootfs" pid=49684 comm="apparmor_parser"
      [129623.813366] audit: type=1400 audit(1681213383.956:73): apparmor="STATUS" operation="profile_remove" profile="unconfined" name="lxd_archive-var-snap-lxd-common-lxd-storage-pools-pool1-images-b89dd4e241ca07a7f8cdef6109aaa2a02458a19ec80baaa2e3e1974d74eb7852-rootfs" pid=49700 comm="apparmor_parser"
      
      [129623.965681] BUG: unable to handle page fault for address: ffff8bed005acef0
      [129623.975064] #PF: supervisor write access in kernel mode
      [129623.981634] #PF: error_code(0x0002) - not-present page
      [129623.987448] PGD 0 P4D 0
      [129623.990986] Oops: 0002 [#1] SMP PTI
      [129623.995061] CPU: 1 PID: 49707 Comm: btrfs Tainted: P           O      5.15.0-69-generic #76-Ubuntu
      [129624.004045] Hardware name: Xen HVM domU, BIOS 4.13 03/21/2023
      [129624.010586] RIP: 0010:dentry_unlink_inode+0x50/0x130
      [129624.016233] Code: ff ff 8f fe 89 17 a9 00 00 08 00 74 08 65 48 ff 05 4d ee 87 76 49 8b 84 24 b8 00 00 00 48 85 c0 74 2c 49 8b 94 24 b0 00 00 00 <48> 89 10 48 85 d2 74 04 48 89 42 08 49 c7 84 24 b0 00 00 00 00 00
      [129624.034534] RSP: 0018:ffff9dbd421e7c10 EFLAGS: 00010286
      [129624.040161] RAX: ffff8bed005acef0 RBX: ffff8fec0a668c00 RCX: 0000000000000000
      [129624.047475] RDX: 0000000000000000 RSI: ffff8fed442d6100 RDI: ffff8fec0a668e40
      [129624.054969] RBP: ffff9dbd421e7c20 R08: 0000000000000001 R09: 0000000000000000
      [129624.062487] R10: ffff8fec097f4410 R11: ffff8febc6832000 R12: ffff8fec0a668e40
      [129624.070518] R13: ffff8fed005acdb8 R14: ffff8fec0a668e40 R15: ffff8febc6832440
      [129624.077917] FS:  00007f8898c208c0(0000) GS:ffff8fed44a40000(0000) knlGS:0000000000000000
      [129624.086420] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [129624.093075] CR2: ffff8bed005acef0 CR3: 0000000048122004 CR4: 00000000003706e0
      [129624.100365] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [129624.108491] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [129624.116430] Call Trace:
      [129624.119822]  <TASK>
      [129624.122903]  __dentry_kill+0xeb/0x190
      [129624.127399]  shrink_dentry_list+0x86/0x150
      [129624.132202]  shrink_dcache_parent+0xcc/0x120
      [129624.137232]  d_invalidate+0x6f/0xf0
      [129624.141512]  btrfs_delete_subvolume+0x281/0x510 [btrfs]
      [129624.147381]  btrfs_ioctl_snap_destroy+0x615/0x730 [btrfs]
      [129624.153444]  btrfs_ioctl+0x13f/0x1160 [btrfs]
      [129624.158776]  ? handle_mm_fault+0xd8/0x2c0
      [129624.163809]  __x64_sys_ioctl+0x92/0xd0
      [129624.168552]  do_syscall_64+0x59/0xc0
      [129624.173235]  ? irqentry_exit+0x1d/0x30
      [129624.177930]  ? exc_page_fault+0x89/0x170
      [129624.183007]  entry_SYSCALL_64_after_hwframe+0x61/0xcb
      [129624.189050] RIP: 0033:0x7f8898d373ab
      [129624.193621] Code: 0f 1e fa 48 8b 05 e5 7a 0d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d b5 7a 0d 00 f7 d8 64 89 01 48
      [129624.213015] RSP: 002b:00007ffdd1bb59e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
      [129624.221570] RAX: ffffffffffffffda RBX: 0000000000000040 RCX: 00007f8898d373ab
      [129624.229797] RDX: 00007ffdd1bb5a20 RSI: 000000005000940f RDI: 0000000000000003
      [129624.237972] RBP: 0000000000000003 R08: 0000564feae05364 R09: 0000000000000095
      [129624.246112] R10: 0000564feab1efbf R11: 0000000000000246 R12: 0000000000000000
      [129624.254349] R13: 0000564feae05364 R14: 0000000000000003 R15: 00007ffdd1bb88c5
      [129624.262563]  </TASK>
      [129624.265884] Modules linked in: tls veth unix_diag nft_masq nft_chain_nat zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) ebtable_filter ebtables ip6table_raw ip6table_mangle ip6table_nat ip6table_filter ip6_tables iptable_raw iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter bpfilter nf_tables nfnetlink vhost_vsock vmw_vsock_virtio_transport_common vsock vhost vhost_iotlb xenfs xen_privcmd bridge stp llc binfmt_misc nls_iso8859_1 ppdev joydev input_leds serio_raw parport_pc parport sch_fq_codel dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua efi_pstore drm ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear hid_generic crct10dif_pclmul crc32_pclmul ghash_clmulni_intel usbhid aesni_intel crypto_simd hid cryptd psmouse floppy
      [129624.346933] CR2: ffff8bed005acef0
      [129624.351431] ---[ end trace dd21c84404389223 ]---
      [129624.356996] RIP: 0010:dentry_unlink_inode+0x50/0x130
      [129624.362912] Code: ff ff 8f fe 89 17 a9 00 00 08 00 74 08 65 48 ff 05 4d ee 87 76 49 8b 84 24 b8 00 00 00 48 85 c0 74 2c 49 8b 94 24 b0 00 00 00 <48> 89 10 48 85 d2 74 04 48 89 42 08 49 c7 84 24 b0 00 00 00 00 00
      [129624.381901] RSP: 0018:ffff9dbd421e7c10 EFLAGS: 00010286
      [129624.387982] RAX: ffff8bed005acef0 RBX: ffff8fec0a668c00 RCX: 0000000000000000
      [129624.396069] RDX: 0000000000000000 RSI: ffff8fed442d6100 RDI: ffff8fec0a668e40
      [129624.404315] RBP: ffff9dbd421e7c20 R08: 0000000000000001 R09: 0000000000000000
      [129624.412508] R10: ffff8fec097f4410 R11: ffff8febc6832000 R12: ffff8fec0a668e40
      [129624.420699] R13: ffff8fed005acdb8 R14: ffff8fec0a668e40 R15: ffff8febc6832440
      [129624.428879] FS:  00007f8898c208c0(0000) GS:ffff8fed44a40000(0000) knlGS:0000000000000000
      [129624.437965] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [129624.444887] CR2: ffff8bed005acef0 CR3: 0000000048122004 CR4: 00000000003706e0
      [129624.453708] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [129624.462942] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
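One quick check that bears on the "bad memory or kernel bug" question is to compare the faulting address (CR2) with the other pointers in the register dump. The Python sketch below is a heuristic only, not a diagnosis: it looks for a register whose 4 KiB page number differs from the faulting page by exactly one bit. The helper name `single_bit_neighbours` and the 12-bit page shift are assumptions made for illustration.

```python
import re

# Register values copied from the Oops above (CR2 is the faulting address).
OOPS = """\
RAX: ffff8bed005acef0 RBX: ffff8fec0a668c00 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff8fed442d6100 RDI: ffff8fec0a668e40
RBP: ffff9dbd421e7c20 R08: 0000000000000001 R09: 0000000000000000
R10: ffff8fec097f4410 R11: ffff8febc6832000 R12: ffff8fec0a668e40
R13: ffff8fed005acdb8 R14: ffff8fec0a668e40 R15: ffff8febc6832440
CR2: ffff8bed005acef0
"""

PAGE_SHIFT = 12  # assumption: compare 4 KiB page numbers, ignoring the in-page offset


def single_bit_neighbours(text, fault_reg="CR2"):
    """Return (register, bit) pairs whose page differs from the fault page by one bit."""
    regs = {
        name: int(value, 16)
        for name, value in re.findall(r"\b([A-Z][A-Z0-9]{1,2}): ([0-9a-f]{16})\b", text)
    }
    fault = regs.pop(fault_reg)
    hits = []
    for name, value in regs.items():
        diff = (value ^ fault) >> PAGE_SHIFT
        # Exactly one differing bit above the page offset.
        if diff and bin(diff).count("1") == 1:
            hits.append((name, diff.bit_length() - 1 + PAGE_SHIFT))
    return hits


print(single_bit_neighbours(OOPS))
```

With the values above it reports R13 and bit 42: R13 is ffff8fed005acdb8 while the faulting address is ffff8bed005acef0, which is the same page with one bit cleared. A single flipped bit in a pointer is consistent with a memory error, but a software bug that corrupts memory can produce the same pattern, so a memory test on the host would still be needed to tell the two apart.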
      
      
      • olivierlambert (Vates 🪐 Co-Founder CEO)

        Do you have anything in Dom0 dmesg and xl dmesg?
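To line the guest Oops up with anything in the Dom0 logs, it helps to turn the guest's uptime-relative dmesg timestamp into wall-clock time. The audit lines in the guest output carry both an uptime stamp and a Unix epoch, so they can serve as a reference. This is a minimal sketch, assuming the guest clock was correct and did not jump between the two lines; `to_wall_clock` is a hypothetical helper.

```python
from datetime import datetime, timezone


def to_wall_clock(ref_uptime, ref_epoch, uptime):
    """Convert an uptime-relative dmesg timestamp to UTC using a reference pair."""
    boot_epoch = ref_epoch - ref_uptime  # estimated epoch time at guest boot
    return datetime.fromtimestamp(boot_epoch + uptime, tz=timezone.utc)


# Reference: "[129623.813366] audit: type=1400 audit(1681213383.956:73)"
# Target:    "[129623.965681] BUG: unable to handle page fault ..."
when = to_wall_clock(129623.813366, 1681213383.956, 129623.965681)
print(when.strftime("%Y-%m-%d %H:%M:%S UTC"))
```

This places the Oops at about 11:43:04 UTC on 2023-04-11, a fraction of a second after the last LXD AppArmor profile removal. Dom0's dmesg is uptime-relative as well, so the same conversion needs a reference point on the host side before the two logs can be compared.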

        • dj423 @olivierlambert

          Some of the ugly dmesg output from the host with the troubled VM:

          [242392.838917] vif vif-7-0 vif7.0: Guest Rx ready
          [242421.920556] block tdd: sector-size: 512/512 capacity: 251658240
          [242427.246697] block tde: sector-size: 512/512 capacity: 251658240
          [242428.379056] block tdf: sector-size: 512/512 capacity: 251658240
          [243384.664175] block tde: sector-size: 512/512 capacity: 251658240
          [243385.760070] block tdf: sector-size: 512/512 capacity: 251658240
          [244310.448680] block tde: sector-size: 512/512 capacity: 251658240
          [244311.556701] block tdf: sector-size: 512/512 capacity: 251658240
          [244312.702292] block tdg: sector-size: 512/512 capacity: 251658240
          [244321.783435] device vif8.0 entered promiscuous mode
          [244336.272665] block tda: sector-size: 512/512 capacity: 178257920
          [244337.035240] device vif7.0 left promiscuous mode
          [244337.251070] vif vif-8-0 vif8.0: Guest Rx ready
          [244362.015783] block tdc: sector-size: 512/512 capacity: 178257920
          [244367.775702] block tde: sector-size: 512/512 capacity: 178257920
          [244368.875340] block tdf: sector-size: 512/512 capacity: 178257920
          [244984.216622] block tde: sector-size: 512/512 capacity: 178257920
          [244985.375864] block tdf: sector-size: 512/512 capacity: 178257920
          [245602.874372] block tde: sector-size: 512/512 capacity: 178257920
          [245604.076988] block tdf: sector-size: 512/512 capacity: 178257920
          [245605.204176] block tdg: sector-size: 512/512 capacity: 178257920
          [245627.262457] device vif9.0 entered promiscuous mode
          [245640.879281] block tda: sector-size: 512/512 capacity: 251658240
          [245641.649783] device vif8.0 left promiscuous mode
          [245641.898617] vif vif-9-0 vif9.0: Guest Rx ready
          [246215.439267] block tdd: sector-size: 512/512 capacity: 178257920
          [246215.479136] block tde: sector-size: 512/512 capacity: 251658240
          [247387.052767] print_req_error: I/O error, dev tde, sector 75145216
          [247387.052820] print_req_error: I/O error, dev tde, sector 75145304
          [247387.052865] print_req_error: I/O error, dev tde, sector 75145392
          [247387.052930] print_req_error: I/O error, dev tde, sector 75145480
          [247387.053039] print_req_error: I/O error, dev tde, sector 75145568
          [247387.053087] print_req_error: I/O error, dev tde, sector 75145656
          [247387.053137] print_req_error: I/O error, dev tde, sector 75145744
          [247387.053183] print_req_error: I/O error, dev tde, sector 75145832
          [247387.053233] print_req_error: I/O error, dev tde, sector 75145920
          [247387.053282] print_req_error: I/O error, dev tde, sector 75146008
          [248010.875068] block tdd: sector-size: 512/512 capacity: 251658240
          [248010.913350] block tde: sector-size: 512/512 capacity: 178257920
          [249059.521257] print_req_error: 86 callbacks suppressed
          [249059.521258] print_req_error: I/O error, dev tde, sector 74633216
          [249059.521309] print_req_error: I/O error, dev tde, sector 74633304
          [249059.521354] print_req_error: I/O error, dev tde, sector 74633392
          [249059.521399] print_req_error: I/O error, dev tde, sector 74633480
          [249059.521443] print_req_error: I/O error, dev tde, sector 74633568
          [249059.521647] print_req_error: I/O error, dev tde, sector 74633656
          [249059.521713] print_req_error: I/O error, dev tde, sector 74633744
          [249059.521737] print_req_error: I/O error, dev tde, sector 74633832
          [249059.521737] print_req_error: I/O error, dev tde, sector 74633920
          [249059.521737] print_req_error: I/O error, dev tde, sector 74634008
          [249811.475374] block tdd: sector-size: 512/512 capacity: 251658240
          [249811.519873] block tde: sector-size: 512/512 capacity: 178257920
          [250899.872276] print_req_error: 86 callbacks suppressed
          [250899.872277] print_req_error: I/O error, dev tde, sector 74674176
          [250899.872329] print_req_error: I/O error, dev tde, sector 74674264
          [250899.872379] print_req_error: I/O error, dev tde, sector 74674352
          [250899.872424] print_req_error: I/O error, dev tde, sector 74674440
          [250899.872472] print_req_error: I/O error, dev tde, sector 74674528
          [250899.872516] print_req_error: I/O error, dev tde, sector 74674616
          [250899.872560] print_req_error: I/O error, dev tde, sector 74674704
          [250899.872603] print_req_error: I/O error, dev tde, sector 74674792
          [250899.872646] print_req_error: I/O error, dev tde, sector 74674880
          [250899.872689] print_req_error: I/O error, dev tde, sector 74674968
          [251616.965255] block tdd: sector-size: 512/512 capacity: 251658240
          [251617.009656] block tde: sector-size: 512/512 capacity: 178257920
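The I/O errors in the output above are easier to reason about when grouped. The Python sketch below collects `print_req_error` lines into bursts per device and reports the sector range of each burst. The sample is a handful of lines copied from the log above; `summarise_io_errors` and the one-second `gap` are illustrative choices, not part of any XCP-ng tooling.

```python
import re
from collections import defaultdict

# A few lines copied from the Dom0 dmesg above; the full log can be fed in the same way.
SAMPLE = """\
[247387.052767] print_req_error: I/O error, dev tde, sector 75145216
[247387.052820] print_req_error: I/O error, dev tde, sector 75145304
[249059.521257] print_req_error: 86 callbacks suppressed
[249059.521258] print_req_error: I/O error, dev tde, sector 74633216
[249059.521309] print_req_error: I/O error, dev tde, sector 74633304
[250899.872277] print_req_error: I/O error, dev tde, sector 74674176
"""

ERR = re.compile(r"\[\s*(\d+\.\d+)\] print_req_error: I/O error, dev (\w+), sector (\d+)")


def summarise_io_errors(text, gap=1.0):
    """Group logged I/O errors into bursts per device.

    Errors on the same device no more than `gap` seconds apart form one burst.
    Returns {device: [(start_time, logged_count, min_sector, max_sector), ...]}.
    """
    bursts = defaultdict(list)
    for match in ERR.finditer(text):
        t, dev, sector = float(match.group(1)), match.group(2), int(match.group(3))
        current = bursts[dev][-1] if bursts[dev] else None
        if current and t - current["last"] <= gap:
            current["last"] = t
            current["count"] += 1
            current["lo"] = min(current["lo"], sector)
            current["hi"] = max(current["hi"], sector)
        else:
            bursts[dev].append({"start": t, "last": t, "count": 1, "lo": sector, "hi": sector})
    return {
        dev: [(b["start"], b["count"], b["lo"], b["hi"]) for b in items]
        for dev, items in bursts.items()
    }


for dev, items in summarise_io_errors(SAMPLE).items():
    for start, count, lo, hi in items:
        print(f"{dev}: burst at {start:.0f}s, {count} logged errors, sectors {lo}-{hi}")
```

Every logged error is on a single device, tde, in separate bursts within a fairly narrow sector range. The counts are lower bounds, because the "callbacks suppressed" lines show that the kernel rate-limited the messages.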
          
          

          Output of xl dmesg on the same host:

           \ \/ /___ _ __   | || |  / |___ / | ___|
            \  // _ \ '_ \  | || |_ | | |_ \ |___ \
            /  \  __/ | | | |__   _|| |___) | ___) |
           /_/\_\___|_| |_|    |_|(_)_|____(_)____/
          
          (XEN) [000000299f9da931] Xen version 4.13.5-9.30 (mockbuild@[unknown]) (gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-28)) debug=n  Tue Mar 21 13:25:35 CET 2023
          (XEN) [000000299f9db8ab] Latest ChangeSet: 708e83f0e7d1, pq c81a2963b08d
          (XEN) [000000299f9dcc95] build-id: f5d6abf4ba12c46c496f6d910114c239298148bf
          (XEN) [000000299f9dd548] Bootloader: GRUB 2.02
          (XEN) [000000299f9dde9e] Command line: dom0_mem=1440M,max:1440M watchdog ucode=scan dom0_max_vcpus=1-4 crashkernel=256M,below=4G console=vga vga=mode-0x0311
          (XEN) [000000299f9decb1] Xen image load base address: 0xda800000
          (XEN) [000000299f9df44e] Video information:
          (XEN) [000000299f9dff64]  VGA is graphics mode 640x480, 16 bpp
          (XEN) [000000299f9e0a94]  VBE/DDC methods: V2; EDID transfer time: 1 seconds
          (XEN) [000000299f9e150b] Disc information:
          (XEN) [000000299f9e1cc6]  Found 1 MBR signatures
          (XEN) [000000299f9e2493]  Found 2 EDD information structures
          (XEN) [000000299f9eb3b6] Xen-e820 RAM map:
          (XEN) [000000299f9ec0b1]  0000000000000000 - 000000000009dc00 (usable)
          (XEN) [000000299f9ecdde]  000000000009dc00 - 00000000000a0000 (reserved)
          (XEN) [000000299f9edb15]  00000000000e0000 - 0000000000100000 (reserved)
          (XEN) [000000299f9ee817]  0000000000100000 - 00000000d4c60000 (usable)
          (XEN) [000000299f9ef4b1]  00000000d4c60000 - 00000000d4c61000 (ACPI NVS)
          (XEN) [000000299f9f010e]  00000000d4c61000 - 00000000d4c62000 (reserved)
          (XEN) [000000299f9f0c4e]  00000000d4c62000 - 00000000daf48000 (usable)
          (XEN) [000000299f9f174e]  00000000daf48000 - 00000000dc1ba000 (reserved)
          (XEN) [000000299f9f2236]  00000000dc1ba000 - 00000000dc244000 (ACPI data)
          (XEN) [000000299f9f2e85]  00000000dc244000 - 00000000dca1e000 (ACPI NVS)
          (XEN) [000000299f9f3981]  00000000dca1e000 - 00000000dcf44000 (reserved)
          (XEN) [000000299f9f4439]  00000000dcf44000 - 00000000dd000000 (usable)
          (XEN) [000000299f9f4f02]  00000000dd000000 - 00000000e0000000 (reserved)
          (XEN) [000000299f9f59e8]  00000000f8000000 - 00000000fc000000 (reserved)
          (XEN) [000000299f9f6483]  00000000fe000000 - 00000000fe011000 (reserved)
          (XEN) [000000299f9f6fcc]  00000000fec00000 - 00000000fec01000 (reserved)
          (XEN) [000000299f9f7b07]  00000000fed00000 - 00000000fed01000 (reserved)
          (XEN) [000000299f9f85bc]  00000000fee00000 - 00000000fee01000 (reserved)
          (XEN) [000000299f9f9174]  00000000ff000000 - 0000000100000000 (reserved)
          (XEN) [000000299f9f9d92]  0000000100000000 - 000000081e000000 (usable)
          (XEN) [00000029a1ebb587] Kdump: 256MB (262144kB) at 0xc4c00000
          (XEN) [00000029a1f2bb4c] ACPI: RSDP 000F05B0, 0024 (r2 LENOVO)
          (XEN) [00000029a1f2d902] ACPI: XSDT DC1D50B8, 00EC (r1 LENOVO TC-M1A       1300 AMI     10013)
          (XEN) [00000029a1f2fc14] ACPI: FACP DC203650, 0114 (r6 LENOVO TC-M1A       1300 AMI     10013)
          (XEN) [00000029a1f31df4] ACPI: DSDT DC1D5230, 2E41B (r2 LENOVO TC-M1A       1300 INTL 20160422)
          (XEN) [00000029a1f334e7] ACPI: FACS DC9EDC40, 0040
          (XEN) [00000029a1f343bf] ACPI: APIC DC203768, 00BC (r3 LENOVO TC-M1A       1300 AMI     10013)
          (XEN) [00000029a1f356df] ACPI: FPDT DC203828, 0044 (r1 LENOVO TC-M1A       1300 AMI     10013)
          (XEN) [00000029a1f36965] ACPI: MCFG DC203870, 003C (r1 LENOVO TC-M1A       1300 MSFT       97)
          (XEN) [00000029a1f37bbe] ACPI: SSDT DC2038B0, 03BC (r1 LENOVO TC-M1A       1300 INTL 20160422)
          (XEN) [00000029a1f38fda] ACPI: FIDT DC203C70, 009C (r1 LENOVO TC-M1A       1300 AMI     10013)
          (XEN) [00000029a1f3a224] ACPI: SLIC DC203D10, 0176 (r1 LENOVO TC-M1A       1300 AMI     10013)
          (XEN) [00000029a1f3b537] ACPI: MSDM DC203E88, 0055 (r3 LENOVO TC-M1A       1300 AMI     10013)
          (XEN) [00000029a1f3c74b] ACPI: SSDT DC203EE0, 3159 (r2 LENOVO TC-M1A       1300 INTL 20160422)
          (XEN) [00000029a1f3db45] ACPI: SSDT DC207040, 26E8 (r2 LENOVO TC-M1A       1300 INTL 20160422)
          (XEN) [00000029a1f3f1bc] ACPI: HPET DC209728, 0038 (r1 LENOVO TC-M1A       1300 MSFT       5F)
          (XEN) [00000029a1f403ab] ACPI: SSDT DC209760, 0C59 (r2 LENOVO TC-M1A       1300 INTL 20160422)
          (XEN) [00000029a1f417fb] ACPI: UEFI DC20A3C0, 0042 (r1 LENOVO TC-M1A       1300       1000013)
          (XEN) [00000029a1f42b40] ACPI: SSDT DC20A408, 0EDE (r2 LENOVO TC-M1A       1300 INTL 20160422)
          (XEN) [00000029a1f43f15] ACPI: LPIT DC20B2E8, 0094 (r1 LENOVO TC-M1A       1300 MSFT       5F)
          (XEN) [00000029a1f45106] ACPI: WSMT DC20B380, 0028 (r1 LENOVO TC-M1A       1300 MSFT       5F)
          (XEN) [00000029a1f462cb] ACPI: SSDT DC20B3A8, 029F (r2 LENOVO TC-M1A       1300 INTL 20160422)
          (XEN) [00000029a1f4762a] ACPI: SSDT DC20B648, 3002 (r2 LENOVO TC-M1A       1300 INTL 20160422)
          (XEN) [00000029a1f489ca] ACPI: DBGP DC20E650, 0034 (r1 LENOVO TC-M1A       1300 MSFT       5F)
          (XEN) [00000029a1f49b9d] ACPI: DBG2 DC20E688, 0054 (r0 LENOVO TC-M1A       1300 MSFT       5F)
          (XEN) [00000029a1f4ae1c] ACPI: DMAR DC20E6E0, 00A8 (r1 LENOVO TC-M1A       1300 INTL        1)
          (XEN) [00000029a1f4c086] ACPI: TPM2 DC20E788, 0034 (r3 LENOVO TC-M1A       1300 AMI         0)
          (XEN) [00000029a1f4d249] ACPI: LUFT DC20E7C0, 349E2 (r1 LENOVO TC-M1A       1300 AMI     10013)
          (XEN) [00000029a1f4e755] ACPI: ASF! DC2431A8, 00A0 (r32 LENOVO TC-M1A       1300 TFSM    F4240)
          (XEN) [00000029a1f4fb29] ACPI: BGRT DC243248, 0038 (r1 LENOVO TC-M1A       1300 AMI     10013)
          (XEN) [00000029a1f97d56] System RAM: 32655MB (33439356kB)
          (XEN) [00000029a3ce9023] No NUMA configuration found
          (XEN) [00000029a3cea2c2] Faking a node at 0000000000000000-000000081e000000
          (XEN) [0000002a14f9c9cd] Domain heap initialised
          (XEN) [0000002a31acb2f6] vesafb: framebuffer at 0x00000000e0000000, mapped to 0xffff82c000201000, using 2048k, total 32704k
          (XEN) [0000002a31acc32d] vesafb: mode is 640x480x16, linelength=1280, font 8x8
          (XEN) [0000002a31acd3a7] vesafb: Truecolor: size=0:5:6:5, shift=0:11:5:0
          (XEN) [0000002a31ae02b7] CPU Vendor: Intel, Family 6 (0x6), Model 94 (0x5e), Stepping 3 (raw 000506e3)
          (XEN) [0000002a33af5975] found SMP MP-table at 000fccc0
          (XEN) [0000002a33c42e12] SMBIOS 3.0 present.
          (XEN) [0000002a33d604d3] Using APIC driver default
          (XEN) [0000002a33e92765] XSM Framework v1.0.1 initialized
          (XEN) [0000002a33fe47d9] Initialising XSM SILO mode
          (XEN) [0000002a3412e2d5] ACPI: PM-Timer IO Port: 0x1808 (24 bits)
          (XEN) [0000002a342a651f] ACPI: v5 SLEEP INFO: control[0:0], status[0:0]
          (XEN) [0000002a3443b343] ACPI: SLEEP INFO: pm1x_cnt[1:1804,1:0], pm1x_evt[1:1800,1:0]
          (XEN) [0000002a34556c75] ACPI: 32/64X FACS address mismatch in FADT - dc9edc40/0000000000000000, using 32
          (XEN) [0000002a346cf1b8] ACPI:             wakeup_vec[dc9edc4c], vec_size[20]
          (XEN) [0000002a3489d86a] ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
          (XEN) [0000002a34a43d2c] ACPI: LAPIC (acpi_id[0x02] lapic_id[0x02] enabled)
          (XEN) [0000002a34bea309] ACPI: LAPIC (acpi_id[0x03] lapic_id[0x04] enabled)
          (XEN) [0000002a34d9044d] ACPI: LAPIC (acpi_id[0x04] lapic_id[0x06] enabled)
          (XEN) [0000002a34f36537] ACPI: LAPIC (acpi_id[0x05] lapic_id[0x01] enabled)
          (XEN) [0000002a35154603] ACPI: LAPIC (acpi_id[0x06] lapic_id[0x03] enabled)
          (XEN) [0000002a352fabd5] ACPI: LAPIC (acpi_id[0x07] lapic_id[0x05] enabled)
          (XEN) [0000002a354a1211] ACPI: LAPIC (acpi_id[0x08] lapic_id[0x07] enabled)
          (XEN) [0000002a3565df9a] ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1])
          (XEN) [0000002a35808be6] ACPI: LAPIC_NMI (acpi_id[0x02] high edge lint[0x1])
          (XEN) [0000002a359b39cb] ACPI: LAPIC_NMI (acpi_id[0x03] high edge lint[0x1])
          (XEN) [0000002a35b5e1e2] ACPI: LAPIC_NMI (acpi_id[0x04] high edge lint[0x1])
          (XEN) [0000002a35d08c67] ACPI: LAPIC_NMI (acpi_id[0x05] high edge lint[0x1])
          (XEN) [0000002a35eb33c2] ACPI: LAPIC_NMI (acpi_id[0x06] high edge lint[0x1])
          (XEN) [0000002a3605d964] ACPI: LAPIC_NMI (acpi_id[0x07] high edge lint[0x1])
          (XEN) [0000002a3620849c] ACPI: LAPIC_NMI (acpi_id[0x08] high edge lint[0x1])
          (XEN) [0000002a363cb3e8] ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0])
          (XEN) [0000002a36595b80] IOAPIC[0]: apic_id 2, version 32, address 0xfec00000, GSI 0-119
          (XEN) [0000002a366cb8b1] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
          (XEN) [0000002a367d4db3] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
          (XEN) [0000002a368f8431] Enabling APIC mode:  Flat.  Using 1 I/O APICs
          (XEN) [0000002a36a8ada3] ACPI: HPET id: 0x8086a201 base: 0xfed00000
          (XEN) [0000002a36c1dc02] PCI: MCFG configuration 0: base f8000000 segment 0000 buses 00 - 3f
          (XEN) [0000002a36d59445] PCI: MCFG area at f8000000 reserved in E820
          (XEN) [0000002a36edfe08] PCI: Using MCFG for segment 0000 bus 00-3f
          (XEN) [0000002a37096b04] ACPI: BGRT: invalidating v1 image at 0xd7746018
          (XEN) [0000002a3722e1f6] Using ACPI (MADT) for SMP configuration information
          (XEN) [0000002a373d810e] SMP: Allowing 8 CPUs (0 hotplug CPUs)
          (XEN) [0000002a375583e1] IRQ limits: 120 GSI, 1432 MSI/MSI-X
          (XEN) [0000002a386c7828] Switched to APIC driver x2apic_phys
          (XEN) [0000002a39621600] microcode: CPU0 updated from revision 0xc2 to 0xf0, date = 2021-11-12
          (XEN) [0000002a3976c4a9] xstate: size: 0x440 and states: 0x1f
          (XEN) [0000002a398e1990] CPU0: Intel machine check reporting enabled
          (XEN) [0000002a39a6c574] Speculative mitigation facilities:
          (XEN) [0000002a39bc5c79]   Hardware hints: RSBA
          (XEN) [0000002a39ce7e07]   Hardware features: IBPB IBRS STIBP SSBD L1D_FLUSH MD_CLEAR SRBDS_CTRL
          (XEN) [0000002a39e33c78]   Compiled-in support: INDIRECT_THUNK SHADOW_PAGING
          (XEN) [0000002a39fdb7c2]   Xen settings: BTI-Thunk JMP, SPEC_CTRL: IBRS+ STIBP+ SSBD- PSFD-, Other: SRB_LOCK+ IBPB-ctxt L1D_FLUSH VERW BRANCH_HARDEN
          (XEN) [0000002a3a216953]   L1TF: believed vulnerable, maxphysaddr L1D 46, CPUID 39, Safe address 8000000000
          (XEN) [0000002a3ea3b3f4]   Support for HVM VMs: MSR_SPEC_CTRL RSB EAGER_FPU MD_CLEAR
          (XEN) [0000002a432636ba]   Support for PV VMs: MSR_SPEC_CTRL EAGER_FPU MD_CLEAR
          (XEN) [0000002a4576a629]   XPTI (64-bit PV only): Dom0 enabled, DomU enabled (with PCID)
          (XEN) [0000002a4a097425]   PV L1TF shadowing: Dom0 disabled, DomU enabled
          (XEN) [0000002a4c610011] Using scheduler: SMP Credit Scheduler (credit)
          (XEN) [0000002a57199776] Platform timer is 23.999MHz HPET
          (XEN) [    7.506401] Detected 2808.027 MHz processor.
          (XEN) [    7.517967] alt table ffff82d080456ed0 -> ffff82d080465d40
          (XEN) [    7.558340] Intel VT-d iommu 0 supported page sizes: 4kB, 2MB, 1GB
          (XEN) [    7.566526] Intel VT-d iommu 1 supported page sizes: 4kB, 2MB, 1GB
          (XEN) [    7.574524] Intel VT-d Snoop Control not enabled.
          (XEN) [    7.582603] Intel VT-d Dom0 DMA Passthrough not enabled.
          (XEN) [    7.590514] Intel VT-d Queued Invalidation enabled.
          (XEN) [    7.598541] Intel VT-d Interrupt Remapping enabled.
          (XEN) [    7.606462] Intel VT-d Posted Interrupt not enabled.
          (XEN) [    7.614484] Intel VT-d Shared EPT tables enabled.
          (XEN) [    7.622943] I/O virtualisation enabled
          (XEN) [    7.630834]  - Dom0 mode: Relaxed
          (XEN) [    7.638361] Interrupt remapping enabled
          (XEN) [    7.646072] Enabled directed EOI with ioapic_ack_old on!
          (XEN) [    7.664154] ENABLING IO-APIC IRQs
          (XEN) [    7.671572]  -> Using old ACK method
          (XEN) [    7.679731] ..TIMER: vector=0xF0 apic1=0 pin1=2 apic2=-1 pin2=-1
          (XEN) [    8.731232] Allocated console ring of 64 KiB.
          (XEN) [    8.738655] VMX: Supported advanced features:
          (XEN) [    8.746157]  - APIC MMIO access virtualisation
          (XEN) [    8.753474]  - APIC TPR shadow
          (XEN) [    8.760670]  - Extended Page Tables (EPT)
          (XEN) [    8.767984]  - Virtual-Processor Identifiers (VPID)
          (XEN) [    8.775270]  - Virtual NMI
          (XEN) [    8.782564]  - MSR direct-access bitmap
          (XEN) [    8.789819]  - Unrestricted Guest
          (XEN) [    8.797141]  - VMCS shadowing
          (XEN) [    8.804377]  - VM Functions
          (XEN) [    8.811691]  - Virtualisation Exceptions
          (XEN) [    8.818731]  - Page Modification Logging
          (XEN) [    8.825709] HVM: ASIDs enabled.
          (XEN) [    8.832982] HVM: VMX enabled
          (XEN) [    8.840043] HVM: Hardware Assisted Paging (HAP) detected
          (XEN) [    8.847293] HVM: HAP page sizes: 4kB, 2MB, 1GB
          (XEN) [    8.854603] alt table ffff82d080456ed0 -> ffff82d080465d40
          (XEN) [0000002b3bc37dd3] microcode: CPU2 updated from revision 0xc2 to 0xf0, date = 2021-11-12
          (XEN) [0000002b3e73fd42] microcode: CPU4 updated from revision 0xc2 to 0xf0, date = 2021-11-12
          (XEN) [0000002b411f0901] microcode: CPU6 updated from revision 0xc2 to 0xf0, date = 2021-11-12
          (XEN) [    8.910909] Brought up 8 CPUs
          (XEN) [    8.920420] Testing NMI watchdog on all CPUs: ok
          (XEN) [    8.960783] Scheduling granularity: cpu, 1 CPU per sched-resource
          (XEN) [    8.990587] mcheck_poll: Machine check polling timer started.
          (XEN) [    9.000212] Dom0 has maximum 888 PIRQs
          (XEN) [    9.009747] csched_alloc_domdata: setting dom 0 as the privileged domain
          (XEN) [    9.028512] NX (Execute Disable) protection active
          (XEN) [    9.037803] *** Building a PV Dom0 ***
          (XEN) [    9.360826]  Xen  kernel: 64-bit, lsb, compat32
          (XEN) [    9.370255]  Dom0 kernel: 64-bit, PAE, lsb, paddr 0x1000000 -> 0x302c000
          (XEN) [    9.389229] PHYSICAL MEMORY ARRANGEMENT:
          (XEN) [    9.398543]  Dom0 alloc.:   0000000800000000->0000000804000000 (347002 pages to be allocated)
          (XEN) [    9.417022]  Init. ramdisk: 000000081cb7a000->000000081dfff214
          (XEN) [    9.426481] VIRTUAL MEMORY ARRANGEMENT:
          (XEN) [    9.435756]  Loaded kernel: ffffffff81000000->ffffffff8302c000
          (XEN) [    9.445126]  Init. ramdisk: 0000000000000000->0000000000000000
          (XEN) [    9.454402]  Phys-Mach map: 0000008000000000->00000080002d0000
          (XEN) [    9.463851]  Start info:    ffffffff8302c000->ffffffff8302c4b8
          (XEN) [    9.473127]  Xenstore ring: 0000000000000000->0000000000000000
          (XEN) [    9.482765]  Console ring:  0000000000000000->0000000000000000
          (XEN) [    9.492060]  Page tables:   ffffffff8302d000->ffffffff8304a000
          (XEN) [    9.501535]  Boot stack:    ffffffff8304a000->ffffffff8304b000
          (XEN) [    9.510888]  TOTAL:         ffffffff80000000->ffffffff83400000
          (XEN) [    9.520426]  ENTRY ADDRESS: ffffffff8242b180
          (XEN) [    9.531941] Dom0 has maximum 4 VCPUs
          (XEN) [    9.563237] Bogus DMIBAR 0xfed18001 on 0000:00:00.0
          (XEN) [   11.555503] Initial low memory virq threshold set at 0x4000 pages.
          (XEN) [   11.562838] Scrubbing Free RAM in background
          (XEN) [   11.569731] Std. Loglevel: Errors, warnings and info
          (XEN) [   11.576733] Guest Loglevel: Nothing (Rate-limited: Errors and warnings)
          (XEN) [   11.584183] ***************************************************
          (XEN) [   11.591265] Booted on L1TF-vulnerable hardware with SMT/Hyperthreading
          (XEN) [   11.598532] enabled.  Please assess your configuration and choose an
          (XEN) [   11.605681] explicit 'smt=<bool>' setting.  See XSA-273.
          (XEN) [   11.612972] ***************************************************
          (XEN) [   11.620097] Booted on MLPDS/MFBDS-vulnerable hardware with SMT/Hyperthreading
          (XEN) [   11.634350] enabled.  Mitigations will not be fully effective.  Please
          (XEN) [   11.641605] choose an explicit smt=<bool> setting.  See XSA-297.
          (XEN) [   11.649048] ***************************************************
          (XEN) [   11.656345] 3... 2... 1...
          (XEN) [   14.665205] Xen is relinquishing VGA console.
          (XEN) [   14.674950] *** Serial input to DOM0 (type 'CTRL-a' three times to switch input)
          (XEN) [   14.675033] Freed 608kB init memory
          (XEN) [   16.595699] Bogus DMIBAR 0xfed18001 on 0000:00:00.0
          
          

          Could this line be the culprit?

          [250899.872516] print_req_error: I/O error, dev tde, sector 74674616
          

          Thanks

          • D Offline
            dj423 @dj423
            last edited by

            @dj423

            After a bit more digging: tde maps to "xvdb" inside the VM. This block device was on the iSCSI SR, so I moved it over to the NFS SR, since the storage backend seems to be the only difference for that device. All other drives are on the NFS share and have no issues, so this may be an issue with the iSCSI SR. I will create a test VM on it to troubleshoot further.

            Both VMs have an xvda (OS drive) and an xvdb with btrfs volumes that LXD uses for container instances. This "tde" device was the only block device stored on the iSCSI SR, so I think that may be where the I/O errors stem from.

            The iSCSI share has no authentication configured, so I can rule out authentication issues. Alternatively, I may just avoid iSCSI, since it is thick provisioned. These are only test VMs, so no data is at risk.

            Here is the current SR list on this host:

            uuid ( RO)                : 3bf92a0f-26ea-cd46-e45c-0a896f267c3f
                      name-label ( RW): XCP-ng Tools
                name-description ( RW): XCP-ng Tools ISOs
                            host ( RO): <shared>
                            type ( RO): iso
                    content-type ( RO): iso
            
            
            uuid ( RO)                : e506e70d-7351-647a-bd4a-8053b14c4c1f
                      name-label ( RW): Local Storage
                name-description ( RW):
                            host ( RO): X2-P320
                            type ( RO): ext
                    content-type ( RO): user
            
            
            uuid ( RO)                : df5a8a4c-50f1-799c-7523-b32064607937
                      name-label ( RW): XEN-NFS
                name-description ( RW): VM Shared Storage
                            host ( RO): <shared>
                            type ( RO): nfs
                    content-type ( RO): user
            
            
            uuid ( RO)                : 93dcfee9-1d23-e585-ca78-13cc781421fe
                      name-label ( RW): ISO-NAS
                name-description ( RW): ISO Share
                            host ( RO): <shared>
                            type ( RO): iso
                    content-type ( RO): iso
            
            
            uuid ( RO)                : fc21f3a9-3651-404c-0756-155660acf953
                      name-label ( RW): LXD-Pool
                name-description ( RW): LXD Pools
                            host ( RO): <shared>
                            type ( RO): lvmoiscsi
                    content-type ( RO): user
            
            
            uuid ( RO)                : d3dabb64-cf3d-bb00-70aa-baf2240ac95e
                      name-label ( RW): Local Storage
                name-description ( RW):
                            host ( RO): X1-M920
                            type ( RO): ext
                    content-type ( RO): user
            
            
            

            LXD-Pool is the suspect SR, since "tde", the only device stored on this SR, showed I/O errors.
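
For anyone who wants to check the same thing on a longer SR list, here is a rough Python sketch that splits listing text like the one above into records and picks out the SRs of a given type. The sample text is two records copied from the list above; the function names `parse_records` and `srs_of_type` are made up for illustration and are not part of any XCP-ng tooling.

```python
# Two records copied from the SR list in this post (assumed to be typical
# "key ( RO/RW): value" blocks separated by blank lines).
SR_LIST = """\
uuid ( RO)                : df5a8a4c-50f1-799c-7523-b32064607937
          name-label ( RW): XEN-NFS
    name-description ( RW): VM Shared Storage
                host ( RO): <shared>
                type ( RO): nfs
        content-type ( RO): user


uuid ( RO)                : fc21f3a9-3651-404c-0756-155660acf953
          name-label ( RW): LXD-Pool
    name-description ( RW): LXD Pools
                host ( RO): <shared>
                type ( RO): lvmoiscsi
        content-type ( RO): user
"""


def parse_records(text):
    """Split the listing into one dict per blank-line-separated record."""
    records = []
    current = {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the record collected so far.
            if current:
                records.append(current)
                current = {}
            continue
        key, _, value = line.partition(":")
        # Drop the "( RO)" / "( RW)" access marker from the field name.
        name = key.split("(")[0].strip()
        current[name] = value.strip()
    if current:
        records.append(current)
    return records


def srs_of_type(records, sr_type):
    """Return the name-labels of all SRs whose type matches sr_type."""
    return [r["name-label"] for r in records if r.get("type") == sr_type]


if __name__ == "__main__":
    print(srs_of_type(parse_records(SR_LIST), "lvmoiscsi"))
```

With the full list pasted in, this should show LXD-Pool as the only lvmoiscsi SR, which is what makes it stand out as the one difference for that disk.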

            Let me know if my thinking is correct on that.

            • D Offline
              dj423 @dj423
              last edited by

              @dj423
              Second occurrence of the kernel BUG error:

              [237621.042604] BUG: unable to handle page fault for address: ffff9bd9bc6c9840
              [237621.049589] #PF: supervisor read access in kernel mode
              [237621.055044] #PF: error_code(0x0000) - not-present page
              
              

              I am seeing some strange errors on the host (X1) for storage devices that no longer exist on the host, e.g.:

              [331863.198883] print_req_error: I/O error, dev tde, sector 74842904
              [332612.663953] block tdd: sector-size: 512/512 capacity: 178257920
              [332612.711329] block tde: sector-size: 512/512 capacity: 251658240
              [333679.019750] print_req_error: 86 callbacks suppressed
              [333679.019750] print_req_error: I/O error, dev tdd, sector 74924032
              
              

              The storage devices currently on the host:

              tdc                                                                                                   254:2    0    85G  0 disk
              tda                                                                                                   254:0    0   120G  0 disk
              tdf                                                                                                   254:5    0    32G  0 disk
              
              

              Any idea why I would see I/O errors for storage devices that are no longer on the host? I only see this on one VM on this host; all others are fine.
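
To make the comparison less manual, here is a small Python sketch that cross-checks the two outputs above: it collects the device names from the I/O error lines, subtracts the devices that show up in the device listing, and converts the logged capacities to GiB. The sample text is copied from this post (listing columns shortened). It assumes the "capacity" value is a count of sectors of the logged sector size, which I have not confirmed, and the function names are just for illustration.

```python
import re

# Log lines copied from the host dmesg output in this post.
DMESG = """\
[331863.198883] print_req_error: I/O error, dev tde, sector 74842904
[332612.663953] block tdd: sector-size: 512/512 capacity: 178257920
[332612.711329] block tde: sector-size: 512/512 capacity: 251658240
[333679.019750] print_req_error: 86 callbacks suppressed
[333679.019750] print_req_error: I/O error, dev tdd, sector 74924032
"""

# Device listing copied from this post, whitespace shortened.
DEVICES = """\
tdc   254:2    0    85G  0 disk
tda   254:0    0   120G  0 disk
tdf   254:5    0    32G  0 disk
"""

IO_ERROR = re.compile(r"I/O error, dev (\w+), sector (\d+)")
CAPACITY = re.compile(r"block (\w+): sector-size: (\d+)/\d+ capacity: (\d+)")


def devices_with_io_errors(dmesg_text):
    """Return the set of device names named in I/O error lines."""
    return {m.group(1) for m in IO_ERROR.finditer(dmesg_text)}


def capacities_gib(dmesg_text):
    """Map device name to capacity in GiB (assumes capacity is in sectors)."""
    result = {}
    for m in CAPACITY.finditer(dmesg_text):
        name, sector_size, sectors = m.group(1), int(m.group(2)), int(m.group(3))
        result[name] = sectors * sector_size / 2**30
    return result


def present_devices(listing_text):
    """Return the device names in the first column of the listing."""
    return {line.split()[0] for line in listing_text.splitlines() if line.strip()}


def stale_error_devices(dmesg_text, listing_text):
    """Devices that logged I/O errors but are absent from the listing."""
    return sorted(devices_with_io_errors(dmesg_text) - present_devices(listing_text))


if __name__ == "__main__":
    sizes = capacities_gib(DMESG)
    for dev in stale_error_devices(DMESG, DEVICES):
        print(f"{dev}: I/O errors logged, not in listing, capacity {sizes.get(dev)} GiB")
```

On the lines above this flags tdd and tde, with capacities of 85 GiB and 120 GiB under the sector assumption. Those happen to match the sizes listed for tdc and tda, though that may be a coincidence.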
