@andyhhp
@andyhhp said in Diagnosing frequent crashes on host:
@the_jest said in Diagnosing frequent crashes on host:
but I figured I'd mention it. (Also, "Shot down" should be "Shut down".)
Shot down is correct. It is the past tense of "Shoot down", because the companion message you get when something went wrong is "Failed to shoot down $CPUS", and is the single most valuable print message I've ever inserted into the code.
My apologies!
@andyhhp said in Diagnosing frequent crashes on host:
The snippet of xen.log you've posted suggests it's a linux kernel crash, so look at dom0.log, and right at the end.
The last consecutive block of messages (timewise, i.e. everything from the same millisecond through to the end of the log) is:
[ 19701.650235] WARN: Call Trace:
[ 19701.650238] WARN: <IRQ>
[ 19701.650241] WARN: xen_evtchn_do_upcall+0x27/0x50
[ 19701.650245] WARN: xen_do_hypervisor_callback+0x29/0x40
[ 19701.650248] WARN: </IRQ>
[ 19701.650250] WARN: RIP: e030:xen_hypercall_sched_op+0xa/0x20
[ 19701.650253] WARN: Code: 51 41 53 b8 1c 00 00 00 0f 05 41 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 51 41 53 b8 1d 00 00 00 0f 05 <41> 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
[ 19701.650259] WARN: RSP: e02b:ffffc900400e7eb0 EFLAGS: 00000246
[ 19701.650261] WARN: RAX: 0000000000000000 RBX: ffff8881db639d00 RCX: ffffffff810013aa
[ 19701.650264] WARN: RDX: ffffffff8203d250 RSI: 0000000000000000 RDI: 0000000000000001
[ 19701.650267] WARN: RBP: 0000000000000004 R08: 0000000000000008 R09: 000011eef4cd8cc2
[ 19701.650269] WARN: R10: 0000000000007ff0 R11: 0000000000000246 R12: 0000000000000000
[ 19701.650272] WARN: R13: 0000000000000000 R14: ffff8881db639d00 R15: ffff8881db639d00
[ 19701.650276] WARN: ? xen_hypercall_sched_op+0xa/0x20
[ 19701.650279] WARN: ? xen_safe_halt+0xc/0x20
[ 19701.650282] WARN: ? default_idle+0x1a/0x140
[ 19701.650284] WARN: ? do_idle+0x1ea/0x260
[ 19701.650287] WARN: ? cpu_startup_entry+0x6f/0x80
[ 19701.650289] WARN: Modules linked in: tun nfsv3 nfs_acl nfs lockd grace fscache bnx2fc(O) cnic(O) uio fcoe libfcoe libfc scsi_transport_fc openvswitch nsh nf_nat_ipv6 nf_nat_ipv4 nf_conncount nf_nat 8021q garp mrp stp llc ipt_REJECT nf_reject_ipv4 xt_tcpudp xt_multiport xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c iptable_filter dm_multipath sunrpc nls_iso8859_1 nls_cp437 intel_powerclamp crct10dif_pclmul vfat crc32_pclmul ghash_clmulni_intel fat pcbc dm_mod aesni_intel aes_x86_64 crypto_simd cryptd glue_helper video backlight ip_tables x_tables hid_generic usbhid hid xhci_pci nvme igc(O) xhci_hcd i40e(O) nvme_core scsi_dh_rdac scsi_dh_hp_sw scsi_dh_emc scsi_dh_alua scsi_mod efivarfs ipv6 crc_ccitt
[ 19701.650330] WARN: ---[ end trace 79b40169d24b8e01 ]---
[ 19701.650333] WARN: RIP: e030:__xen_evtchn_do_upcall+0x82/0x90
[ 19701.650335] WARN: Code: 66 90 f6 c4 02 75 23 80 3b 00 75 d7 65 ff 05 85 89 ba 7e 48 8b 44 24 10 65 48 33 04 25 28 00 00 00 75 09 48 83 c4 18 5b 5d c3 <0f> 0b e8 77 aa bf ff 0f 1f 80 00 00 00 00 0f 1f 44 00 00 e9 66 ff
[ 19701.650341] WARN: RSP: e02b:ffff8881dc503fb8 EFLAGS: 00010002
[ 19701.650344] WARN: RAX: 0000000000000000 RBX: ffff8881dc514100 RCX: 000000008518dd93
[ 19701.650346] WARN: RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8881db6aa800
[ 19701.650349] WARN: RBP: 0000000000000004 R08: 00000000000035c6 R09: ffff8881db003210
[ 19701.650352] WARN: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[ 19701.650355] WARN: R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 19701.650361] WARN: FS: 0000000000000000(0000) GS:ffff8881dc500000(0000) knlGS:0000000000000000
[ 19701.650364] WARN: CS: e033 DS: 002b ES: 002b CR0: 0000000080050033
[ 19701.650367] WARN: CR2: 00007fffc3789c78 CR3: 00000001d81ca000 CR4: 0000000000040660
[ 19701.650371] EMERG: Kernel panic - not syncing: Fatal exception in interrupt
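In case it is useful to anyone else doing the same "look right at the end" exercise, here is a rough sketch of how that trailing block could be pulled out of a log automatically. It assumes every relevant line starts with a `[ seconds.microseconds]` stamp like the lines quoted above. The `trailing_block` name, the 1 ms `max_gap` default, and the first sample line are my own inventions for illustration and are not part of any XCP-ng tooling.

```python
import re

# Assumed line format, based on the excerpt above:
#   "[ 19701.650235] WARN: Call Trace:"
STAMP = re.compile(r"^\[\s*(\d+\.\d+)\]")


def trailing_block(lines, max_gap=0.001):
    """Return the final run of lines whose neighbouring timestamps
    are no more than max_gap seconds apart (hypothetical helper)."""
    block = []
    newer = None  # timestamp of the line that follows the current one
    for line in reversed(lines):
        match = STAMP.match(line)
        if match is None:
            continue  # ignore lines that carry no timestamp
        stamp = float(match.group(1))
        if newer is not None and newer - stamp > max_gap:
            break  # gap too large: this line belongs to an earlier burst
        block.append(line)
        newer = stamp
    block.reverse()
    return block


sample = [
    "[ 19650.100000] INFO: earlier, unrelated message",  # made-up line
    "[ 19701.650235] WARN: Call Trace:",
    "[ 19701.650238] WARN: <IRQ>",
    "[ 19701.650371] EMERG: Kernel panic - not syncing: Fatal exception in interrupt",
]

for line in trailing_block(sample):
    print(line)
```

On a real log, the list passed in would be the file's lines (for example from `open(path).read().splitlines()`), and `max_gap` may need loosening if the crash output is spread over more than a millisecond.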
Thank you for looking this over.