@olivierlambert Just produced another reboot. I'm closing in on a way to reproduce this issue.
Posts
-
RE: Very scary host reboot issue
-
RE: Very scary host reboot issue
[ 334371.865769] ALERT: BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[ 334371.865787] INFO: PGD 2250ed067 P4D 2250ed067 PUD 228c9f067 PMD 0
[ 334371.865803] WARN: Oops: 0000 [#1] SMP NOPTI
[ 334371.865810] WARN: CPU: 9 PID: 57 Comm: ksoftirqd/9 Tainted: G O 4.19.0+1 #1
[ 334371.865818] WARN: Hardware name: Dell Inc. PowerEdge R720/0C4Y3R, BIOS 2.9.0 12/06/2019
[ 334371.865832] WARN: RIP: e030:skb_copy_ubufs+0x19c/0x5f0
[ 334371.865839] WARN: Code: 90 cc 00 00 00 48 03 90 d0 00 00 00 48 63 44 24 40 48 83 c0 03 48 c1 e0 04 48 01 d0 48 89 18 c7 40 08 00 00 00 00 44 89 78 0c <48> 8b 43 08 a8 01 0f 85 3f 04 00 00 48 8b 44 24 30 48 83 78 20 ff
[ 334371.865858] WARN: RSP: e02b:ffffc9004026b6f8 EFLAGS: 00010282
[ 334371.865864] WARN: RAX: ffff888099621ae0 RBX: 0000000000000000 RCX: 00000000000000c0
[ 334371.865873] WARN: RDX: ffff888099621ac0 RSI: ffff888099621ac0 RDI: ffffea00031da880
[ 334371.865881] WARN: RBP: 0000000000000000 R08: ffff888099621a00 R09: ffff8881f0d43e98
[ 334371.865890] WARN: R10: ffffc9004026b8b0 R11: 0000000000000000 R12: ffff888096e61c00
[ 334371.865898] WARN: R13: 0000000000000000 R14: ffff88822b867a80 R15: 0000000000000000
[ 334371.865918] WARN: FS: 0000000000000000(0000) GS:ffff88822d440000(0000) knlGS:0000000000000000
[ 334371.865927] WARN: CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 334371.865935] WARN: CR2: 0000000000000008 CR3: 00000002281aa000 CR4: 0000000000040660
[ 334371.865949] WARN: Call Trace:
[ 334371.865958] WARN: skb_clone+0x71/0xa0
[ 334371.865968] WARN: do_execute_actions+0x4ec/0x1750 [openvswitch]
[ 334371.865978] WARN: ? ovs_dp_process_packet+0x7d/0x110 [openvswitch]
[ 334371.865988] WARN: ? ovs_vport_receive+0x6e/0xd0 [openvswitch]
[ 334371.865997] WARN: ? arch_local_irq_restore+0x5/0x10
[ 334371.866005] WARN: ? get_page_from_freelist+0xa4f/0xf00
[ 334371.866012] WARN: ? arch_local_irq_restore+0x5/0x10
[ 334371.866020] WARN: ? get_page_from_freelist+0xa4f/0xf00
[ 334371.866031] WARN: ovs_execute_actions+0x47/0x120 [openvswitch]
[ 334371.866040] WARN: ovs_dp_process_packet+0x7d/0x110 [openvswitch]
[ 334371.866050] WARN: ? key_extract+0xa53/0xd60 [openvswitch]
[ 334371.866058] WARN: ovs_vport_receive+0x6e/0xd0 [openvswitch]
[ 334371.866066] WARN: ? __alloc_skb+0x4e/0x270
[ 334371.866075] WARN: ? notify_remote_via_irq+0x4a/0x70
[ 334371.866085] WARN: ? __raw_callee_save_xen_vcpu_stolen+0x11/0x20
[ 334371.866091] WARN: ? __alloc_skb+0x76/0x270
[ 334371.866100] WARN: ? arch_local_irq_restore+0x5/0x10
[ 334371.866108] WARN: ? __slab_alloc.constprop.81+0x42/0x4e
[ 334371.866114] WARN: ? __alloc_skb+0x4e/0x270
[ 334371.866120] WARN: ? __kmalloc_track_caller+0x58/0x200
[ 334371.866127] WARN: ? __slab_alloc.constprop.81+0x42/0x4e
[ 334371.866136] WARN: ? __kmalloc_reserve.isra.48+0x29/0x70
[ 334371.866146] WARN: netdev_frame_hook+0x105/0x180 [openvswitch]
[ 334371.866154] WARN: __netif_receive_skb_core+0x211/0xb30
[ 334371.866163] WARN: __netif_receive_skb_one_core+0x36/0x70
[ 334371.866170] WARN: netif_receive_skb_internal+0x34/0xe0
[ 334371.866179] WARN: xenvif_tx_action+0x55c/0x990
[ 334371.866187] WARN: xenvif_poll+0x27/0x70
[ 334371.866193] WARN: net_rx_action+0x2a5/0x3e0
[ 334371.866200] WARN: __do_softirq+0xd1/0x28c
[ 334371.866208] WARN: run_ksoftirqd+0x26/0x40
[ 334371.866215] WARN: smpboot_thread_fn+0x10e/0x160
[ 334371.866223] WARN: kthread+0xf8/0x130
[ 334371.866229] WARN: ? sort_range+0x20/0x20
[ 334371.866235] WARN: ? kthread_bind+0x10/0x10
[ 334371.866242] WARN: ret_from_fork+0x35/0x40
[ 334371.866250] WARN: Modules linked in: tun bnx2fc(O) cnic(O) uio fcoe libfcoe libfc scsi_transport_fc openvswitch nsh nf_nat_ipv6 nf_nat_ipv4 nf_conncount nf_nat 8021q garp mrp stp llc dm_multipath ipt_REJECT nf_reject_ipv4 xt_tcpu$
[ 334371.866374] WARN: scsi_mod efivarfs ipv6 crc_ccitt
[ 334371.866384] WARN: CR2: 0000000000000008
[ 334371.866396] WARN: ---[ end trace 8b74661a79be8268 ]---
[ 334371.868712] WARN: RIP: e030:skb_copy_ubufs+0x19c/0x5f0
[ 334371.868721] WARN: Code: 90 cc 00 00 00 48 03 90 d0 00 00 00 48 63 44 24 40 48 83 c0 03 48 c1 e0 04 48 01 d0 48 89 18 c7 40 08 00 00 00 00 44 89 78 0c <48> 8b 43 08 a8 01 0f 85 3f 04 00 00 48 8b 44 24 30 48 83 78 20 ff
[ 334371.868740] WARN: RSP: e02b:ffffc9004026b6f8 EFLAGS: 00010282
[ 334371.868748] WARN: RAX: ffff888099621ae0 RBX: 0000000000000000 RCX: 00000000000000c0
[ 334371.868759] WARN: RDX: ffff888099621ac0 RSI: ffff888099621ac0 RDI: ffffea00031da880
[ 334371.868769] WARN: RBP: 0000000000000000 R08: ffff888099621a00 R09: ffff8881f0d43e98
[ 334371.868778] WARN: R10: ffffc9004026b8b0 R11: 0000000000000000 R12: ffff888096e61c00
[ 334371.868788] WARN: R13: 0000000000000000 R14: ffff88822b867a80 R15: 0000000000000000
[ 334371.868805] WARN: FS: 0000000000000000(0000) GS:ffff88822d440000(0000) knlGS:0000000000000000
[ 334371.868815] WARN: CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 334371.868823] WARN: CR2: 0000000000000008 CR3: 00000002281aa000 CR4: 0000000000040660
[ 334371.868837] EMERG: Kernel panic - not syncing: Fatal exception in interrupt
-
RE: Xen Orchestra cannot connect to XCP-ng Host
I found the problem.
I am using OPNsense and forgot to disable TX checksum offloading. Interestingly, checksum offloading caused catastrophic network disruptions on a Realtek NIC but no noticeable performance hit on Intel NICs. This was an old host with a Realtek card; all my recent hosts have only Intel NICs, which is why I forgot about the whole offloading thing. Thanks for the tips.
Best wishes to the whole community!
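
For anyone hitting the same thing, here is a rough sketch of how TX checksum offload can be turned off on the host side for a firewall VM's virtual interfaces. It assumes the commonly documented `xe` approach for pfSense/OPNsense guests on XCP-ng; `<vm-name>` and `<vif-uuid>` are hypothetical placeholders, and this is not necessarily the exact change made in the post above (hardware checksum offload can also be disabled inside OPNsense itself).

```shell
# Hypothetical placeholders: replace <vm-name> and <vif-uuid> with your own values.
# Run on the XCP-ng host (dom0).

# List the VIFs attached to the firewall VM to find their UUIDs.
xe vif-list vm-name-label="<vm-name>"

# Disable TX checksum offload on one VIF (repeat for each VIF of the VM).
xe vif-param-set uuid="<vif-uuid>" other-config:ethtool-tx="off"

# Re-plug the VIF (or reboot the VM) so the setting takes effect.
xe vif-unplug uuid="<vif-uuid>"
xe vif-plug uuid="<vif-uuid>"
```

Since the setting is stored per VIF, a newly added interface would need the same treatment.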