Kernel panic on fresh install

JEDIBC

@olivierlambert Thanks! Will try it as soon as I'm back from vacation in early September.

sasha

@olivierlambert The server just rebooted with the same error. Updates are installed and waiting for the next reboot.

The log file is almost identical.
Main differences compared with the previous crash 2 days ago:
old

WARN: CPU: 4 PID: 0 Comm: swapper/4 Tainted: G           O      4.19.0+1 #1

new

WARN: CPU: 2 PID: 0 Comm: swapper/2 Tainted: G           O      4.19.0+1 #1

olivierlambert

So you had the same issue after all the last updates + a "manual" reboot?

sasha

@olivierlambert said in Kernel panic on fresh install:

So you had the same issue after all the last updates + a "manual" reboot?

No reboot after the updates. Just to mention: in my case, reducing the MTU on the heavily utilised WireGuard interface didn't help.

Also, these reboots are completely unpredictable: sometimes during a busy day, but more often during the night hours, when only backups run.

olivierlambert

Okay, so now that you really have the updates installed, we'll see if it happens again.

sasha

@olivierlambert said in Kernel panic on fresh install:

Okay, so now that you really have the updates installed, we'll see if it happens again.

Just got a crash reboot again, while trying to restart a VM from XCP-ng Center running in another VM in the pool. This reboot should apply all patches from the last updates.

sasha

@olivierlambert

Just had two consecutive crashes, 15 minutes apart.
This is a comparison between the old crash before the updates and the crash after the latest updates.

sasha

New crash, same message...

olivierlambert

It's weird: OVS is not involved, so it might be something else.

Any chance you know how to trigger it artificially? That would be really helpful to pinpoint the issue.

sasha

@olivierlambert
I would love to! The only thing I can say is that when I was using the Windows XCP console from a VM inside the pool, starting a VM once caused the whole system to crash with this error, and on another day, changing a vApp config also crashed the server. That is why I am trying to avoid using the XCP console during business hours. I'll try to reproduce it once again and reply.

tuxen

@sasha It's worth noting that the BIOS (from 2019) is relatively old. Updating the BIOS to a more recent version is recommended.

sasha

@tuxen said in Kernel panic on fresh install:

@sasha It's worth noting that the BIOS (from 2019) is relatively old. Updating the BIOS to a more recent version is recommended.

Thank you for pointing that out! I'll try to reach the support team about this.

sasha

Got a similar crash today on new hardware:

WARN: Hardware name: Dell Inc. PowerEdge R640/0W23H8, BIOS 2.18.1 02/22/2023
...
WARN: CR2: 0000000000000008 CR3: 000000000384a000 CR4: 0000000000040660
[ 484188.711152]   WARN: Call Trace:
[ 484188.711163]   WARN:  <IRQ>
[ 484188.711178]   WARN:  ? _raw_spin_unlock_irqrestore+0x14/0x20
[ 484188.711203]   WARN:  tun_net_xmit+0x3de/0x460 [tun]
[ 484188.711223]   WARN:  dev_hard_start_xmit+0xa4/0x210
[ 484188.711242]   WARN:  sch_direct_xmit+0x10d/0x350
[ 484188.711256]   WARN:  __qdisc_run+0x167/0x4e0
[ 484188.711269]   WARN:  ? pfifo_fast_enqueue+0x92/0xf0
[ 484188.711284]   WARN:  __dev_queue_xmit+0x511/0x900
[ 484188.711300]   WARN:  ? skb_copy_ubufs+0x5b0/0x5f0
[ 484188.711334]   WARN:  do_execute_actions+0x157f/0x1750 [openvswitch]
[ 484188.711369]   WARN:  ? __radix_tree_lookup+0x80/0xf0
[ 484188.711394]   WARN:  ? notify_remote_via_irq+0x4a/0x70
[ 484188.711417]   WARN:  ? check_preempt_curr+0x6b/0x90
[ 484188.711437]   WARN:  ? ttwu_do_wakeup+0x19/0x140
[ 484188.711456]   WARN:  ? _raw_spin_unlock_irqrestore+0x14/0x20
[ 484188.711478]   WARN:  ? try_to_wake_up+0x54/0x450
[ 484188.711503]   WARN:  ? __raw_callee_save_xen_vcpu_stolen+0x11/0x20
[ 484188.711530]   WARN:  ? trigger_load_balance+0x54/0x170
[ 484188.711549]   WARN:  ovs_execute_actions+0x47/0x120 [openvswitch]
[ 484188.711578]   WARN:  ovs_dp_process_packet+0x7d/0x110 [openvswitch]
[ 484188.711609]   WARN:  ? key_extract+0xa53/0xd60 [openvswitch]
[ 484188.711638]   WARN:  ovs_vport_receive+0x6e/0xd0 [openvswitch]
[ 484188.711654]   WARN:  ? __alloc_skb+0x4e/0x270
[ 484188.711667]   WARN:  ? __alloc_skb+0x76/0x270
[ 484188.711684]   WARN:  ? arch_local_irq_restore+0x5/0x10
[ 484188.711700]   WARN:  ? __slab_alloc.constprop.81+0x42/0x4e
[ 484188.711715]   WARN:  ? __alloc_skb+0x76/0x270
[ 484188.711727]   WARN:  ? __kmalloc_track_caller+0x58/0x200
[ 484188.711747]   WARN:  ? __kmalloc_reserve.isra.48+0x29/0x70
[ 484188.711768]   WARN:  netdev_frame_hook+0x105/0x180 [openvswitch]
[ 484188.711785]   WARN:  __netif_receive_skb_core+0x211/0xb30
[ 484188.711802]   WARN:  __netif_receive_skb_one_core+0x36/0x70
[ 484188.711818]   WARN:  netif_receive_skb_internal+0x34/0xe0
[ 484188.711838]   WARN:  xenvif_tx_action+0x55c/0x990
[ 484188.711853]   WARN:  xenvif_poll+0x27/0x70
[ 484188.711867]   WARN:  net_rx_action+0x2a5/0x3e0
[ 484188.711882]   WARN:  __do_softirq+0xd1/0x28c
[ 484188.711899]   WARN:  irq_exit+0xa8/0xc0
[ 484188.711913]   WARN:  xen_evtchn_do_upcall+0x2c/0x50
[ 484188.711930]   WARN:  xen_do_hypervisor_callback+0x29/0x40
[ 484188.711949]   WARN:  </IRQ>
[ 484188.711968]   WARN: RIP: e030:xen_hypercall_sched_op+0xa/0x20
[ 484188.711999]   WARN: Code: 51 41 53 b8 1c 00 00 00 0f 05 41 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 51 41 53 b8 1d 00 00 00 0f 05 <41> 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
[ 484188.712051]   WARN: RSP: e02b:ffffc900401e3eb0 EFLAGS: 00000246
[ 484188.712074]   WARN: RAX: 0000000000000000 RBX: ffff8882a5e43a00 RCX: ffffffff810013aa
[ 484188.712101]   WARN: RDX: ffffffff8203d250 RSI: 0000000000000000 RDI: 0000000000000001
[ 484188.712127]   WARN: RBP: 000000000000000a R08: 00000000e94460e8 R09: 0000000000000000
[ 484188.712155]   WARN: R10: 0000000000007ff0 R11: 0000000000000246 R12: 0000000000000000
[ 484188.712176]   WARN: R13: 0000000000000000 R14: ffff8882a5e43a00 R15: ffff8882a5e43a00
[ 484188.712200]   WARN:  ? xen_hypercall_sched_op+0xa/0x20
[ 484188.712218]   WARN:  ? xen_safe_halt+0xc/0x20
[ 484188.712232]   WARN:  ? default_idle+0x1a/0x140
[ 484188.712245]   WARN:  ? do_idle+0x1ea/0x260
[ 484188.712259]   WARN:  ? cpu_startup_entry+0x6f/0x80
[ 484188.712271]   WARN: Modules linked in: tun hid_generic usbhid hid bnx2fc(O) cnic(O) uio fcoe libfcoe libfc scsi_transport_fc openvswitch nsh nf_nat_ipv6 nf_nat_ipv4 nf_conncount nf_nat 8021q garp mrp stp llc dm_multipath ipt_REJECT nf_reject_ipv4 xt_tcpudp xt_multiport xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c iptable_filter skx_edac intel_powerclamp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel sunrpc pcbc aesni_intel aes_x86_64 crypto_simd cryptd glue_helper nls_iso8859_1 nls_cp437 vfat fat dcdbas i2c_i801 lpc_ich ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter ip_tables x_tables raid1 raid0 md_mod nvme ahci nvme_core libahci xhci_pci ixgbe(O) igb(O) libata xhci_hcd dm_mod scsi_dh_rdac scsi_dh_hp_sw scsi_dh_emc scsi_dh_alua scsi_mod efivarfs ipv6 crc_ccitt
[ 484188.712495]   WARN: CR2: 0000000000000008
[ 484188.712517]   WARN: ---[ end trace 0bd4f18732c111b7 ]---
[ 484188.752460]   WARN: RIP: e030:skb_copy_ubufs+0x19c/0x5f0
[ 484188.752485]   WARN: Code: 90 cc 00 00 00 48 03 90 d0 00 00 00 48 63 44 24 40 48 83 c0 03 48 c1 e0 04 48 01 d0 48 89 18 c7 40 08 00 00 00 00 44 89 78 0c <48> 8b 43 08 a8 01 0f 85 3f 04 00 00 48 8b 44 24 30 48 83 78 20 ff
[ 484188.752513]   WARN: RSP: e02b:ffff8882a7083668 EFLAGS: 00010282
[ 484188.752525]   WARN: RAX: ffff88822933c2e0 RBX: 0000000000000000 RCX: 00000000000000c0
[ 484188.752537]   WARN: RDX: ffff88822933c2c0 RSI: ffff88822933c2c0 RDI: ffffea000abec0c0
[ 484188.752548]   WARN: RBP: 0000000000000000 R08: ffff88822933c200 R09: 0000000000000001
[ 484188.752559]   WARN: R10: 0000000000000320 R11: ffff88829d3ded40 R12: ffff8881d9b6ef00
[ 484188.752571]   WARN: R13: 0000000000000000 R14: ffff8882395988c0 R15: 0000000000000000
[ 484188.752594]   WARN: FS:  0000000000000000(0000) GS:ffff8882a7080000(0000) knlGS:0000000000000000
[ 484188.752606]   WARN: CS:  e033 DS: 002b ES: 002b CR0: 0000000080050033
[ 484188.752614]   WARN: CR2: 0000000000000008 CR3: 000000000384a000 CR4: 0000000000040660
[ 484188.752631]  EMERG: Kernel panic - not syncing: Fatal exception in interrupt
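For reference, the second RIP line (in skb_copy_ubufs) can be cross-checked by hand: the byte the kernel marks with <48> starts the faulting instruction, and decoding it shows which address was read. The sketch below is a minimal decoder that, by assumption, handles only the single form seen here (REX.W 8B with a [reg+disp8] operand); it is not a general x86 disassembler, and the register values are copied from the log above.

```python
# Values copied from the crash log above (second RIP, in skb_copy_ubufs).
CODE_LINE = ("90 cc 00 00 00 48 03 90 d0 00 00 00 48 63 44 24 40 48 83 c0 03 "
             "48 c1 e0 04 48 01 d0 48 89 18 c7 40 08 00 00 00 00 44 89 78 0c "
             "<48> 8b 43 08 a8 01 0f 85 3f 04 00 00 48 8b 44 24 30 48 83 78 20 ff")
REGS = {"rbx": 0x0000000000000000}  # RBX from the register dump
CR2 = 0x0000000000000008            # faulting address reported by the CPU

# ModRM register numbers 0..7 (valid when REX.R and REX.B are not set).
REG_NAMES = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def faulting_bytes(code_line):
    """Return the instruction bytes starting at the one marked with <..>."""
    tokens = code_line.split()
    start = next(i for i, t in enumerate(tokens) if t.startswith("<"))
    return [int(t.strip("<>"), 16) for t in tokens[start:]]


def decode_mov_load_disp8(insn):
    """Decode only 'mov r64, [r64+disp8]' (REX.W 8B /r, ModRM mod=01)."""
    rex, opcode, modrm, disp8 = insn[:4]
    if rex != 0x48 or opcode != 0x8B:
        raise ValueError("not a REX.W mov r64, r/m64")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 0b01 or rm == 0b100:  # rm=100 would mean a SIB byte follows
        raise ValueError("not a simple [reg+disp8] operand")
    disp = disp8 - 0x100 if disp8 >= 0x80 else disp8  # sign-extend disp8
    return REG_NAMES[reg], REG_NAMES[rm], disp


dst, base, disp = decode_mov_load_disp8(faulting_bytes(CODE_LINE))
address = (REGS[base] + disp) & 0xFFFFFFFFFFFFFFFF
print(f"mov {dst}, [{base}+{disp:#x}] reads {address:#x}; CR2 = {CR2:#x}")
```

With RBX = 0, the load from [rbx+0x8] touches address 0x8, which matches CR2, i.e. the pattern of a NULL pointer being dereferenced at offset 8.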

Danp

@sasha The BIOS is still outdated on your new hardware.

sasha

@Danp said in Kernel panic on fresh install:

@sasha The BIOS is still outdated on your new hardware.

Agreed, but I can't do anything about it.

sasha

A little update on this. I deactivated the OpenVPN server on OpnSense in December and since then there have been no reboots or kernel panics.

olivierlambert

Hi!

It's likely due to a tricky problem that was finally identified. A security patch should come soon. For some reason, FreeBSD is the most likely to trigger it, especially when running some kind of VPN-related software.

olivierlambert

The security issue (XSA) is now publicly accessible: https://xenbits.xenproject.org/xsa/advisory-448.html

Transmit requests in Xen's virtual network protocol can consist of
multiple parts. While not really useful, except for the initial part
any of them may be of zero length, i.e. carry no data at all. Besides a
certain initial portion of the to be transferred data, these parts are
directly translated into what Linux calls SKB fragments. Such converted
request parts can, when for a particular SKB they are all of length
zero, lead to a de-reference of NULL in core networking code.
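To make the advisory's condition concrete, here is a minimal Python model. It is a simplification under stated assumptions: TxPart and the helper functions are hypothetical names, not Xen or Linux code, and the first part stands in for the "initial portion" (the real backend works on a byte count). It shows how a naive translation yields an SKB whose fragments are all zero-length, and one defensive variant that never emits an empty fragment; this is not necessarily how the actual patch fixes it.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TxPart:
    """Hypothetical stand-in for one part of a transmit request."""
    size: int  # bytes carried; per the advisory only the first must be non-zero


def check_request(parts: List[TxPart]) -> None:
    """Enforce the only rule the advisory states: the initial part carries data."""
    if not parts or parts[0].size == 0:
        raise ValueError("initial part must carry data")


def frags_from_request(parts: List[TxPart]) -> List[int]:
    """Naive translation: every part after the first becomes a fragment,
    zero-length parts included (the pattern XSA-448 describes)."""
    check_request(parts)
    return [p.size for p in parts[1:]]


def all_frags_empty(frags: List[int]) -> bool:
    """The dangerous case: fragments exist, but every one has length 0."""
    return len(frags) > 0 and all(size == 0 for size in frags)


def frags_skipping_empty(parts: List[TxPart]) -> List[int]:
    """Defensive variant (a hypothetical guard): drop zero-length parts."""
    check_request(parts)
    return [p.size for p in parts[1:] if p.size > 0]


if __name__ == "__main__":
    ok = [TxPart(64), TxPart(0), TxPart(1400)]
    bad = [TxPart(64), TxPart(0), TxPart(0)]
    for name, req in (("ok", ok), ("bad", bad)):
        frags = frags_from_request(req)
        print(name, frags, all_frags_empty(frags), frags_skipping_empty(req))
```

The "bad" request is legal by the protocol's own rules, yet the naive translation produces only zero-length fragments, which is exactly the case the advisory says can reach a NULL de-reference.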

If you want, we can provide a test update first so you can try it and check whether you are now protected, even with your OpnSense running OpenVPN again.

sasha

@olivierlambert These events finally pushed me to stop supporting OpenVPN, and I'm not going back: WireGuard works much better for my setup. Thank you!

olivierlambert

No problem! FYI, the update is now available: https://xcp-ng.org/forum/post/70029