XCP-ng

    Kernel panic on fresh install

      olivierlambert Vates 🪐 Co-Founder CEO

      I mean the physical NIC in your host. The "NIC type" doesn't matter: as soon as your OS has booted, it will switch to Xen PV NICs.

      Are you using Wireguard and/or any VPN that might use a different MTU than 1500?

        JEDIBC @olivierlambert

        @olivierlambert Here's the NIC brand/model: Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 Gigabit Ethernet PCIe

        Yes, we use WireGuard, but the MTU is not specified, so it should be 1500:
        1826b7b2-71e3-439c-9c57-ae98a5afa3e9-image.png

          olivierlambert Vates 🪐 Co-Founder CEO

          I think packets inside WG are using a smaller MTU because they must contain VPN keys and such.
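
To put numbers on that, here is a minimal sketch of the arithmetic. It assumes WireGuard's standard data-packet framing (outer IP header, 8-byte UDP header, 16-byte WireGuard header, 16-byte authentication tag). The function name `tunnel_mtu` is only illustrative and not part of any tool mentioned in this thread.

```python
# Per-packet overhead that WireGuard adds around the inner packet, in bytes.
UDP_HEADER = 8
WG_HEADER = 16   # 4 (type + reserved) + 4 (receiver index) + 8 (counter)
AUTH_TAG = 16    # Poly1305 authentication tag
OUTER_IP = {4: 20, 6: 40}  # outer IPv4 / IPv6 header, no options


def tunnel_mtu(link_mtu: int, outer_ip_version: int = 4) -> int:
    """Largest inner packet that still fits in one outer packet of link_mtu bytes."""
    overhead = OUTER_IP[outer_ip_version] + UDP_HEADER + WG_HEADER + AUTH_TAG
    return link_mtu - overhead


print(tunnel_mtu(1500, 4))  # 1440
print(tunnel_mtu(1500, 6))  # 1420
```

The IPv6 case gives 1420, which is the value commonly used as the WireGuard default because it is safe for both outer address families.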

          You might wonder how I guessed that you are using BSD VMs + VPNs: in fact, you are not the first to report the problem. We even had it at some point here (but we never managed to reproduce it).

          What we know:

          • it's an OVS bug
          • it's happening when you use BSD-like VMs
          • and also probably VPNs

          So we suspect a packet that OVS can't decode without exploding. However, it's hard to move forward without being able to reproduce it, so that we can find exactly which packet is doing this 😞

            JEDIBC @olivierlambert

            @olivierlambert Would my complete crash dump directory help you?

            Another thing that's annoying me is that the VMs on the host don't start even with the autostart option checked.

              olivierlambert Vates 🪐 Co-Founder CEO

              For autostart: just disable/re-enable it in XO, that will do the trick.

              The crash dump is sadly not enough to know which packet actually causes OVS to crash 😞

                JEDIBC @olivierlambert

                @olivierlambert Is it because some packets from WireGuard are too large? Should I lower the MTU of the WireGuard servers and clients so that it is below the MTU of the host interface?

                  olivierlambert Vates 🪐 Co-Founder CEO

                  Sadly, I don't know exactly what's causing it. Ideally, if you can find a way to trigger it on purpose, that would be wonderful.

                    sasha

                    Same here. XCP-ng has started crashing unexpectedly in the last 3 months for no obvious reason, with a similar crash log:

                    [ 214278.799922]  ALERT: BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
                    [ 214278.799944]   INFO: PGD 0 P4D 0
                    [ 214278.799956]   WARN: Oops: 0000 [#1] SMP NOPTI
                    [ 214278.799967]   WARN: CPU: 4 PID: 0 Comm: swapper/4 Tainted: G           O      4.19.0+1 #1
                    [ 214278.799976]   WARN: Hardware name: Quanta Cloud Technology Inc. QuantaPlex T22HF-1U/S5HF MB, BIOS 3A05.ON02 03/20/2019
                    [ 214278.799994]   WARN: RIP: e030:skb_copy_ubufs+0x19c/0x5f0
                    [ 214278.800001]   WARN: Code: 90 cc 00 00 00 48 03 90 d0 00 00 00 48 63 44 24 40 48 83 c0 03 48 c1 e0 04 48 01 d0 48 89 18 c7 40 08 00 00 00 00 44 89 78 0c <48> 8b 43 08 a8 01 0f 85 3f 04 00 00 48 8b 44 24 30 48 83 78 20 ff
                    [ 214278.800017]   WARN: RSP: e02b:ffff888235303668 EFLAGS: 00010282
                    [ 214278.800025]   WARN: RAX: ffff8880a540dce0 RBX: 0000000000000000 RCX: 00000000000000c0
                    [ 214278.800033]   WARN: RDX: ffff8880a540dcc0 RSI: ffff8880a540dcc0 RDI: ffffea0008ff6380
                    [ 214278.800042]   WARN: RBP: 0000000000000000 R08: ffff8880a540dc00 R09: 0000000000000001
                    [ 214278.800050]   WARN: R10: 0000000000000259 R11: ffff88812597f540 R12: ffff888042ae1d00
                    [ 214278.800057]   WARN: R13: 0000000000000000 R14: ffff888122bec8c0 R15: 0000000000000000
                    [ 214278.800079]   WARN: FS:  0000000000000000(0000) GS:ffff888235300000(0000) knlGS:0000000000000000
                    [ 214278.800088]   WARN: CS:  e033 DS: 002b ES: 002b CR0: 0000000080050033
                    [ 214278.800095]   WARN: CR2: 0000000000000008 CR3: 00000001f049e000 CR4: 0000000000040660
                    [ 214278.800105]   WARN: Call Trace:
                    [ 214278.800113]   WARN:  <IRQ>
                    [ 214278.800129]   WARN:  tun_net_xmit+0x3de/0x460 [tun]
                    [ 214278.800140]   WARN:  dev_hard_start_xmit+0xa4/0x210
                    [ 214278.800151]   WARN:  sch_direct_xmit+0x10d/0x350
                    [ 214278.800159]   WARN:  __qdisc_run+0x167/0x4e0
                    [ 214278.800167]   WARN:  ? pfifo_fast_enqueue+0x92/0xf0
                    [ 214278.800176]   WARN:  __dev_queue_xmit+0x511/0x900
                    [ 214278.800189]   WARN:  do_execute_actions+0x157f/0x1750 [openvswitch]
                    [ 214278.800203]   WARN:  ? __wake_up_common_lock+0x87/0xc0
                    [ 214278.800214]   WARN:  ? __raw_callee_save_xen_vcpu_stolen+0x11/0x20
                    [ 214278.800226]   WARN:  ? __radix_tree_lookup+0x80/0xf0
                    [ 214278.800237]   WARN:  ovs_execute_actions+0x47/0x120 [openvswitch]
                    [ 214278.800249]   WARN:  ovs_dp_process_packet+0x7d/0x110 [openvswitch]
                    [ 214278.800261]   WARN:  ? key_extract+0xa53/0xd60 [openvswitch]
                    [ 214278.800274]   WARN:  ovs_vport_receive+0x6e/0xd0 [openvswitch]
                    [ 214278.800285]   WARN:  ? hrtimer_init+0x190/0x190
                    [ 214278.800294]   WARN:  ? xen_vcpuop_set_next_event+0x69/0xa0
                    [ 214278.800302]   WARN:  ? __alloc_skb+0x76/0x270
                    [ 214278.800312]   WARN:  ? arch_local_irq_restore+0x5/0x10
                    [ 214278.800320]   WARN:  ? __slab_alloc.constprop.81+0x42/0x4e
                    [ 214278.800327]   WARN:  ? __alloc_skb+0x76/0x270
                    [ 214278.800334]   WARN:  ? __kmalloc_track_caller+0x195/0x200
                    [ 214278.800343]   WARN:  ? __kmalloc_reserve.isra.48+0x29/0x70
                    [ 214278.800357]   WARN:  netdev_frame_hook+0x105/0x180 [openvswitch]
                    [ 214278.800367]   WARN:  __netif_receive_skb_core+0x211/0xb30
                    [ 214278.800377]   WARN:  __netif_receive_skb_one_core+0x36/0x70
                    [ 214278.800385]   WARN:  netif_receive_skb_internal+0x34/0xe0
                    [ 214278.800396]   WARN:  xenvif_tx_action+0x4b8/0x900
                    [ 214278.800406]   WARN:  xenvif_poll+0x27/0x70
                    [ 214278.800416]   WARN:  net_rx_action+0x2a5/0x3e0
                    [ 214278.800427]   WARN:  __do_softirq+0xd1/0x28c
                    [ 214278.800438]   WARN:  irq_exit+0xa8/0xc0
                    [ 214278.800448]   WARN:  xen_evtchn_do_upcall+0x2c/0x50
                    [ 214278.800459]   WARN:  xen_do_hypervisor_callback+0x29/0x40
                    [ 214278.800468]   WARN:  </IRQ>
                    [ 214278.800477]   WARN: RIP: e030:xen_hypercall_sched_op+0xa/0x20
                    [ 214278.800485]   WARN: Code: 51 41 53 b8 1c 00 00 00 0f 05 41 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 51 41 53 b8 1d 00 00 00 0f 05 <41> 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
                    [ 214278.800502]   WARN: RSP: e02b:ffffc900400b3eb0 EFLAGS: 00000246
                    [ 214278.800510]   WARN: RAX: 0000000000000000 RBX: ffff88822c239d00 RCX: ffffffff810013aa
                    [ 214278.800519]   WARN: RDX: ffffffff8203d250 RSI: 0000000000000000 RDI: 0000000000000001
                    [ 214278.800528]   WARN: RBP: 0000000000000004 R08: 000000000001ca00 R09: 0000000000000000
                    [ 214278.800537]   WARN: R10: 0000000000007ff0 R11: 0000000000000246 R12: 0000000000000000
                    [ 214278.800545]   WARN: R13: 0000000000000000 R14: ffff88822c239d00 R15: ffff88822c239d00
                    [ 214278.800557]   WARN:  ? xen_hypercall_sched_op+0xa/0x20
                    [ 214278.800567]   WARN:  ? xen_safe_halt+0xc/0x20
                    [ 214278.800576]   WARN:  ? default_idle+0x1a/0x140
                    [ 214278.800585]   WARN:  ? do_idle+0x1ea/0x260
                    [ 214278.800594]   WARN:  ? cpu_startup_entry+0x6f/0x80
                    [ 214278.800602]   WARN: Modules linked in: tun rpcsec_gss_krb5 nfsv4 nfs fscache bnx2fc(O) cnic(O) uio fcoe libfcoe libfc scsi_transport_fc 8021q garp mrp stp llc openvswitch nsh nf_nat_ipv6 nf_nat_ipv4 nf_conncount nf_nat dm_multipath ipt_REJECT nf_reject_ipv4 xt_tcpudp xt_multiport xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c iptable_filter nls_iso8859_1 nls_cp437 vfat fat raid0 crct10dif_pclmul crc32_pclmul ghash_clmulni_intel md_mod pcbc aesni_intel dm_mod aes_x86_64 crypto_simd cryptd glue_helper i2c_piix4 k10temp ipmi_si ipmi_devintf ipmi_msghandler nfsd auth_rpcgss oid_registry nfs_acl lockd grace sunrpc ip_tables x_tables ahci libahci nvme xhci_pci libata nvme_core xhci_hcd ixgbe(O) scsi_dh_rdac scsi_dh_hp_sw scsi_dh_emc scsi_dh_alua scsi_mod efivarfs ipv6 crc_ccitt
                    [ 214278.800746]   WARN: CR2: 0000000000000008
                    [ 214278.800768]   WARN: ---[ end trace 0f1c8a4f455bc1b3 ]---
                    [ 214280.803918]   WARN: RIP: e030:skb_copy_ubufs+0x19c/0x5f0
                    [ 214280.803947]   WARN: Code: 90 cc 00 00 00 48 03 90 d0 00 00 00 48 63 44 24 40 48 83 c0 03 48 c1 e0 04 48 01 d0 48 89 18 c7 40 08 00 00 00 00 44 89 78 0c <48> 8b 43 08 a8 01 0f 85 3f 04 00 00 48 8b 44 24 30 48 83 78 20 ff
                    [ 214280.803971]   WARN: RSP: e02b:ffff888235303668 EFLAGS: 00010282
                    [ 214280.803977]   WARN: RAX: ffff8880a540dce0 RBX: 0000000000000000 RCX: 00000000000000c0
                    [ 214280.803982]   WARN: RDX: ffff8880a540dcc0 RSI: ffff8880a540dcc0 RDI: ffffea0008ff6380
                    [ 214280.803987]   WARN: RBP: 0000000000000000 R08: ffff8880a540dc00 R09: 0000000000000001
                    [ 214280.803992]   WARN: R10: 0000000000000259 R11: ffff88812597f540 R12: ffff888042ae1d00
                    [ 214280.804003]   WARN: R13: 0000000000000000 R14: ffff888122bec8c0 R15: 0000000000000000
                    [ 214280.804018]   WARN: FS:  0000000000000000(0000) GS:ffff888235300000(0000) knlGS:0000000000000000
                    [ 214280.804027]   WARN: CS:  e033 DS: 002b ES: 002b CR0: 0000000080050033
                    [ 214280.804032]   WARN: CR2: 0000000000000008 CR3: 00000001f049e000 CR4: 0000000000040660
                    [ 214280.804040]  EMERG: Kernel panic - not syncing: Fatal exception in interrupt
                    

                    In my setup I run OPNsense (yes, FreeBSD-based) on top as firewall/VPN (WG, OpenVPN). I'll check the MTU and reply here.
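
When comparing traces like the one above, it can help to see at a glance which kernel modules appear in the call trace (here `tun` and `openvswitch`). The following is a rough sketch, not an official tool: the function name `modules_in_trace` and the regular expression are assumptions based only on the log format shown in this post.

```python
import re
from collections import Counter

# Matches call-trace frames such as
#   "WARN:  ? key_extract+0xa53/0xd60 [openvswitch]"
# Register and "RIP:" lines do not match because of the "+0x.../0x..." part.
FRAME = re.compile(
    r"WARN:\s+(?:\?\s+)?(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def modules_in_trace(log_text: str) -> Counter:
    """Count call-trace frames per module; frames without a module are 'kernel'."""
    counts = Counter()
    for line in log_text.splitlines():
        match = FRAME.search(line)
        if match:
            counts[match.group("module") or "kernel"] += 1
    return counts


# A few frames copied from the trace above.
sample = """\
[ 214278.800129]   WARN:  tun_net_xmit+0x3de/0x460 [tun]
[ 214278.800140]   WARN:  dev_hard_start_xmit+0xa4/0x210
[ 214278.800189]   WARN:  do_execute_actions+0x157f/0x1750 [openvswitch]
[ 214278.800261]   WARN:  ? key_extract+0xa53/0xd60 [openvswitch]
"""
print(dict(modules_in_trace(sample)))
```

Running the same function over a later crash log makes it easy to check whether `openvswitch` frames are still present.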

                      sasha @sasha

                      lspci on host:

                      01:00.0 Ethernet controller: Intel Corporation Ethernet Controller 10G X550T (rev 01)
                      01:00.1 Ethernet controller: Intel Corporation Ethernet Controller 10G X550T (rev 01)
                      

                      MTU on OPNsense:

                      LAN interface (lan, xn1)	Status	up 
                      MTU	1500
                      
                      RoadWarriorWG0 interface (opt1, wg0) Status	up 
                      MTU	16304
                      
                      WAN interface (wan, xn0) Status	up 
                      MTU	1500
                      
                      site2siteWG1 interface (opt2, wg1)Status	up 
                      MTU	1420
                      
                      xcp interface (opt3, xn2) Status	up 
                      MTU	1500
                      
                      Unassigned interface (lo0) 
                      MTU	16384
                      
                      Unassigned interface (enc0) Status	down 
                      MTU	1536
                      
                      Unassigned interface (pflog0) Status	down 
                      MTU	33160
                      
                      Unassigned interface (ovpns1) Status	up 
                      MTU	1500
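
A quick way to spot interfaces whose MTU exceeds the physical link is sketched below. The values are copied from the listing above (interfaces that are down and the loopback are left out). The 1500-byte limit is an assumption that the host NICs use the Ethernet default, and the helper name `oversized` is purely illustrative. Whether such an interface is related to the crash is not established in this thread.

```python
# Interface MTUs copied from the OPNsense listing above (only interfaces that are up).
mtus = {
    "xn1 (LAN)": 1500,
    "wg0 (RoadWarriorWG0)": 16304,
    "xn0 (WAN)": 1500,
    "wg1 (site2siteWG1)": 1420,
    "xn2 (xcp)": 1500,
    "ovpns1": 1500,
}

PHYSICAL_MTU = 1500  # assumption: host NICs use the Ethernet default


def oversized(mtus: dict[str, int], limit: int = PHYSICAL_MTU) -> list[str]:
    """Return the interfaces whose MTU is larger than the given limit."""
    return [name for name, mtu in mtus.items() if mtu > limit]


print(oversized(mtus))  # ['wg0 (RoadWarriorWG0)']
```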
                      
                        olivierlambert Vates 🪐 Co-Founder CEO

                        We just released new patches that might solve this. Please update and keep us posted next time you have the problem 🙂

                          sasha @olivierlambert

                          @olivierlambert
                          Thank you for the quick fix!
                          I'll leave it without the patch for a couple of days to see how it behaves with a reduced MTU on the WireGuard interface before applying the patch, just in case the MTU does the trick.

                            JEDIBC @olivierlambert

                            @olivierlambert Thanks! I'll try it as soon as I'm back from vacation in early September.

                              sasha @olivierlambert

                              @olivierlambert The server just rebooted with the same error. Updates are installed and waiting for the next reboot 👿

                              The log file is almost identical. The main difference from the previous crash two days ago:

                              old

                              WARN: CPU: 4 PID: 0 Comm: swapper/4 Tainted: G           O      4.19.0+1 #1

                              new

                              WARN: CPU: 2 PID: 0 Comm: swapper/2 Tainted: G           O      4.19.0+1 #1
                              
                              

                              3a4db225-1ae4-4aa8-9d79-2a576e5efcca-image.png
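
A comparison like this can also be done in text form, so that the differences are easy to paste into a reply. Below is a minimal sketch using Python's standard `difflib`, assuming the log format shown earlier in this thread. The helper names and the timestamps in the second sample are made up for illustration.

```python
import difflib
import re

# Leading kernel timestamp such as "[ 214278.799967]   ".
TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")


def strip_timestamps(log_text: str) -> list[str]:
    """Drop the timestamp prefix so identical messages compare as equal."""
    return [TIMESTAMP.sub("", line) for line in log_text.splitlines()]


def changed_lines(old_log: str, new_log: str) -> list[str]:
    """Return only the lines that differ, prefixed with '- ' (old) or '+ ' (new)."""
    diff = difflib.ndiff(strip_timestamps(old_log), strip_timestamps(new_log))
    return [line for line in diff if line.startswith(("- ", "+ "))]


old = (
    "[ 214278.799967]   WARN: CPU: 4 PID: 0 Comm: swapper/4 Tainted: G O 4.19.0+1 #1\n"
    "[ 214278.800113]   WARN:  <IRQ>"
)
new = (  # timestamps here are illustrative only
    "[ 100.000001]   WARN: CPU: 2 PID: 0 Comm: swapper/2 Tainted: G O 4.19.0+1 #1\n"
    "[ 100.000002]   WARN:  <IRQ>"
)
for line in changed_lines(old, new):
    print(line)
```

With the two samples above, only the `CPU:` line is reported, which matches the difference described in this post.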

                                olivierlambert Vates 🪐 Co-Founder CEO

                                So you had the same issue after all the last updates + a "manual" reboot?

                                  sasha @olivierlambert

                                  @olivierlambert said in Kernel panic on fresh install:

                                  So you had the same issue after all the last updates + a "manual" reboot?

                                  No reboot after the updates. Just to mention that, in my case, reducing the MTU on the heavily utilised WireGuard interface didn't help.

                                  Also, these reboots are completely unpredictable: sometimes during a busy day, but more often during night hours when only backups can be running.

                                    olivierlambert Vates 🪐 Co-Founder CEO

                                    Okay so now you have the updates really installed, we'll see if it happens 🙂

                                      sasha @olivierlambert

                                      @olivierlambert said in Kernel panic on fresh install:

                                      Okay so now you have the updates really installed, we'll see if it happens 🙂

                                      Just got a crash reboot again while trying to restart a VM from XCP-center, from another VM in the pool. This reboot should apply all patches from the last updates.

                                        sasha @olivierlambert

                                        @olivierlambert

                                        Just had two consecutive crashes 15 minutes apart.
                                        This is a comparison between the old crash before the updates and the one after the latest updates.
                                        f19d588d-7201-46f2-badb-b51945b13f4e-image.png

                                          sasha @sasha

                                          New crash, same message...

                                            olivierlambert Vates 🪐 Co-Founder CEO

                                            It's weird, OVS is not involved. So it might be something else 🤔

                                            Any chance you know how to trigger it artificially? That would be really helpful to pinpoint the issue.
