XCP-ng

    XcpNG - Xen kernel crash (FATAL TRAP: vector = 2 (nmi))

    • petr.bena

      Hello,

      One of my hosts started crashing recently. It's a Supermicro X10SLL-F with a Xeon E3-1220 v3 that ran XenServer 6.5 for the past five years without any crashes.

      During my upgrade from XenServer to XCP-ng, I took it out, added some new disks and reinstalled it with XCP-ng 8.0. It ran fine for many days but recently started crashing. There were some issues with kdump that I managed to fix yesterday, and now I finally have some crash dumps. In xen.log I can see this exception:

      (XEN) [   57.621481] NMI - PCI system error (SERR)
      (XEN) [   57.621483] ----[ Xen-4.11.1-7.5.1.xcpng8.0  x86_64  debug=n   Not tainted ]----
      (XEN) [   57.621484] CPU:    0
      (XEN) [   57.621485] RIP:    e008:[<ffff82d08027cce2>] do_IRQ+0x2/0x700
      (XEN) [   57.621498] RFLAGS: 0000000000000046   CONTEXT: hypervisor
      (XEN) [   57.621500] rax: 0000000000000000   rbx: 0000000000000000   rcx: 0000000000000000
      (XEN) [   57.621501] rdx: 0000000000000000   rsi: 0000000000000000   rdi: ffff8300dd067d98
      (XEN) [   57.621501] rbp: 0000000000000000   rsp: ffff8300dd067d88   r8:  0000000000000000
      (XEN) [   57.621502] r9:  0000000000000000   r10: 0000000000000000   r11: 0000000000000000
      (XEN) [   57.621503] r12: 0000000000000000   r13: 0000000000000000   r14: ffff8300dd067fff
      (XEN) [   57.621504] r15: 0000000000000000   cr0: 0000000080050033   cr4: 0000000000162660
      (XEN) [   57.621504] cr3: 000000057c98e000   cr2: ffff88807f3abcf8
      (XEN) [   57.621505] fsb: 0000000000000000   gsb: ffff8880a3800000   gss: 0000000000000000
      (XEN) [   57.621516] ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
      (XEN) [   57.621518] Xen code around <ffff82d08027cce2> (do_IRQ+0x2/0x700):
      (XEN) [   57.621518]  84 00 00 00 00 00 41 57 <48> 8d 05 f7 e2 33 00 49 89 f8 41 56 4c 8d 35 d3
      (XEN) [   57.621521] Xen stack trace from rsp=ffff8300dd067d88:
      (XEN) [   57.621521]    0000000000000000 ffff82d08035b8a6 ffff83081cb86d38 ffff82d080592b20
      (XEN) [   57.621523]    ffff82d0805baa50 ffff82d080573d00 ffff83081cb86cc0 0000000d6a7ffae3
      (XEN) [   57.621524]    ffff83081cb71568 0000000000000006 0000000dabaf623d 0000000d61585e7a
      (XEN) [   57.621525]    0000000000000002 0000000d6a7f1baa ffff8300dd067fff 0000000d6a7ffae3
      (XEN) [   57.621526]    ffff83081cb86cf0 0000004900000000 ffff82d0802ccad3 000000000000e008
      (XEN) [   57.621528]    0000000000000202 ffff8300dd067e40 000000000000e010 0000000d6a7f7ffb
      (XEN) [   57.621529]    0000000100000002 0000006400000ac9 0000000000000000 0000000000000000
      (XEN) [   57.621530]    ffff82d08035b43e ffff8300dd6d0000 ffffffffffffffff ffff82d08035b400
      (XEN) [   57.621531]    ffff82d080573d00 ffff82d0805baa50 0000000000000000 0000000000000000
      (XEN) [   57.621532]    ffff82d080592b20 ffff8300dd067fff ffff82d08026e715 ffff8300dd6d0000
      (XEN) [   57.621534]    ffff8300dd6d0000 ffff8300dd940000 ffff83081cb9d000 00000000ffffffff
      (XEN) [   57.621535]    ffff83081cb91000 ffff82d080592b20 ffffffff82011740 ffffffff82011740
      (XEN) [   57.621536]    0000000000000000 0000000000000000 0000000000000000 ffffffff82011740
      (XEN) [   57.621537]    0000000000000246 0000000000007ff0 0000000000000000 00000000a5293488
      (XEN) [   57.621538]    0000000000000000 ffffffff810013aa ffffffff8203c190 0000000000000000
      (XEN) [   57.621539]    0000000000000001 0000010000000000 ffffffff810013aa 000000000000e033
      (XEN) [   57.621540]    0000000000000246 ffffffff82003e58 000000000000e02b dcfb7c2bdd067fe0
      (XEN) [   57.621541]    dcfb7cae00097f75 dcfb7da200000000 dcfb7951dd067fe0 0000e01000000000
      (XEN) [   57.621543]    ffff8300dd6d0000 0000000000000000 0000000000162660 0000000000000000
      (XEN) [   57.621544]    800000081cb9b002 0000060000000000 dcfb883e00097f00
      (XEN) [   57.621545] Xen call trace:
      (XEN) [   57.621546]    [<ffff82d08027cce2>] do_IRQ+0x2/0x700
      (XEN) [   57.621548]    [<ffff82d08035b8a6>] common_interrupt+0x106/0x120
      (XEN) [   57.621551]    [<ffff82d0802ccad3>] mwait-idle.c#mwait_idle+0x243/0x3d0
      (XEN) [   57.621552]    [<ffff82d08035b43e>] lstar_enter+0xae/0x120
      (XEN) [   57.621553]    [<ffff82d08035b400>] lstar_enter+0x70/0x120
      (XEN) [   57.621556]    [<ffff82d08026e715>] domain.c#idle_loop+0x85/0xb0
      (XEN) [   57.621556]
      (XEN) [   57.621557]
      (XEN) [   57.621558] ****************************************
      (XEN) [   57.621558] Panic on CPU 0:
      (XEN) [   57.621559] FATAL TRAP: vector = 2 (nmi)
      (XEN) [   57.621559] [error_code=0000] , IN INTERRUPT CONTEXT
      (XEN) [   57.621559] ****************************************
      (XEN) [   57.621560]
      (XEN) [   57.621560] Reboot in five seconds...
      (XEN) [   57.621561] Executing kexec image on cpu0
      (XEN) [   57.622562] Shot down all CPUs
      

      Any idea what it could be? How could I investigate or check this? It's part of a three-node HA setup and nothing is running on it, so rebooting or installing diagnostic tools is no problem.

      Thank you
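To pull the key facts out of a long xen.log like the one above, a small parser can help. The Python below is a minimal sketch. It assumes that every line carries the "(XEN) [timestamp]" prefix shown in this thread. The names parse_xen_panic, XenPanic and X86_VECTORS are hypothetical helpers and not part of any XCP-ng tool. The parser extracts the NMI reason, the trap vector (vector 2 is the x86 non-maskable interrupt), the CPU that panicked and the call trace.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# A few x86 exception vectors (Intel SDM numbering); vector 2 is the NMI.
X86_VECTORS = {
    0: "#DE divide error",
    2: "NMI (non-maskable interrupt)",
    6: "#UD invalid opcode",
    8: "#DF double fault",
    13: "#GP general protection fault",
    14: "#PF page fault",
    18: "#MC machine check",
}

# Every xen.log line starts with "(XEN) [   57.621481] "; strip that prefix.
PREFIX = re.compile(r"^\s*\(XEN\)\s*\[\s*[\d.]+\]\s?")
TRAP = re.compile(r"FATAL TRAP: vector = (\d+)")
FRAME = re.compile(r"\[<([0-9a-f]+)>\]\s+(\S+)")


@dataclass
class XenPanic:
    nmi_reason: Optional[str] = None  # text after "NMI - ", e.g. "PCI system error (SERR)"
    vector: Optional[int] = None      # trap vector from the "FATAL TRAP" line
    cpu: Optional[int] = None         # CPU that panicked
    call_trace: List[str] = field(default_factory=list)  # symbol+offset, in printed order


def parse_xen_panic(text: str) -> XenPanic:
    panic = XenPanic()
    in_trace = False
    for raw in text.splitlines():
        line = PREFIX.sub("", raw).strip()
        if in_trace:
            frame = FRAME.search(line)
            if frame:
                panic.call_trace.append(frame.group(2))
                continue
            in_trace = False  # the first non-frame line ends the trace
        if line.startswith("NMI - "):
            panic.nmi_reason = line[len("NMI - "):]
        elif line == "Xen call trace:":
            in_trace = True
        elif line.startswith("Panic on CPU"):
            panic.cpu = int(line.split()[-1].rstrip(":"))
        else:
            trap = TRAP.search(line)
            if trap:
                panic.vector = int(trap.group(1))
    return panic
```

For the log in the first post, parse_xen_panic(text).nmi_reason would be "PCI system error (SERR)". Here text is the contents of the xen.log saved by kdump, and where that file lives depends on your crash-dump setup.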

      • r1 (XCP-ng Team)

        You may need to try nmi=ignore on the Xen boot line; refer to /boot/grub/grub.conf.

        We will have to take a deeper look at which interrupt is triggering and why.
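r1's suggestion amounts to adding an nmi= option to the Xen line of the GRUB configuration, not to the dom0 kernel line. Below is a minimal Python sketch of that edit. It assumes a GRUB2-style config in which Xen is loaded by a line such as "multiboot2 /boot/xen.gz <options>". The file name and location vary between installs (BIOS vs UEFI), so check yours and keep a backup. The names set_xen_option and XEN_LINE are hypothetical. Xen documents ignore, dom0 and fatal as the accepted nmi= values.

```python
import re

# Values the Xen nmi= option accepts.
NMI_MODES = {"ignore", "dom0", "fatal"}

# Assumption: Xen is loaded by a GRUB2 line like "multiboot2 /boot/xen.gz <options>".
XEN_LINE = re.compile(r"^(\s*multiboot2?\s+\S*xen\S*)(.*)$")


def set_xen_option(grub_cfg: str, key: str, value: str) -> str:
    """Return grub_cfg with key=value set on every Xen multiboot line.

    Any existing key=... (or bare key) on those lines is replaced; dom0 kernel
    lines (module2 ...) and everything else are left untouched.
    """
    if key == "nmi" and value not in NMI_MODES:
        raise ValueError(f"nmi must be one of {sorted(NMI_MODES)}, got {value!r}")
    out = []
    for line in grub_cfg.split("\n"):
        match = XEN_LINE.match(line)
        if match:
            head, options = match.group(1), match.group(2).split()
            options = [o for o in options if o.split("=", 1)[0] != key]
            options.append(f"{key}={value}")
            line = f"{head} {' '.join(options)}"
        out.append(line)
    return "\n".join(out)
```

To use it, read the file, call set_xen_option(cfg, "nmi", "dom0") (or "ignore"), write the result back and reboot. The option only takes effect on the next boot. XCP-ng also ships /opt/xensource/libexec/xen-cmdline for editing the Xen command line, which may be the safer route on your version.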

        • petr.bena

          I updated the motherboard firmware and so far it hasn't crashed; let's see.

          • petr.bena

            It just crashed again, so the update didn't help. This really looks like a software issue to me.

            • petr.bena

              I changed the Xen boot line to include nmi=dom0 and will see if it helps. I'm not really sure what is going on here, but it looks similar to this issue: https://bugs.xenserver.org/browse/XSO-195?page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel&showAll=true
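Why would nmi=dom0 stop the panics? As we understand Xen's x86 NMI handling, the nmi= option only chooses what happens when a PCI SERR or I/O check NMI arrives:

- fatal prints diagnostics and brings the host down (the "FATAL TRAP: vector = 2 (nmi)" panic above).
- dom0 reports the error to dom0 and keeps running.
- ignore discards the error.

The sketch below models that choice. It is not Xen code, and the helper names are hypothetical. Treating a missing nmi= option as fatal is an assumption, because the real default depends on the Xen version and build. Either way, the option changes how the error is handled, not what raises it.

```python
from enum import Enum


class NmiMode(Enum):
    FATAL = "fatal"    # print diagnostics and panic the host
    DOM0 = "dom0"      # report the error to dom0 and keep running
    IGNORE = "ignore"  # discard the error and keep running


def nmi_mode_from_cmdline(cmdline: str, default: NmiMode = NmiMode.FATAL) -> NmiMode:
    """Return the nmi= mode on a Xen command line; the last occurrence wins.

    The FATAL default is an assumption that matches the panics in this thread;
    the real default depends on the Xen version and build.
    """
    mode = default
    for token in cmdline.split():
        key, _, value = token.partition("=")
        if key == "nmi":
            mode = NmiMode(value)  # raises ValueError for unknown values
    return mode


def on_pci_serr(mode: NmiMode) -> str:
    """Simplified model (not Xen code) of the reaction to a PCI SERR NMI."""
    if mode is NmiMode.FATAL:
        return "panic: NMI - PCI system error (SERR), FATAL TRAP vector 2"
    if mode is NmiMode.DOM0:
        return "report SERR to dom0 and keep running"
    return "discard SERR and keep running"
```

For example, on_pci_serr(nmi_mode_from_cmdline("dom0_mem=4096M watchdog nmi=dom0")) takes the dom0 branch instead of the panic branch.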

              • olivierlambert (Vates 🪐 Co-Founder, CEO)

                Keep us posted 🙂

                • olivierlambert (Vates 🪐 Co-Founder, CEO)

                  @petr-bena said in XcpNG - Xen kernel crash (FATAL TRAP: vector = 2 (nmi)):

                  NMI - PCI system error (SERR)

                  This line seems to be the problem: the hardware reported a catastrophic failure on the PCI bus. It might come from the new disk you plugged in.
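On PC-compatible chipsets, the NMI reason is read from System Control Port B (I/O port 0x61). Bit 7 flags an NMI caused by SERR# (a PCI system error), and bit 6 flags an I/O channel check (IOCHK). As far as we can tell, Xen's "NMI - PCI system error (SERR)" message corresponds to bit 7. That would explain why it names the PCI subsystem but not a specific device. Below is a minimal Python decoder for a byte read from that port; reading the port itself needs ring-0 or ioperm access and is not shown. decode_port_61 and the constant names are hypothetical.

```python
from typing import List

SERR_NMI_STS = 0x80   # bit 7: NMI caused by SERR# (PCI system error)
IOCHK_NMI_STS = 0x40  # bit 6: NMI caused by an I/O channel check


def decode_port_61(value: int) -> List[str]:
    """Decode the NMI source bits of a byte read from I/O port 0x61."""
    if not 0 <= value <= 0xFF:
        raise ValueError("port 0x61 is an 8-bit register")
    reasons = []
    if value & SERR_NMI_STS:
        reasons.append("PCI system error (SERR)")
    if value & IOCHK_NMI_STS:
        reasons.append("I/O channel check (IOCHK)")
    return reasons
```

A byte of 0x80 decodes to ["PCI system error (SERR)"], the case reported in this thread. The port only says that some device signalled SERR, not which one.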

                  • petr.bena

                    I don't know. I added nmi=dom0 to the Xen boot parameter line and so far it hasn't crashed. I've seen some weird things in dmesg on dom0 that may or may not be related, but it hasn't crashed; so far it's working. I would rather think this is somehow connected to C-states or something similar, since this is an old motherboard from around 2015.

                    [Thu Oct  3 16:10:14 2019] block tdd: sector-size: 512/512 capacity: 67108864
                    [Thu Oct  3 16:11:05 2019] swapper/0: page allocation failure: order:0, mode:0x480020(GFP_ATOMIC), nodemask=(null)
                    [Thu Oct  3 16:11:05 2019] swapper/0 cpuset=/ mems_allowed=0
                    [Thu Oct  3 16:11:05 2019] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G           O      4.19.0+1 #1
                    [Thu Oct  3 16:11:05 2019] Hardware name: Supermicro X10SLL-F/X10SLL-F, BIOS 3.2 05/14/2018
                    [Thu Oct  3 16:11:05 2019] Call Trace:
                    [Thu Oct  3 16:11:05 2019]  <IRQ>
                    [Thu Oct  3 16:11:05 2019]  dump_stack+0x5a/0x73
                    [Thu Oct  3 16:11:05 2019]  warn_alloc+0xee/0x180
                    [Thu Oct  3 16:11:05 2019]  __alloc_pages_slowpath+0x84d/0xa09
                    [Thu Oct  3 16:11:05 2019]  ? get_page_from_freelist+0x14c/0xf00
                    [Thu Oct  3 16:11:05 2019]  __alloc_pages_nodemask+0x271/0x2b0
                    [Thu Oct  3 16:11:05 2019]  page_frag_alloc+0x103/0x120
                    [Thu Oct  3 16:11:05 2019]  __napi_alloc_skb+0x82/0xd0
                    [Thu Oct  3 16:11:05 2019]  rtl8169_poll+0x249/0x640 [r8169]
                    [Thu Oct  3 16:11:05 2019]  net_rx_action+0x2a5/0x3e0
                    [Thu Oct  3 16:11:05 2019]  __do_softirq+0xd1/0x28c
                    [Thu Oct  3 16:11:05 2019]  irq_exit+0xa8/0xc0
                    [Thu Oct  3 16:11:05 2019]  xen_evtchn_do_upcall+0x2c/0x50
                    [Thu Oct  3 16:11:05 2019]  xen_do_hypervisor_callback+0x29/0x40
                    [Thu Oct  3 16:11:05 2019]  </IRQ>
                    [Thu Oct  3 16:11:05 2019] RIP: e030:xen_hypercall_sched_op+0xa/0x20
                    [Thu Oct  3 16:11:05 2019] Code: 51 41 53 b8 1c 00 00 00 0f 05 41 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 51 41 53 b8 1d 00 00 00 0f 05 <41> 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
                    [Thu Oct  3 16:11:05 2019] RSP: e02b:ffffffff82003e58 EFLAGS: 00000246
                    [Thu Oct  3 16:11:05 2019] RAX: 0000000000000000 RBX: ffffffff82011740 RCX: ffffffff810013aa
                    [Thu Oct  3 16:11:05 2019] RDX: ffffffff8203c190 RSI: 0000000000000000 RDI: 0000000000000001
                    [Thu Oct  3 16:11:05 2019] RBP: 0000000000000000 R08: 000000000001ca00 R09: 0000000000000000
                    [Thu Oct  3 16:11:05 2019] R10: 0000000000007ff0 R11: 0000000000000246 R12: 0000000000000000
                    [Thu Oct  3 16:11:05 2019] R13: 0000000000000000 R14: ffffffff82011740 R15: ffffffff82011740
                    [Thu Oct  3 16:11:05 2019]  ? xen_hypercall_sched_op+0xa/0x20
                    [Thu Oct  3 16:11:05 2019]  ? xen_safe_halt+0xc/0x20
                    [Thu Oct  3 16:11:05 2019]  ? default_idle+0x1a/0x140
                    [Thu Oct  3 16:11:05 2019]  ? do_idle+0x1ea/0x260
                    [Thu Oct  3 16:11:05 2019]  ? cpu_startup_entry+0x6f/0x80
                    [Thu Oct  3 16:11:05 2019]  ? start_kernel+0x558/0x578
                    [Thu Oct  3 16:11:05 2019]  ? set_init_arg+0x55/0x55
                    [Thu Oct  3 16:11:05 2019]  ? xen_start_kernel+0x583/0x58d
                    [Thu Oct  3 16:11:05 2019] Mem-Info:
                    [Thu Oct  3 16:11:05 2019] active_anon:25841 inactive_anon:33848 isolated_anon:0
                     active_file:60244 inactive_file:478866 isolated_file:0
                     unevictable:4117 dirty:10002 writeback:28515 unstable:0
                     slab_reclaimable:7812 slab_unreclaimable:8819
                     mapped:29136 shmem:3449 pagetables:4220 bounce:0
                     free:4766 free_pcp:868 free_cma:0
                    [Thu Oct  3 16:11:05 2019] Node 0 active_anon:103364kB inactive_anon:135392kB active_file:240976kB inactive_file:1915464kB unevictable:16468kB isolated(anon):0kB isolated(file):0kB mapped:116544kB dirty:40008kB writeback:114060kB shmem:13796kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
                    [Thu Oct  3 16:11:05 2019] DMA free:10032kB min:36kB low:48kB high:60kB active_anon:4kB inactive_anon:0kB active_file:136kB inactive_file:3392kB unevictable:0kB writepending:128kB present:15868kB managed:15784kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
                    [Thu Oct  3 16:11:05 2019] lowmem_reserve[]: 0 2505 2505 2505 2505
                    [Thu Oct  3 16:11:05 2019] DMA32 free:9032kB min:6372kB low:8936kB high:11500kB active_anon:103360kB inactive_anon:135392kB active_file:240840kB inactive_file:1911944kB unevictable:16468kB writepending:153940kB present:2720256kB managed:2565380kB mlocked:16468kB kernel_stack:8656kB pagetables:16880kB bounce:0kB free_pcp:3472kB local_pcp:244kB free_cma:0kB
                    [Thu Oct  3 16:11:05 2019] lowmem_reserve[]: 0 0 0 0 0
                    [Thu Oct  3 16:11:05 2019] Normal free:0kB min:0kB low:0kB high:0kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:131072kB managed:0kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
                    [Thu Oct  3 16:11:05 2019] lowmem_reserve[]: 0 0 0 0 0
                    [Thu Oct  3 16:11:05 2019] DMA: 2*4kB (M) 1*8kB (M) 36*16kB (ME) 39*32kB (UME) 20*64kB (ME) 12*128kB (UME) 7*256kB (ME) 1*512kB (E) 3*1024kB (UM) 0*2048kB 0*4096kB = 10032kB
                    [Thu Oct  3 16:11:05 2019] DMA32: 68*4kB (MEH) 64*8kB (MEH) 77*16kB (EH) 61*32kB (H) 31*64kB (H) 4*128kB (H) 4*256kB (H) 2*512kB (H) 1*1024kB (H) 0*2048kB 0*4096kB = 9536kB
                    [Thu Oct  3 16:11:05 2019] Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
                    [Thu Oct  3 16:11:05 2019] 544236 total pagecache pages
                    [Thu Oct  3 16:11:05 2019] 0 pages in swap cache
                    [Thu Oct  3 16:11:05 2019] Swap cache stats: add 0, delete 0, find 0/0
                    [Thu Oct  3 16:11:05 2019] Free swap  = 1048572kB
                    [Thu Oct  3 16:11:05 2019] Total swap = 1048572kB
                    [Thu Oct  3 16:11:05 2019] 716799 pages RAM
                    [Thu Oct  3 16:11:05 2019] 0 pages HighMem/MovableOnly
                    [Thu Oct  3 16:11:05 2019] 71508 pages reserved
                    [Thu Oct  3 16:11:05 2019] 0 pages cma reserved
                    [Thu Oct  3 16:11:05 2019] 0 pages hwpoisoned
                    [Thu Oct  3 16:14:23 2019] block tdd: sector-size: 512/512 capacity: 67108864
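The dom0 warning above is a separate symptom from the Xen panic. The receive path of the r8169 NIC driver (rtl8169_poll) failed an order:0 GFP_ATOMIC allocation, that is, a single 4 kB page requested in a context that cannot sleep to wait for reclaim. The per-zone lines near the end of the dump list free blocks per order as count*size, and summing them shows how much free memory each zone had at that moment. Below is a minimal Python sketch, assuming the dmesg format shown here; buddy_free_kb and buddy_summary are hypothetical helpers.

```python
import re
from typing import Dict, Tuple

# "[timestamp] ZONE: 68*4kB (MEH) 64*8kB ... = 9536kB"; the timestamp is optional.
BUDDY_LINE = re.compile(r"^(?:\[[^\]]*\]\s*)?(\w+):\s*(.*?)=\s*(\d+)kB$")
# One "count*size" term; migrate-type tags such as "(UME)" are skipped.
TERM = re.compile(r"(\d+)\*(\d+)kB")


def buddy_free_kb(line: str) -> Tuple[str, int, int]:
    """Return (zone, recomputed free kB, kernel-printed free kB) for one line."""
    match = BUDDY_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"not a buddy free-list line: {line!r}")
    zone, terms, printed = match.groups()
    total = sum(int(count) * int(size) for count, size in TERM.findall(terms))
    return zone, total, int(printed)


def buddy_summary(dmesg: str) -> Dict[str, Tuple[int, int]]:
    """Map each zone found in a Mem-Info dump to (recomputed kB, printed kB)."""
    summary = {}
    for line in dmesg.splitlines():
        try:
            zone, total, printed = buddy_free_kb(line)
        except ValueError:
            continue  # not a per-order free-list line
        summary[zone] = (total, printed)
    return summary
```

For the DMA32 line above, buddy_free_kb returns ("DMA32", 9536, 9536), so the recomputed total matches what the kernel printed.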
                    
                    • olivierlambert (Vates 🪐 Co-Founder, CEO)

                      Yes, this message comes from your motherboard, from the PCI subsystem. I wouldn't be very confident about this hardware, but if you have backups or it's not in production, whatever 😛

                      • petr.bena

                        It is running one of the production CEPH nodes, but if it crashes, CEPH will transparently fail over. The VMs running there are just for backups and non-production stuff. If I knew which hardware was causing it, I would replace it, but this message isn't very clear about what is really going on.

                        Other than that, everything is running OK; so far, no crash.
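One way to look for the device that raised SERR is to check each device's PCI Status register (config-space offset 0x06). Bit 14 is Signaled System Error and bit 15 is Detected Parity Error, and lspci -vv shows the former as >SERR+. The Python sketch below is meant to run as root in dom0. It reads the first bytes of /sys/bus/pci/devices/*/config and reports devices with error bits set; scan_pci and decode_status are hypothetical helpers. A clean result proves little: the bits may have been cleared by the reboot, and on PCIe systems the AER logs may be more informative.

```python
import struct
from pathlib import Path
from typing import Dict, List

# PCI Status register (config space offset 0x06) error bits, per the PCI spec.
STATUS_BITS = {
    8: "Master Data Parity Error",
    11: "Signaled Target Abort",
    12: "Received Target Abort",
    13: "Received Master Abort",
    14: "Signaled System Error (SERR)",
    15: "Detected Parity Error",
}


def decode_status(status: int) -> List[str]:
    """Names of the error bits set in a 16-bit PCI Status register value."""
    return [name for bit, name in sorted(STATUS_BITS.items()) if status & (1 << bit)]


def scan_pci(root: str = "/sys/bus/pci/devices") -> Dict[str, List[str]]:
    """Return {device address: error bits} for devices with any error bit set."""
    flagged = {}
    for dev in sorted(Path(root).glob("*")):
        try:
            raw = (dev / "config").read_bytes()[:8]
        except OSError:
            continue  # not a device directory, or config not readable
        if len(raw) < 8:
            continue
        (status,) = struct.unpack_from("<H", raw, 6)  # little-endian 16-bit at 0x06
        errors = decode_status(status)
        if errors:
            flagged[dev.name] = errors
    return flagged
```

Running print(scan_pci()) right after a crash-and-reboot cycle lists any device whose Status register still shows an error bit.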

                        • fbifido, replying to @petr.bena

                          @petr-bena You have CEPH running on XCP-ng 8.0?
                          How many servers are you using with CEPH?
                          How did you setup CEPH on xcp-ng 8.0?

                          • petr.bena, replying to @fbifido

                            fbifido Yes, I have three CEPH nodes running in separate VMs with direct passthrough to the underlying physical disks. The CEPH volume is connected as RBD, which forms a shared block device on the XCP-ng servers, and on that shared block device I use LVM.

                            It's all described here: https://github.com/xcp-ng/xcp/wiki/Ceph-on-XCP-ng-7.5-or-later#lvm-on-rbd

                            • dave

                              Hi!

                              @petr-bena Have you had any crashes since you set nmi=dom0?

                              We have a similar problem.

                              There are four servers in different locations, two standalone and two in pools, all with the same hardware:

                              Supermicro X11SRA-RF Version: 1.02
                              and
                              Intel(R) Xeon(R) W-2145 CPU

                              We tried all BIOS versions and a lot of different settings.

                              Two of them are running XCP 7.6 and have uptimes of 143 and 160 days. No problems at all.

                              The other two are running XCP 8.0 and crash regularly, every 2 to 30 days, each time with the same error:

                              NMI - PCI system error (SERR)

                              The crash is more likely to happen if we produce high I/O and/or network load on those hosts.

                              We suspected a hardware error, so we took one of the crashing servers to our workshop and tested it for almost two weeks with Prime95, Memtest86 and whatever else came to mind.

                              We were not able to produce a crash, nor were we able to detect any errors.

                              We put this particular server back into production and it crashed within the first few hours, while we were migrating some VMs back to it (with storage migration).

                              So I think it has something to do with XCP-ng 8.0.

                              I will try setting nmi=dom0 next.

                              (XEN) [395218.940883] 
                              (XEN) [395218.940886] 
                              (XEN) [395218.940886] NMI - PCI system error (SERR)
                              (XEN) [395218.940889] ----[ Xen-4.11.1-7.8.xcpng8.0  x86_64  debug=n   Not tainted ]----
                              (XEN) [395218.940889] CPU:    0
                              (XEN) [395218.940890] RIP:    e008:[<ffff82d0802c6d38>] mwait_idle_with_hints+0xf8/0x160
                              (XEN) [395218.940894] RFLAGS: 0000000000000046   CONTEXT: hypervisor
                              (XEN) [395218.940896] rax: 0000000000000001   rbx: 000167730fc96b09   rcx: 0000000000000001
                              (XEN) [395218.940897] rdx: 0000000000000000   rsi: ffff83006f667ef8   rdi: ffff83006f667fff
                              (XEN) [395218.940898] rbp: 0000000000000000   rsp: ffff83006f667e00   r8:  0000000000000048
                              (XEN) [395218.940899] r9:  000530dec0dbea3e   r10: 0000000000000008   r11: ffff83207cac1a68
                              (XEN) [395218.940900] r12: 0000000000000000   r13: 0000000000000001   r14: 0000000000000001
                              (XEN) [395218.940902] r15: ffff82d080573d00   cr0: 0000000080050033   cr4: 0000000000362660
                              (XEN) [395218.940903] cr3: 00000012c899a000   cr2: ffffe783981eb000
                              (XEN) [395218.940904] fsb: 0000000000000000   gsb: ffff88827bf40000   gss: 0000000000000000
                              (XEN) [395218.940906] ds: 002b   es: 002b   fs: 0000   gs: 0000   ss: e010   cs: e008
                              (XEN) [395218.940908] Xen code around <ffff82d0802c6d38> (mwait_idle_with_hints+0xf8/0x160):
                              (XEN) [395218.940908]  89 f0 44 89 e9 0f 01 c9 <0f> b6 47 f5 80 a6 fd 00 00 00 fe 44 89 c1 0f 30
                              (XEN) [395218.940912] Xen stack trace from rsp=ffff83006f667e00:
                              (XEN) [395218.940913]    ffff83207cac4f08 0000000000000000 ffff83207cac4e90 ffff82d080573d00
                              (XEN) [395218.940915]    ffff82d0805baa50 ffff82d080592b20 ffff83207cac4f08 ffff82d0802ccd07
                              (XEN) [395218.940916]    000167730faf9015 0000000100000002 00000108000004c9 0000000000000000
                              (XEN) [395218.940918]    0000000000000000 ffff82d08035b43e ffff83006f7fc000 ffffffffffffffff
                              (XEN) [395218.940919]    ffff82d08035b400 ffff82d080573d00 ffff82d0805baa50 0000000000000000
                              (XEN) [395218.940921]    0000000000000000 ffff82d080592b20 ffff83006f667fff ffff82d08026e505
                              (XEN) [395218.940922]    ffff83006f7fc000 ffff83006f7fc000 ffff83006f7bf000 ffff83207cb69000
                              (XEN) [395218.940924]    00000000ffffffff ffff8320246cc000 ffff82d080592b20 ffff88827ae3d700
                              (XEN) [395218.940926]    ffff88827ae3d700 0000000000000000 0000000000000000 0000000000000005
                              (XEN) [395218.940927]    ffff88827ae3d700 0000000000000246 ffffc9004106b930 0000000000000000
                              (XEN) [395218.940928]    000000000001ca00 0000000000000000 ffffffff810013aa ffffffff8203c190
                              (XEN) [395218.940930]    0000000000000000 0000000000000001 0000010000000000 ffffffff810013aa
                              (XEN) [395218.940931]    000000000000e033 0000000000000246 ffffc90040113eb0 000000000000e02b
                              (XEN) [395218.940933]    6f5b7c2b6f667fe0 6f5b7cae00097f76 6f5b7da200000000 6f5b79516f667fe0
                              (XEN) [395218.940934]    0000e01000000000 ffff83006f7fc000 0000000000000000 0000000000362660
                              (XEN) [395218.940936]    0000000000000000 800000207caef002 0000070100000000 6f5b883e00097f00
                              (XEN) [395218.940938] Xen call trace:
                              (XEN) [395218.940939]    [<ffff82d0802c6d38>] mwait_idle_with_hints+0xf8/0x160
                              (XEN) [395218.940942]    [<ffff82d0802ccd07>] mwait-idle.c#mwait_idle+0x337/0x3d0
                              (XEN) [395218.940945]    [<ffff82d08035b43e>] lstar_enter+0xae/0x120
                              (XEN) [395218.940946]    [<ffff82d08035b400>] lstar_enter+0x70/0x120
                              (XEN) [395218.940950]    [<ffff82d08026e505>] domain.c#idle_loop+0x85/0xb0
                              (XEN) [395218.940951] 
                              (XEN) [395218.940952] 
                              (XEN) [395218.940953] ****************************************
                              (XEN) [395218.940953] Panic on CPU 0:
                              (XEN) [395218.940954] FATAL TRAP: vector = 2 (nmi)
                              (XEN) [395218.940955] [error_code=0000] , IN INTERRUPT CONTEXT
                              (XEN) [395218.940955] ****************************************
                              (XEN) [395218.940956] 
                              (XEN) [395218.940956] Reboot in five seconds...
                              (XEN) [395218.940958] Executing kexec image on cpu0
                              (XEN) [395218.941963] Shot down all CPUs
                              
                              • petr.bena

                                Hello, no. Since I changed this, the server has been rock solid:

                                20:59:01 up 136 days, 22:40, 1 user, load average: 0.45, 0.31, 0.36

                                • olivierlambert (Vates 🪐 Co-Founder, CEO)

                                  dave You should try the 8.1 beta.

                                  • dave

                                    @petr-bena Thanks.

                                    I can confirm: so far everything is stable for us, too (with nmi=dom0).

                                    olivierlambert Since I only have production servers with the affected hardware at the moment, I can't test the 8.1 beta right now, but I will try 8.1 final after its release. Do you think there is a real chance that this error won't appear in stock 8.1? Or should I make the same change?

                                    • olivierlambert (Vates 🪐 Co-Founder, CEO)

                                      8.1 is bundled with the latest and greatest Xen, 4.13. So yeah, it might change (e.g. if it's a bug fixed in a more recent Xen version). Otherwise, keep the nmi configuration as is 🙂
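After upgrading to 8.1, or after any reinstall, it is worth confirming which nmi= value the running hypervisor actually booted with, since the boot configuration may have been regenerated. The xl info command prints the Xen command line in its xen_commandline field, and the Python sketch below parses that output. xl_info and running_option are hypothetical helper names. The sketch also assumes that the last occurrence of an option on the command line wins.

```python
import subprocess
from typing import Dict, Optional


def xl_info(text: Optional[str] = None) -> Dict[str, str]:
    """Parse `xl info` output ("key : value" lines); run xl in dom0 if text is None."""
    if text is None:
        text = subprocess.run(
            ["xl", "info"], capture_output=True, text=True, check=True
        ).stdout
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            info[key.strip()] = value.strip()
    return info


def running_option(info: Dict[str, str], name: str) -> Optional[str]:
    """Value of a Xen boot option in xen_commandline, or None if absent.

    Assumption: when an option appears more than once, the last one wins.
    A bare option without "=" is reported as an empty string.
    """
    value = None
    for token in info.get("xen_commandline", "").split():
        key, sep, val = token.partition("=")
        if key == name:
            value = val if sep else ""
    return value
```

On a host configured as in this thread, running_option(xl_info(), "nmi") should return "dom0". None means the option is not set and Xen's built-in default applies.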

                                      • mauricio_hps

                                        Hi! Excuse my bad English. I've installed XenServer 7.2 for the first time in my life and it crashes with FATAL TRAP: vector = 2 (nmi).
                                        How do I edit the Xen boot parameters to add nmi=dom0?
                                        Thanks!

                                        • olivierlambert (Vates 🪐 Co-Founder, CEO)

                                          Hi mauricio_hps

                                          This is an XCP-ng forum, please try with XCP-ng 😉 https://xcp-ng.org
