It seems I have a host that crashes periodically, though only occasionally (roughly twice a year), and I'm trying to diagnose the cause.
My logs are below, truncated because they are far too long to post here in full.
xen.log:
(XEN) [270483.449769] AMD-Vi: IO_PAGE_FAULT: domain = 0, device id = 0x101, fault address = 0xfffffffdf8000000, flags = 0x8
(XEN) [270598.526533] AMD-Vi: IO_PAGE_FAULT: domain = 0, device id = 0x101, fault address = 0xfffffffdf8000000, flags = 0x8
(XEN) [270685.029195] AMD-Vi: IO_PAGE_FAULT: domain = 0, device id = 0x101, fault address = 0xfffffffdf8000000, flags = 0x8
(XEN) [271110.558952] AMD-Vi: IO_PAGE_FAULT: domain = 0, device id = 0x101, fault address = 0xfffffffdf8000000, flags = 0x8
(XEN) [271309.499968] AMD-Vi: IO_PAGE_FAULT: domain = 0, device id = 0x101, fault address = 0xfffffffdf8000000, flags = 0x8
(XEN) [271340.748832] AMD-Vi: IO_PAGE_FAULT: domain = 0, device id = 0x101, fault address = 0xfffffffdf8000000, flags = 0x8
(XEN) [271349.428606] AMD-Vi: IO_PAGE_FAULT: domain = 0, device id = 0x101, fault address = 0xfffffffdf8000000, flags = 0x8
(XEN) [272600.756776] AMD-Vi: IO_PAGE_FAULT: domain = 0, device id = 0x101, fault address = 0xfffffffdf8000000, flags = 0x8
(XEN) [273163.116684] AMD-Vi: IO_PAGE_FAULT: domain = 0, device id = 0x101, fault address = 0xfffffffdf8000000, flags = 0x8
(XEN) [277052.852717] Uhhuh. NMI received for unknown reason 31.
(XEN) [277052.852719] Do you have a strange power saving mode enabled?
(XEN) [277052.852722] ----[ Xen-4.13.4-9.21.2 x86_64 debug=n Not tainted ]----
(XEN) [277052.852723] CPU: 0
(XEN) [277052.852725] RIP: e008:[<ffff82d0802d9d98>] arch/x86/acpi/cpu_idle.c#acpi_idle_do_entry+0x98/0xc0
(XEN) [277052.852731] RFLAGS: 0000000000000246 CONTEXT: hypervisor
(XEN) [277052.852734] rax: 0000000000000000 rbx: ffff83107bcafc78 rcx: 0000000000000048
(XEN) [277052.852736] rdx: 0000000000000000 rsi: ffff83007be8ffff rdi: ffff83107bcafc78
(XEN) [277052.852737] rbp: ffff83107bcafc00 rsp: ffff83007be8fe68 r8: ffff83007be8fef8
(XEN) [277052.852739] r9: 0000000000000002 r10: 0000fbfaa06ecc95 r11: 0000fbfa820743ef
(XEN) [277052.852740] r12: 0000fbfa64d405b7 r13: ffff83107bcafc30 r14: ffff82d080597270
(XEN) [277052.852742] r15: ffff82d0805bc300 cr0: 000000008005003b cr4: 00000000003506e0
(XEN) [277052.852743] cr3: 00000010448f3000 cr2: 00007ffcfeacafc8
(XEN) [277052.852744] fsb: 0000000000000000 gsb: ffff8aa47c440000 gss: 0000000000000000
(XEN) [277052.852746] ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: 0000 cs: e008
(XEN) [277052.852749] Xen code around <ffff82d0802d9d98> (arch/x86/acpi/cpu_idle.c#acpi_idle_do_entry+0x98/0xc0):
(XEN) [277052.852750] 66 90 0f 1f 40 00 fb f4 <0f> b6 46 f5 41 80 a0 fe 00 00 00 fe 66 90 fa c3
(XEN) [277052.852754] Xen stack trace from rsp=ffff83007be8fe68:
(XEN) [277052.852755] ffff82d0802da28a 0000000000000000 0000000000000000 0000000000000000
(XEN) [277052.852757] ffff82d080597270 ffff82d0805bc300 ffff82d08059db00 ffff8310447ca000
(XEN) [277052.852759] ffff82d08059db00 ffff8310447ca000 0000000000000000 0000000000000000
(XEN) [277052.852761] ffff82d080278b0c ffff82d080278a40 ffff8310447ca000 ffff83107bcb1000
(XEN) [277052.852763] 00000000ffffffff ffff831044926000 0000000000000000 0000000000000000
(XEN) [277052.852764] 0000000000000000 0000000000000000 0000000000000001 0000000000000001
(XEN) [277052.852766] 0000fbeb7bad0fb6 0000000000000000 0000000000000000 000000003359c3d1
(XEN) [277052.852767] ffffffff92d3a0f0 ffffffff9364f310 0000000006568a76 ffffffff9364af78
(XEN) [277052.852769] 0000000000000001 0000000000000000 ffffffff92d3a4be 0000000000000000
(XEN) [277052.852770] 0000000000000246 ffffb50d80393ea8 0000000000000000 7bdcdc407be8ffe0
(XEN) [277052.852772] 7bdcdcc30009bf75 7bdcddb700000000 7bdcd9667be8ffe0 0000e01000000000
(XEN) [277052.852774] ffff83107bcb0000 0000000000000000 00000000003506e0 0000000000000000
(XEN) [277052.852776] 0000000000000000 7b01d30000000000 7bdce8300009bf00
(XEN) [277052.852777] Xen call trace:
(XEN) [277052.852779] [<ffff82d0802d9d98>] R arch/x86/acpi/cpu_idle.c#acpi_idle_do_entry+0x98/0xc0
(XEN) [277052.852782] [<ffff82d0802da28a>] S arch/x86/acpi/cpu_idle.c#acpi_processor_idle+0x36a/0x630
(XEN) [277052.852785] [<ffff82d080278b0c>] S arch/x86/domain.c#idle_loop+0xcc/0xf0
(XEN) [277052.852786] [<ffff82d080278a40>] S arch/x86/domain.c#idle_loop+0/0xf0
(XEN) [277052.852787]
(XEN) [277052.852789]
(XEN) [277052.852789] ****************************************
(XEN) [277052.852790] Panic on CPU 0:
(XEN) [277052.852791] FATAL TRAP: vector = 2 (nmi)
(XEN) [277052.852792] [error_code=0000]
(XEN) [277052.852793] ****************************************
(XEN) [277052.852793]
(XEN) [277052.852794] Reboot in five seconds...
(XEN) [277052.852796] Executing kexec image on cpu0
(XEN) [277052.853813] Shot down all CPUs
dom0.log:
[ 57.411835] ERR: CIFS VFS: Send error in SessSetup = -13
[ 57.411857] ERR: CIFS VFS: cifs_mount failed w/return code = -13
[ 58.620725] INFO: EXT4-fs (dm-0): mounted filesystem with ordered data mode. Opts: (null)
[ 59.866512] NOTICE: Status code returned 0xc000006d STATUS_LOGON_FAILURE
[ 59.866518] ERR: CIFS VFS: Send error in SessSetup = -13
[ 59.866528] ERR: CIFS VFS: cifs_mount failed w/return code = -13
[ 61.063207] INFO: block tda: sector-size: 512/512 capacity: 41943040
[ 61.611203] INFO: device vif1.0 entered promiscuous mode
[ 61.673280] INFO: tun: Universal TUN/TAP device driver, 1.6
[ 61.882361] INFO: device tap1.0 entered promiscuous mode
[ 74.108493] INFO: device tap1.0 left promiscuous mode
[ 75.298831] INFO: vif vif-1-0 vif1.0: Guest Rx ready
[ 1016.041203] INFO: device xapi0 entered promiscuous mode
[ 1016.965567] INFO: block tdb: sector-size: 512/512 capacity: 419430400
[ 1017.489143] INFO: device vif2.0 entered promiscuous mode
[ 1017.757125] INFO: device tap2.0 entered promiscuous mode
[ 1023.605631] INFO: device tap2.0 left promiscuous mode
[ 1025.834352] INFO: vif vif-2-0 vif2.0: Guest Rx ready
[ 31340.135234] INFO: md: data-check of RAID array md127
[ 39191.770925] INFO: md: md127: data-check done.
[ 54858.770670] ERR: CIFS VFS: Server 10.5.10.50 has not responded in 120 seconds. Reconnecting...
[ 131654.041594] ERR: CIFS VFS: Server 10.5.10.50 has not responded in 120 seconds. Reconnecting...
[ 141852.456853] ERR: CIFS VFS: Server 10.5.10.50 has not responded in 120 seconds. Reconnecting...
[ 159158.312670] NOTICE: Status code returned 0xc000006d STATUS_LOGON_FAILURE
[ 159158.312677] ERR: CIFS VFS: Send error in SessSetup = -13
[ 159158.312687] ERR: CIFS VFS: cifs_mount failed w/return code = -13
[ 159207.204011] NOTICE: Status code returned 0xc000006d STATUS_LOGON_FAILURE
[ 159207.204018] ERR: CIFS VFS: Send error in SessSetup = -13
[ 159207.204027] ERR: CIFS VFS: cifs_mount failed w/return code = -13
[ 159357.625040] ERR: CIFS VFS: Error connecting to socket. Aborting operation.
[ 159357.625051] ERR: CIFS VFS: cifs_mount failed w/return code = -111
[ 159357.658035] ERR: CIFS VFS: Error connecting to socket. Aborting operation.
[ 159357.658044] ERR: CIFS VFS: cifs_mount failed w/return code = -111
[ 161050.954560] INFO: device xapi5 entered promiscuous mode
[ 161082.918747] INFO: block tdc: sector-size: 512/512 capacity: 67108864
[ 161083.284405] INFO: block tdd: sector-size: 512/512 capacity: 10869244
[ 161083.802559] INFO: device vif3.0 entered promiscuous mode
[ 161084.088086] INFO: device tap3.0 entered promiscuous mode
[ 161109.344679] INFO: device tap3.0 left promiscuous mode
[ 161109.986838] INFO: device vif3.0 left promiscuous mode
[ 161124.631349] INFO: block tdc: sector-size: 512/512 capacity: 67108864
[ 161124.987941] INFO: block tdd: sector-size: 512/512 capacity: 10869244
[ 161125.506087] INFO: device vif4.0 entered promiscuous mode
[ 161125.787692] INFO: device tap4.0 entered promiscuous mode
[ 161233.296903] INFO: device tap4.0 left promiscuous mode
[ 161234.069009] INFO: device vif4.0 left promiscuous mode
[ 161250.012788] INFO: block tdc: sector-size: 512/512 capacity: 67108864
[ 161250.433289] INFO: block tdd: sector-size: 512/512 capacity: 9568512
[ 161250.957429] INFO: device vif5.0 entered promiscuous mode
[ 161251.233068] INFO: device tap5.0 entered promiscuous mode
[ 162259.178729] INFO: device tap5.0 left promiscuous mode
[ 162259.920375] INFO: device vif5.0 left promiscuous mode
[ 162261.427391] INFO: block tdc: sector-size: 512/512 capacity: 67108864
[ 162261.829462] INFO: block tdd: sector-size: 512/512 capacity: 9568512
[ 162262.339599] INFO: device vif6.0 entered promiscuous mode
[ 162262.616402] INFO: device tap6.0 entered promiscuous mode
[ 162407.932290] INFO: device tap6.0 left promiscuous mode
[ 162408.623844] INFO: device vif6.0 left promiscuous mode
[ 162410.130951] INFO: block tdc: sector-size: 512/512 capacity: 67108864
[ 162410.523163] INFO: block tdd: sector-size: 512/512 capacity: 9568512
[ 162411.028504] INFO: device vif7.0 entered promiscuous mode
[ 162411.308947] INFO: device tap7.0 entered promiscuous mode
[ 162821.183555] INFO: device tap7.0 left promiscuous mode
[ 162821.967356] INFO: device vif7.0 left promiscuous mode
[ 162823.469144] INFO: block tdc: sector-size: 512/512 capacity: 67108864
[ 162823.881822] INFO: block tdd: sector-size: 512/512 capacity: 9568512
[ 162824.366571] INFO: device vif8.0 entered promiscuous mode
[ 162824.638473] INFO: device tap8.0 entered promiscuous mode
[ 163337.236773] INFO: device tap8.0 left promiscuous mode
[ 163337.925312] INFO: device vif8.0 left promiscuous mode
[ 163339.313977] INFO: block tdc: sector-size: 512/512 capacity: 67108864
[ 163339.869661] INFO: device vif9.0 entered promiscuous mode
[ 163340.143727] INFO: device tap9.0 entered promiscuous mode
[ 163345.656310] INFO: device tap9.0 left promiscuous mode
[ 163347.318717] INFO: vif vif-9-0 vif9.0: Guest Rx ready
[ 163356.558725] INFO: vif vif-9-0 vif9.0: Guest Rx ready
[ 163386.326178] INFO: device vif9.0 left promiscuous mode
[ 163407.871601] INFO: block tdc: sector-size: 512/512 capacity: 67108864
[ 163408.424979] INFO: device vif10.0 entered promiscuous mode
[ 163408.705660] INFO: device tap10.0 entered promiscuous mode
[ 163417.631824] INFO: device tap10.0 left promiscuous mode
[ 163419.199561] INFO: vif vif-10-0 vif10.0: Guest Rx ready
[ 163624.199308] INFO: device vif10.0 left promiscuous mode
[ 163625.565311] INFO: block tdc: sector-size: 512/512 capacity: 67108864
[ 163626.129824] INFO: device vif11.0 entered promiscuous mode
[ 163626.403920] INFO: device tap11.0 entered promiscuous mode
[ 163638.713201] INFO: vif vif-11-0 vif11.0: Guest Rx ready
[ 164717.561320] INFO: device tap11.0 left promiscuous mode
[ 164718.459194] INFO: device vif11.0 left promiscuous mode
[ 164719.892533] INFO: block tdc: sector-size: 512/512 capacity: 67108864
[ 164720.509181] INFO: device vif12.0 entered promiscuous mode
[ 164720.783963] INFO: device tap12.0 entered promiscuous mode
[ 164730.684546] INFO: device tap12.0 left promiscuous mode
[ 164732.743130] INFO: vif vif-12-0 vif12.0: Guest Rx ready
[ 169150.464914] INFO: device vif12.0 left promiscuous mode
[ 169151.874585] INFO: block tdc: sector-size: 512/512 capacity: 67108864
[ 169152.465030] INFO: device vif13.0 entered promiscuous mode
[ 169152.742607] INFO: device tap13.0 entered promiscuous mode
[ 169159.235091] INFO: device tap13.0 left promiscuous mode
[ 169161.039490] INFO: vif vif-13-0 vif13.0: Guest Rx ready
[ 203049.047874] ERR: CIFS VFS: Server 10.5.10.50 has not responded in 120 seconds. Reconnecting...
[ 213010.741405] INFO: pcieport 0000:00:01.1: AER: Corrected error received: 0000:00:00.0
[ 213010.741413] ERR: pcieport 0000:00:01.1: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver ID)
[ 213010.741425] ERR: pcieport 0000:00:01.1: device [1022:1453] error status/mask=00000040/00006000
[ 213010.741431] ERR: pcieport 0000:00:01.1: [ 6] BadTLP