Hi @olivierlambert
I now have the same problem on 4 servers: the machines reset every few hours! Please help.
Before this, the machines had been running stably since this boot:
reboot system boot 4.19.0+1 Wed Dec 28 12:30 - 05:50 (217+16:19)
During that time, the following updates were installed, but the hosts were not restarted:
May 16 09:07:40 Updated: xen-libs-4.13.5-9.30.3.xcpng8.2.x86_64
May 16 09:07:41 Updated: guest-templates-json-1.9.6-1.2.xcpng8.2.noarch
May 16 09:07:41 Updated: xcp-ng-release-presets-8.2.1-6.x86_64
May 16 09:07:41 Updated: xen-hypervisor-4.13.5-9.30.3.xcpng8.2.x86_64
May 16 09:07:42 Updated: xen-dom0-libs-4.13.5-9.30.3.xcpng8.2.x86_64
May 16 09:07:43 Updated: xen-tools-4.13.5-9.30.3.xcpng8.2.x86_64
May 16 09:07:44 Updated: xen-dom0-tools-4.13.5-9.30.3.xcpng8.2.x86_64
May 16 09:07:48 Updated: xcp-ng-release-config-8.2.1-6.x86_64
May 16 09:07:49 Updated: xcp-ng-release-8.2.1-6.x86_64
May 16 09:07:49 Updated: guest-templates-json-data-other-1.9.6-1.2.xcpng8.2.noarch
May 16 09:07:50 Updated: guest-templates-json-data-linux-1.9.6-1.2.xcpng8.2.noarch
May 16 09:07:50 Updated: guest-templates-json-data-windows-1.9.6-1.2.xcpng8.2.noarch
May 16 09:07:51 Updated: sudo-1.8.23-10.el7_9.3.x86_64
May 16 09:08:01 Updated: linux-firmware-20190314-5.1.xcpng8.2.noarch
May 16 09:08:03 Updated: 2:microcode_ctl-2.1-26.xs23.1.xcpng8.2.x86_64
May 29 06:57:47 Updated: xen-libs-4.13.5-9.31.1.xcpng8.2.x86_64
May 29 06:57:48 Updated: xcp-ng-release-presets-8.2.1-9.x86_64
May 29 06:57:49 Updated: message-switch-1.23.2-4.1.xcpng8.2.x86_64
May 29 06:57:50 Updated: forkexecd-1.18.1-2.1.xcpng8.2.x86_64
May 29 06:57:50 Updated: xen-hypervisor-4.13.5-9.31.1.xcpng8.2.x86_64
May 29 06:57:51 Updated: xen-dom0-libs-4.13.5-9.31.1.xcpng8.2.x86_64
May 29 06:57:56 Updated: 2:qemu-4.2.1-4.6.3.1.xcpng8.2.x86_64
May 29 06:58:00 Updated: xen-tools-4.13.5-9.31.1.xcpng8.2.x86_64
May 29 06:58:01 Updated: xen-dom0-tools-4.13.5-9.31.1.xcpng8.2.x86_64
May 29 06:58:03 Updated: xenopsd-0.150.14-1.1.xcpng8.2.x86_64
May 29 06:58:03 Updated: xenopsd-cli-0.150.14-1.1.xcpng8.2.x86_64
May 29 06:58:05 Updated: xenopsd-xc-0.150.14-1.1.xcpng8.2.x86_64
May 29 06:58:06 Updated: gpumon-0.18.0-4.3.xcpng8.2.x86_64
May 29 06:58:06 Updated: xcp-rrdd-1.33.2-1.1.xcpng8.2.x86_64
May 29 06:58:08 Updated: rrdd-plugins-1.10.8-5.2.xcpng8.2.x86_64
May 29 06:58:09 Updated: xapi-tests-1.249.28-1.2.xcpng8.2.x86_64
May 29 06:58:13 Updated: xapi-core-1.249.28-1.2.xcpng8.2.x86_64
May 29 06:58:16 Updated: sm-2.30.8-2.1.xcpng8.2.x86_64
May 29 06:58:20 Updated: xcp-ng-release-config-8.2.1-9.x86_64
May 29 06:58:21 Updated: xcp-ng-release-8.2.1-9.x86_64
May 29 06:58:22 Updated: 2:microcode_ctl-2.1-26.xs25.1.xcpng8.2.x86_64
May 29 06:58:28 Updated: linux-firmware-20190314-7.1.xcpng8.2.noarch
May 29 06:58:33 Updated: xapi-xe-1.249.28-1.2.xcpng8.2.x86_64
May 29 06:58:34 Updated: varstored-guard-0.6.2-2.xcpng8.2.x86_64
May 29 06:58:35 Updated: xcp-networkd-0.56.2-2.xcpng8.2.x86_64
May 29 06:58:36 Updated: sm-rawhba-2.30.8-2.1.xcpng8.2.x86_64
Jul 28 10:10:40 Updated: xen-libs-4.13.5-9.34.1.xcpng8.2.x86_64
Jul 28 10:10:41 Updated: xen-hypervisor-4.13.5-9.34.1.xcpng8.2.x86_64
Jul 28 10:10:42 Updated: xen-dom0-libs-4.13.5-9.34.1.xcpng8.2.x86_64
Jul 28 10:10:42 Updated: xen-tools-4.13.5-9.34.1.xcpng8.2.x86_64
Jul 28 10:10:44 Updated: xen-dom0-tools-4.13.5-9.34.1.xcpng8.2.x86_64
Jul 28 10:10:54 Updated: linux-firmware-20190314-8.1.xcpng8.2.noarch
Yesterday morning, between 5:30 and 5:50, I rebooted all the servers (Zenbleed patch). Since then I have had random reboots on all 4 servers.
server1: 2x AMD EPYC 7282, ASUS Mainboard
reboot system boot 4.19.0+1 Thu Aug 3 10:57 - 13:25 (1+02:27)
reboot system boot 4.19.0+1 Thu Aug 3 07:33 - 13:25 (1+05:51)
reboot system boot 4.19.0+1 Thu Aug 3 05:57 - 13:25 (1+07:27)
reboot system boot 4.19.0+1 Thu Aug 3 05:36 - 13:25 (1+07:48)
server2: 2x AMD EPYC 7282, ASUS Mainboard
reboot system boot 4.19.0+1 Fri Aug 4 13:07 - 13:25 (00:18)
reboot system boot 4.19.0+1 Fri Aug 4 00:21 - 13:25 (13:04)
reboot system boot 4.19.0+1 Thu Aug 3 07:51 - 13:25 (1+05:34)
reboot system boot 4.19.0+1 Thu Aug 3 05:55 - 13:25 (1+07:30)
server3: 2x AMD EPYC 7282, Supermicro Mainboard
reboot system boot 4.19.0+1 Fri Aug 4 13:07 - 13:14 (00:06)
reboot system boot 4.19.0+1 Fri Aug 4 00:21 - 13:14 (12:53)
reboot system boot 4.19.0+1 Thu Aug 3 07:51 - 13:14 (1+05:23)
reboot system boot 4.19.0+1 Thu Aug 3 05:55 - 13:14 (1+07:19)
server4: 2x AMD EPYC 7282, Supermicro Mainboard
reboot system boot 4.19.0+1 Fri Aug 4 00:33 - 13:26 (12:52)
reboot system boot 4.19.0+1 Thu Aug 3 05:46 - 13:26 (1+07:40)
What can I provide to help you solve the problem?
Hardware issues have been ruled out, and the power supply is also OK (2 power supplies, 2 independent outlets).
In /var/crash I have only an old file:
ls -al /var/crash/
-rw-r--r-- 1 root root 67108864 2022-12-28 .sacrificial-space-for-logs
When one server restarted, I caught it, and it was a full machine restart (BIOS POST).
Please help