Hello, both issues seem to be related to memory corruption.
- The first trace is an #NMI exception (one of the causes can be a parity error detected by the HW). Moreover, CPU#12 gets the #MC(machine check) exception. The #MC is triggered by the HW to notify the system software that there's an unrecoverable issue with the HW.
- The second one is the invalid opcode in the Xen Hypervisor context. So it means that either the instruction flow is corrupted, or the instruction pointer is corrupted.
My hypothesis is:
In the first case - the ECC memory error is detected (and reported by HW) which makes the hypervisor panic and stop
In the second case - the memory error is not detected (but the memory is still corrupted) but at some point, this corruption provokes the same result on the Xen hypervisor.
Can you look with Hetzner guys if there's a way to change memory modules?
The other way to validate this hypothesis is to install a different system software (another OS/hypervisor, another version of hypervisor) and see if you experience the same issue.
You can also add on Xen command line "ler=true" option. This can give us more traces (leveraged by HW) to check if there's nothing abnormal on software level. I'll probably will need your Xen image with its symbole table (xen-syms-XXX and xen-syms-XXX.map)