Weird performance alert. Start importing VM for no reason.

ph7

XCP-ng, updated
XO CE, commit 749f0 (now 2 behind)

For some strange reason, I got performance alerts every minute starting at 05:00 this morning.

Screenshot 2025-02-24 at 10-27-12 Papperskorg jake.blues@protonmail.com Proton Mail.png

XO was trying to start/import a VM without any reason to.

The backup of this VM is started by a Sequence job, and that job ran from 03:04 to 03:05.

Screenshot 2025-02-24 at 10-53-28 Backup.png

The ed2b job has health check enabled, but should only run on the 1st and 15th.

Screenshot 2025-02-24 at 11-01-32 Backup.png

Screenshot 2025-02-24 at 10-48-34 Backup.png

I also run a replication job during the day, outside the backup schedule.

Screenshot 2025-02-24 at 11-58-56 Backup.png

The VM has autostart enabled. Maybe the host crashed and, when it restarted, somehow decided it should auto-start the replicated VM.
Unfortunately I destroyed the VM, but I did read that it had been started 32 min earlier, and this was around 05:30.
When I then checked the host performance, there was nothing in the graph.

Screenshot 2025-02-24 at 10-25-22 X2 🚀 (Ryssen 🪐).png

[11:23 x2 ~]# uptime
 11:26:44 up 6:37, 2 users, load average: 0,10, 0,09, 0,10

The host did restart for some reason.
I ran dmesg, and at line 618 I found that I should run fsck:
[ 3.459057] FAT-fs (sda4): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
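
For anyone who wants to scan a saved dmesg dump for lines like this one instead of scrolling, here is a minimal Python sketch. The pattern list and the function name `find_suspicious` are my own illustration, not an exhaustive or official set of kernel messages. Note that an "unclean unmount" warning only suggests the host went down without a proper shutdown; it does not say why.

```python
import re

# Illustrative patterns only (an assumption, not a complete list of kernel warnings).
PATTERNS = re.compile(
    r"not properly unmounted|run fsck|I/O error|Out of memory",
    re.IGNORECASE,
)


def find_suspicious(dmesg_text):
    """Return (line_number, line) pairs for lines that match PATTERNS."""
    return [
        (number, line)
        for number, line in enumerate(dmesg_text.splitlines(), start=1)
        if PATTERNS.search(line)
    ]


# Two lines copied from the dmesg output quoted in this thread.
sample = """\
[ 0.994105] rtc_cmos 00:02: setting system clock to 2025-02-24 03:48:56 UTC (1740368936)
[ 3.459057] FAT-fs (sda4): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
"""

hits = find_suspicious(sample)
for number, line in hits:
    print(number, line)
```

To use it on real output, save the output of dmesg to a text file and pass the file's contents to `find_suspicious`.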

Maybe this isn't a backup problem, but I started investigating it as that

olivierlambert

Continuous replication uses an export/import mechanism, so I think that's the reason for this task.

ph7

@olivierlambert
All backups and Continuous replication ran fine during the ~15 hours that the graphs were gone.
Do you have any clue which logs I can check?

ph7

In the replication job I keep Replication Retention=2.
So at the time of the reboot, 03:48 UTC, there were 2 saved CR-jobs, which ran at 01:10 and 01:40.
Screenshot 2025-02-24 at 20-59-29 Backup.png

Screenshot 2025-02-24 at 20-58-47 Backup.png

According to the times marked in red, the first alert ran at 05:00 CET (UTC + 1 hour).

Screenshot 2025-02-24 at 20-31.png

And this is from dmesg:
[ 0.994105] rtc_cmos 00:02: setting system clock to 2025-02-24 03:48:56 UTC (1740368936)
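
As a cross-check of the timestamps above, here is a minimal Python sketch that converts the epoch value from the dmesg line and compares it with the uptime output and the first alert. It assumes the host's local time is CET as a fixed UTC+1 offset and that the uptime reading (11:26:44) was taken on the same day; the variable names are only for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumption: local time is CET, modelled as a fixed UTC+1 offset.
CET = timezone(timedelta(hours=1), "CET")

# Epoch value printed by rtc_cmos in the dmesg line.
boot_utc = datetime.fromtimestamp(1740368936, tz=timezone.utc)
boot_cet = boot_utc.astimezone(CET)

# Wall-clock time shown by uptime, assumed to be CET on 2025-02-24.
uptime_reading = datetime(2025, 2, 24, 11, 26, 44, tzinfo=CET)
elapsed_since_boot = uptime_reading - boot_cet

# First performance alert, 05:00 CET according to the alert mail.
first_alert = datetime(2025, 2, 24, 5, 0, tzinfo=CET)
boot_to_alert = first_alert - boot_cet

print("boot (UTC):        ", boot_utc.isoformat())
print("boot (CET):        ", boot_cet.isoformat())
print("elapsed at uptime: ", elapsed_since_boot)
print("boot to 1st alert: ", boot_to_alert)
```

Under those assumptions the boot lands at 04:48:56 CET, about 11 minutes before the first alert, and the elapsed time agrees with the "up 6:37" that uptime reported.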

I cannot figure out:

why did the host reboot?
why did one of the CR-jobs start?
why was there ~15 hours without any graph?

I have a UPS with NUT shutdown on the host and on my TrueNAS, with no indication of a power failure.

ph7

I increased the dom0 RAM from the default 1.75 GiB to 2 GiB.
Hopefully this will do.