Windows 2022 VM - Reboot triggered - VM shuts down
-
Hi!
I am using a Windows 2022 VM (fully patched) on XCP-ng 8.2.1.
The VM reboots every evening through a Windows Task Scheduler job (shutdown -r -t 0 -f). This has been working for weeks now.
Yesterday, the VM just shut down instead of rebooting.
The only message in the event log is: Error code 7043. Couldn't stop Hypervisor Tools service after preshutdown event.
...but this message always appears, even when the reboot succeeds. After the reboot, there was an event 161 error from volmgr.
Have you ever seen a VM shut down instead of rebooting?
Thank you for your help!
-
@KPS I suppose you are using Citrix drivers? With management agents? Dynamic memory?
-
@Darkbeldin said in Windows 2022 VM - Reboot triggered - VM shuts down:
@KPS I suppose you are using Citrix drivers? With management agents? Dynamic memory?
Citrix Agent with management agent, but with static memory...
-
@KPS So quite a normal setup. Not sure what can be done. Does it do this every time, or was it just this once?
-
The problem happened once more. The server does a daily restart, but last night it just stopped. Same behaviour as last time:
- Task Scheduler starts "C:\Windows\System32\shutdown.exe -r -t 120 -f"
- The system starts to shut down and the event log just stops logging
- XCP-ng shows the VM as off
The last event prior to the "manual" boot is:
Event 7036: The service "VSS Writer for the internal Windows database" is now in the "stopped" state (translated).
I see that a dump file was written, but I do not really know how to analyze it.
Do you have any idea on how to solve this?
Best wishes
-
The analysis of the dump showed:
UNEXPECTED_KERNEL_MODE_TRAP (7f) / EXCEPTION_DOUBLE_FAULT
According to Microsoft:
=> A double fault, which is a fault that occurred while processing an earlier fault, which always results in a system failure.
=> Bug check 0x7F typically occurs after you install faulty or mismatched hardware, especially memory, or if installed hardware fails.
=> Check the availability of updates for the ACPI/BIOS, the hard drive controller, or network cards from the hardware manufacturer.
Have you ever seen this behaviour?
-
Hi!
Today, it happened again. Same behaviour. Nothing special in the logs, but the VM is shut down.
Any ideas on how to solve this?
-
Hi!
It happened once more, this time on a node with an AMD CPU and without a memory dump. Another Windows 2022 VM...
Have you ever experienced this?
-
I would try a few different combinations:
- create a new empty VM, attach the disk from the "old" one to the new one, and check whether the behavior is the same
- remove Citrix tools, install XCP-ng tools and see if it continues to have the problem
-
I am getting quite frustrated trying to "force" the behaviour...
I installed 4 Windows 2022 test VMs:
- BIOS + XCP-Tools
- BIOS + XenTools
- UEFI + XCP-Tools
- UEFI + XenTools
All 4 VMs have been rebooting every 6 minutes for the last 20 hours, but none of them has been "shut down", while my production VMs, which reboot every 24 hours, are affected. Currently, one of the 6 Windows 2022 VMs (a different one each time) is shut down once a month.
I have no more ideas about the "race condition".
-
Hi!
I thought I now "understood" the issue, but I am having another one...
About the first one:
The problem seemed to stop once the MEMORY.DMP file was deleted. Only the "second dump" triggered the issue. But:
Last night, I had another strange issue on a Windows 2022 VM with Citrix Tools 9.3.1.
The system did a scheduled reboot, but after that it hung in the UEFI boot manager with: Error: 0xc0000225
A Required Device Isn't Connected or Can't Be Accessed
The event log showed a clean shutdown without errors. The only strange thing is one event: "The system has rebooted without cleanly shutting down first."
I selected the OS in the boot menu and the system booted without issues.
I am not sure if this is XCP-ng-related, but it is worse than the first issue, as I cannot "solve" it with a script that checks the status of the VM.
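For the first issue, the "script that checks the status of the VM" could look roughly like the minimal sketch below. It assumes it runs where the `xe` CLI is available (e.g. the host's dom0) with Python 3.7+, and that you pass the VM's UUID; the script and its function names are hypothetical. It reads the power state with `xe vm-param-get` and starts the VM with `xe vm-start` if it is halted. As noted above, it cannot help with the boot-manager hang, because a guest stuck at 0xc0000225 still reports "running".

```python
import subprocess
import sys

# Hypothetical watchdog: if a VM is "halted" after the nightly reboot window,
# start it again. Assumes the `xe` CLI is available where this runs.


def decide_action(power_state: str) -> str:
    """Map an xe power-state string to a watchdog action."""
    state = power_state.strip().lower()
    if state == "halted":
        return "start"  # the reboot turned into a shutdown: bring the VM back
    if state == "running":
        # A guest hanging in the UEFI boot manager also reports "running",
        # so this check cannot detect that case.
        return "none"
    return "investigate"  # paused, suspended, or unexpected output


def get_power_state(vm_uuid: str) -> str:
    """Read the VM's power state via `xe vm-param-get`."""
    result = subprocess.run(
        ["xe", "vm-param-get", f"uuid={vm_uuid}", "param-name=power-state"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def check_and_start(vm_uuid: str) -> str:
    """Start the VM if it is halted; return the action taken."""
    action = decide_action(get_power_state(vm_uuid))
    if action == "start":
        subprocess.run(["xe", "vm-start", f"uuid={vm_uuid}"], check=True)
    return action


if __name__ == "__main__":
    # Usage (hypothetical): python3 vm_watchdog.py <vm-uuid>
    print(check_and_start(sys.argv[1]))
```

Scheduling it some minutes after the reboot job (e.g. from cron on the host) would be one option; the decision logic is kept separate from the `xe` calls so it can be checked without a host.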
I have never seen this before.
What do you think?
Thank you for your help!
-
@KPS When that force reboot command is issued, the VM:
- Is under intensive I/O?
- Has a backup job started/running?
-
@tuxen
The reboots are scheduled outside office hours, a few hours before the next backup job. So there is nearly zero load.
The hosts are beefy AMD Genoa systems that are barely in use. The problem has already happened when only ONE VM was running on an AMD 9374F... It just happened 15 minutes ago. A dump was written, and it happened even though the previous dump had been deleted. That single VM last had the problem 1 month ago (daily reboot).
-
@KPS I was thinking exactly of an after-hours task doing heavy storage I/O (e.g. data replication or ETL-like workloads). In that scenario, a forced reboot might cause some sort of file system corruption due to uncommitted data being lost.
Now, another possible source of the issue comes to mind: automatic Windows Update. Is this service active? I'm not a Windows expert, but a forced reboot during a system update might also cause unexpected behavior.
Seeing all those errors, it seems that some system file or DLL got corrupted and needs a repair. It's strongly recommended to take a snapshot before running a system repair.
-
@tuxen said in Windows 2022 VM - Reboot triggered - VM shuts down:
Now, other source of issue comes to mind: automatic Windows Update. Is this service active?
Thank you for your answer, but neither is the case. Windows Updates are only triggered manually. There is VERY low I/O and CPU load when the reboot is triggered.
-
@KPS One thing is clear to me: the reboot is turning into a VM shutdown because of a system crash (the kernel errors and memory dump files point that way). Without a detailed stack trace (like Linux's kernel panic) and given how difficult the issue is to reproduce, troubleshooting is a very hard task. One last thing I'd check is /var/log/daemon.log around the time the VM shut down.
-
Is the instance of Windows licensed?
-
@tuxen
I sent the MEMORY.DMP files to a Microsoft specialist and posted the result in this thread on July 25.
daemon.log is quite hard for me to read. I did not find anything I would recognize as an error. It looks like a "shutdown", not a reboot.
@chrisfonte
Yes, fully licensed. It is not a "shutdown because of missing licenses after 24h".
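To make /var/log/daemon.log easier to read around the shutdown time, one option is to cut it down to a short time window first. The sketch below is a minimal, hypothetical helper, assuming the traditional syslog timestamp format at the start of each line (e.g. "Jul 25 23:01:02 host ...") and an English month abbreviation. It keeps only lines within ±`minutes` of a given time, optionally only those containing a string such as the VM's UUID (assuming the relevant entries mention it).

```python
import sys
from datetime import datetime, timedelta
from typing import Iterable, List


def lines_in_window(lines: Iterable[str], center: datetime,
                    minutes: int = 10, needle: str = "") -> List[str]:
    """Return log lines whose syslog timestamp is within +/- minutes of center."""
    start = center - timedelta(minutes=minutes)
    end = center + timedelta(minutes=minutes)
    selected = []
    for line in lines:
        # Classic syslog stamps are 15 chars ("Jul 25 23:01:02") and carry no
        # year, so the year is taken from `center`.
        try:
            ts = datetime.strptime(f"{center.year} {line[:15]}",
                                   "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # skip lines that do not start with a syslog timestamp
        if start <= ts <= end and needle in line:
            selected.append(line.rstrip("\n"))
    return selected


if __name__ == "__main__":
    # Usage (hypothetical): python3 log_window.py "2023-07-25 23:00:00" [vm-uuid]
    when = datetime.strptime(sys.argv[1], "%Y-%m-%d %H:%M:%S")
    filter_text = sys.argv[2] if len(sys.argv) > 2 else ""
    with open("/var/log/daemon.log", errors="replace") as log:
        for entry in lines_in_window(log, when, needle=filter_text):
            print(entry)
```

A window that crosses New Year's Eve is not handled, since the year is taken from the given time.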