XCP-ng

    Windows 2022 VM - Reboot triggered - VM shuts down

    Compute
    18 Posts 5 Posters 1.5k Views 4 Watching
    • KPS Top contributor @Darkbeldin

      The problem happened one more time. The server does a daily restart, but last night it just stopped. Same behaviour as last time:

      • Task scheduler starts "C:\Windows\System32\shutdown.exe -r -t 120 -f"
      • System starts to shut down and Eventlog just stops logging
      • XCP-ng shows the VM as off

      The last event prior to the "manual" boot is:

      Event 7036
      Service "VSS Writer for the internal windows database" is now in state "stopped" (translated)

      I see that a dump file was written, but I do not really know how to analyze it.

      Do you have any idea on how to solve this?

      Best wishes

      • KPS Top contributor @KPS

        The analysis of the dump showed:

        UNEXPECTED_KERNEL_MODE_TRAP (7f) / EXCEPTION_DOUBLE_FAULT

        According to Microsoft:

        => A double fault, which is a fault that occurred while processing an earlier fault, which always results in a system failure.

        => Bug check 0x7F typically occurs after you install faulty or mismatched hardware, especially memory, or if installed hardware fails.

        => Check the availability of updates for the ACPI/BIOS, the hard driver controller, or network cards from the hardware manufacturer.

        https://learn.microsoft.com/en-us/windows-hardware/drivers/debugger/bug-check-0x7f--unexpected-kernel-mode-trap

        Have you ever seen that behaviour?

        • KPS Top contributor @KPS

          Hi!

          Today, it happened again. Same behaviour. Nothing special in the logs, but VM is shut down.

          Any ideas on how to solve this?

          • KPS Top contributor @KPS

            Hi!

            It happened one more time. This time on a node with an AMD CPU and without a memory dump. Another Windows 2022 VM...

            Have you ever experienced this?

            • olivierlambert Vates 🪐 Co-Founder CEO

              I would try different combinations, just in case:

              1. create a new empty VM, attach the disk from the "old" one to the new one, check if it's the same behavior
              2. remove Citrix tools, install XCP-ng tools and see if it continues to have the problem
              • KPS Top contributor @olivierlambert

                I am quite frustrated in trying to "force" the behaviour...

                I installed 4 Windows 2022 Test-VMs:

                • BIOS + XCP-Tools
                • BIOS + XenTools
                • UEFI + XCP-Tools
                • UEFI + XenTools

                All 4 VMs have been rebooting every 6 minutes for the last 20 h, but none of them got "shut down", while my production VMs, which reboot every 24 h, are affected. Currently, one of the 6 Windows 2022 VMs (a different one each time) is shut down once a month.
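
A rough back-of-the-envelope check of these numbers can show whether the test loop should have hit the problem by now. This is only a sketch: it assumes a 30-day month and that every reboot fails independently with the same probability, neither of which is confirmed anywhere in this thread.

```python
def p_all_clean(per_reboot_failure: float, reboots: int) -> float:
    """Probability that `reboots` independent reboots all succeed."""
    return (1.0 - per_reboot_failure) ** reboots


# Production: 6 VMs, one reboot per day, about one unexpected shutdown per month.
# The 30-day month is an assumption made for this estimate.
production_reboots_per_month = 6 * 30
p_fail = 1 / production_reboots_per_month

# Test setup: 4 VMs, one reboot every 6 minutes, for 20 hours.
test_reboots = 4 * (20 * 60 // 6)

p = p_all_clean(p_fail, test_reboots)
print(f"{test_reboots} test reboots, P(no failure at production rate) = {p:.3f}")
```

If the failure were a plain per-reboot coin flip at the production rate, this many clean test reboots would be unlikely (on the order of 1%). That hints that the fast reboot loop does not recreate the condition the production VMs are in when they fail, for example a full day of uptime before the reboot.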

                I have no more ideas about the "race condition".

                • KPS Top contributor @KPS

                  Hi!

                  I thought I "understood" the issue now, but I am having another one...

                  About the first one:
                  The problem seemed to go away when the MEMORY.DMP file was deleted. Only the "second dump" triggered the issue.

                  But:

                  Last night, I had another strange issue on a Windows 2022 VM with Citrix Tools 9.3.1.
                  The system did a scheduled reboot, but after that it hung in the UEFI boot manager:

                  Error: 0xc0000225
                  A Required Device Isn't Connected or Can't Be Accessed

                  The Eventlog showed a clean shutdown without errors. The only strange thing is one event: "The system has rebooted without cleanly shutting down first."

                  I started the OS selection and the system booted without issues.

                  I am not sure if this is XCP-ng-related, but it is worse than the first issue, as I cannot "solve" it with a script that checks the status of the VM.
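
For reference, a status-checking script of the kind mentioned here could look roughly like the sketch below. It is not the script actually used in this thread. It assumes it runs on the XCP-ng host, where the `xe` CLI is available, and the names `run_xe` and `ensure_running` are hypothetical helpers invented for illustration.

```python
import subprocess
from typing import Callable, Sequence


def run_xe(args: Sequence[str]) -> str:
    """Run the xe CLI on the host and return its stripped stdout."""
    result = subprocess.run(
        ["xe", *args], check=True, capture_output=True, text=True
    )
    return result.stdout.strip()


def ensure_running(
    vm_uuid: str, xe: Callable[[Sequence[str]], str] = run_xe
) -> bool:
    """Start the VM if it is halted. Returns True if a start was issued."""
    state = xe(["vm-param-get", f"uuid={vm_uuid}", "param-name=power-state"])
    if state == "halted":
        xe(["vm-start", f"uuid={vm_uuid}"])
        return True
    return False
```

Such a check only sees the power state reported by the host. A guest that is stuck in the boot manager still counts as "running", which is why this approach covers the first issue but not the second one.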

                  I have never seen this before.

                  What do you think?

                  Thank you for your help!

                  • tuxen Top contributor @KPS

                    @KPS When that force reboot command is issued, the VM:

                    1. Is under intensive I/O?
                    2. Has a backup job started/running?
                    • KPS Top contributor @tuxen

                      @tuxen
                      The reboots are scheduled outside office hours, but some hours before the next backup job.

                      So, there is nearly zero load.
                      The hosts are beefy AMD Genoa systems that are not really in use. The problem has already happened with only ONE VM on an AMD 9374F.

                      ...it just happened again 15 minutes ago. A dump was written, and it happened although the previous dump had been deleted. That single VM last had the problem 1 month ago (daily reboot).

                      • tuxen Top contributor @KPS

                        @KPS I was thinking exactly of an after-hours task doing heavy storage I/O (e.g. data replication or ETL-like workloads). In that scenario, a forced reboot might cause some sort of file system corruption due to uncommitted data being lost.

                        Now, another source of the issue comes to mind: automatic Windows Update. Is this service active? I'm not a Windows expert, but a forced reboot during a system update might also cause unexpected behavior.

                        Seeing all those errors, it seems that some system file or DLL got corrupted and needs a repair. Taking a snapshot before running a system repair is strongly recommended.

                        • KPS Top contributor @tuxen

                          @tuxen said in Windows 2022 VM - Reboot triggered - VM shuts down:

                          Now, another source of the issue comes to mind: automatic Windows Update. Is this service active?

                          Thank you for your answer, but neither is the case. Windows Updates are only triggered manually. I/O and CPU load are VERY low when the reboot is triggered.

                          • tuxen Top contributor @KPS

                            @KPS One thing is clear to me: the reboot is triggering a VM shutdown due to a system crash (the kernel errors and memory dump files point that way). Without a detailed stack trace (like a Linux kernel panic), and given the difficulty of reproducing the issue, troubleshooting is very hard. One last thing I'd check is /var/log/daemon.log around the time of the VM shutdown.
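
One way to narrow daemon.log down to that time window is sketched below. It assumes the log uses classic syslog timestamps such as `Jul 25 02:00:01` at the start of each line and English month names; both are assumptions about the host's logging setup, and `lines_in_window` is a hypothetical helper written only for this example.

```python
from datetime import datetime, timedelta
from typing import Iterable, Iterator


def lines_in_window(
    lines: Iterable[str], center: datetime, minutes: int = 5
) -> Iterator[str]:
    """Yield log lines stamped within +/- `minutes` of `center`."""
    lo = center - timedelta(minutes=minutes)
    hi = center + timedelta(minutes=minutes)
    for line in lines:
        try:
            # Syslog stamps carry no year, so borrow it from `center`.
            stamp = datetime.strptime(
                f"{center.year} {line[:15]}", "%Y %b %d %H:%M:%S"
            )
        except ValueError:
            continue  # continuation lines or other formats are skipped
        if lo <= stamp <= hi:
            yield line.rstrip("\n")
```

On the host, this could be used as `for l in lines_in_window(open("/var/log/daemon.log"), datetime(2023, 7, 25, 2, 0)): print(l)`, with the date and time replaced by the moment the VM went down (the values shown are placeholders).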

                            • chrisfonte

                              Is the instance of Windows licensed?

                              • KPS Top contributor @tuxen

                                @tuxen
                                I sent the MEMORY.DMP files to a Microsoft specialist and posted the result on July 25 in this thread.
                                daemon.log is quite hard for me to read. I did not find anything that I would recognize as an error. It looks like a "shutdown", not a reboot.

                                @chrisfonte
                                Yes, fully licensed. It is not a "shutdown because of missing licenses after 24h".
