XCP-ng

    Windows 2022 VM - Reboot triggered - VM shuts down

    Scheduled Pinned Locked Moved Compute
    18 Posts 5 Posters 1.5k Views 4 Watching
      Darkbeldin Vates 🪐 Pro Support Team @KPS

      @KPS I suppose you are using Citrix drivers? With management agents? Dynamic memory?

        KPS Top contributor @Darkbeldin

        @Darkbeldin said in Windows 2022 VM - Reboot triggered - VM shuts down:

        @KPS I suppose you are using Citrix drivers? With management agents? Dynamic memory?

        Citrix Agent with management agent, but with static memory...

          Darkbeldin Vates 🪐 Pro Support Team @KPS

          @KPS So quite a normal setup. Not sure what can be done; does it do this every time, or just this once?

            KPS Top contributor @Darkbeldin

            The problem happened one more time. The server does a daily restart, but last night it just stopped. Same behaviour as last time:

            • Task scheduler starts "C:\Windows\System32\shutdown.exe -r -t 120 -f"
            • System starts to shut down and the Eventlog just stops logging
            • XCP-ng shows the VM as off

            The last event prior to the "manual" boot is:

            Event 7036
            Service "VSS Writer for the internal windows database" is now in state "stopped" (translated)

            I see that a dump file was written, but I do not really know how to analyze it.

            Do you have any idea on how to solve this?

            Best wishes

              KPS Top contributor @KPS

              The analysis of the dump showed:

              UNEXPECTED_KERNEL_MODE_TRAP (7f) / EXCEPTION_DOUBLE_FAULT

              According to Microsoft:

              => A double fault, which is a fault that occurred while processing an earlier fault, which always results in a system failure.

              => Bug check 0x7F typically occurs after you install faulty or mismatched hardware, especially memory, or if installed hardware fails.

              => Check the availability of updates for the ACPI/BIOS, the hard driver controller, or network cards from the hardware manufacturer.

              https://learn.microsoft.com/en-us/windows-hardware/drivers/debugger/bug-check-0x7f--unexpected-kernel-mode-trap

              Did you ever see that behaviour?

                KPS Top contributor @KPS

                Hi!

                Today, it happened again. Same behaviour. Nothing special in the logs, but VM is shut down.

                Any ideas on how to solve this?

                  KPS Top contributor @KPS

                  Hi!

                  It happened one more time. This time on a node with AMD-CPU and without memory dump. Another Windows 2022 VM...

                  Did you ever experience this?

                    olivierlambert Vates 🪐 Co-Founder CEO

                    I would try different combinations, just in case:

                    1. create a new empty VM, attach the disk from the "old" one to the new one, check if it's the same behavior
                    2. remove Citrix tools, install XCP-ng tools and see if it continues to have the problem
                      KPS Top contributor @olivierlambert

                      I am quite frustrated in trying to "force" the behaviour...

                      I installed 4 Windows 2022 Test-VMs:

                      • BIOS + XCP-Tools
                      • BIOS + XenTools
                      • UEFI + XCP-Tools
                      • UEFI + XenTools

                      All 4 VMs have been rebooting every 6 minutes for the last 20h, but none of them got "shut down", while my production VMs, which reboot every 24h, are affected. Currently, about once a month, one of the 6 Windows 2022 VMs (a different one each time) ends up shut down.
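
To put the numbers from this test in perspective, here is a rough back-of-the-envelope sketch. It assumes a 30-day month, that every reboot fails independently with the same probability, and that the test VMs behave like the production ones; none of these assumptions is confirmed in the thread, and the function names are made up for illustration.

```python
def reboots_in_window(hours: int, interval_minutes: int) -> int:
    """Number of reboots that fit into the given time window."""
    return (hours * 60) // interval_minutes


def prob_no_failure(per_reboot_failure_rate: float, reboots: int) -> float:
    """Chance of seeing zero failures in `reboots` independent reboots."""
    return (1.0 - per_reboot_failure_rate) ** reboots


# Production (from the post): 6 VMs, one reboot per day,
# roughly one unexpected shutdown per month across all of them.
production_reboots_per_month = 6 * 30  # assumption: 30-day month
failure_rate = 1 / production_reboots_per_month

# Test setup (from the post): 4 VMs rebooting every 6 minutes for 20 hours.
test_reboots = 4 * reboots_in_window(hours=20, interval_minutes=6)

p_clean = prob_no_failure(failure_rate, test_reboots)
print(f"test reboots: {test_reboots}")
print(f"estimated failure rate per reboot: {failure_rate:.4f}")
print(f"chance of a clean test run at that rate: {p_clean:.1%}")
```

Under these assumptions the test run covers several months' worth of production reboots, so a completely clean run would be unlikely if fresh test VMs failed at the same per-reboot rate. That hints that something other than the reboot itself (uptime before the reboot, workload, installed roles) plays a part, but it is only an estimate.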

                      I have no more ideas about the "race condition".

                        KPS Top contributor @KPS

                        Hi!

                        I thought I had "understood" the issue now, but I am having another one...

                        About the first one:
                        The problem seemed to end, if the MEMORY.DMP-file was deleted. Only the "second dump" did trigger the issue.

                        But:

                        Last night, I had another strange issue on a Windows 2022 VM with Citrix Tools 9.3.1.
                        The system did a scheduled reboot, but after that it was hanging in the UEFI boot manager:

                        Error: 0xc0000225
                        A Required Device Isn't Connected or Can't Be Accessed

                        The Eventlog did show a clean shutdown without errors. The only strange thing is one event: "The system has rebooted without cleanly shutting down first."

                        I started the OS-selection and the system did boot without issues.

                        I am not sure if this is XCP-ng-related, but it is worse than the first issue, as I cannot "solve" it with a script that checks the status of the VM.
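
For illustration, a status-checking script of the kind mentioned here could look roughly like the sketch below. It is not the script actually used in this thread. It assumes it runs on the XCP-ng host where the `xe` CLI is available; the helper names (`run_xe`, `power_state`, `start_if_halted`) are hypothetical.

```python
import subprocess
from typing import Callable, List

# A runner takes the arguments for `xe` and returns its output as text.
Runner = Callable[[List[str]], str]


def run_xe(args: List[str]) -> str:
    """Run the `xe` CLI (assumed to be on PATH) and return trimmed stdout."""
    result = subprocess.run(
        ["xe", *args], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def power_state(vm_uuid: str, runner: Runner = run_xe) -> str:
    """Return the VM power state as reported by xe, e.g. 'running' or 'halted'."""
    return runner(["vm-param-get", f"uuid={vm_uuid}", "param-name=power-state"])


def start_if_halted(vm_uuid: str, runner: Runner = run_xe) -> bool:
    """Start the VM if it is halted. Returns True if a start was issued."""
    if power_state(vm_uuid, runner) == "halted":
        runner(["vm-start", f"uuid={vm_uuid}"])
        return True
    return False
```

Called periodically (for example from cron) with the UUID of each affected VM, this only covers the first issue, where the VM ends up powered off. A VM that is stuck in the UEFI boot manager still reports as running, which is why a check like this cannot catch the second issue.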

                        I have never seen this before.

                        What do you think?

                        Thank you for your help!

                          tuxen Top contributor @KPS

                          @KPS When that force reboot command is issued, the VM:

                          1. Is under intensive I/O?
                          2. Has a backup job started/running?
                            KPS Top contributor @tuxen

                            @tuxen
                            The reboots are scheduled in "out-of-office-hours", but some hours before the next backup-job.

                            So, there is nearly zero load.
                            The hosts are beefy AMD Genoa systems that are not really in use. The problem already occurred when only ONE VM was running on an AMD 9374F.

                            ...it just happened again 15 minutes ago. A dump was written, and it happened even though the previous dump had been deleted. That single VM (daily reboot) last had the problem 1 month ago.

                              tuxen Top contributor @KPS

                              @KPS I was exactly thinking about an after-hour task doing heavy storage I/O (e.g data replication or ETL-like workloads). Under this scenario, a forced reboot might cause some sort of file system corruption due to uncommitted data being lost.

                              Now, another possible source of the issue comes to mind: automatic Windows Update. Is this service active? I'm not a Windows expert, but a forced reboot during a system update might also cause unexpected behavior.

                              Seeing all those errors, it seems that some system file or DLL got corrupted and needs a repair. Taking a snapshot before running a system repair is strongly recommended.

                                KPS Top contributor @tuxen

                                @tuxen said in Windows 2022 VM - Reboot triggered - VM shuts down:

                                Now, other source of issue comes to mind: automatic Windows Update. Is this service active?

                                Thank you for your answer, but neither is the case. Windows Updates are only triggered manually. There is VERY low I/O and CPU load when the reboot is triggered.

                                  tuxen Top contributor @KPS

                                  @KPS one thing is clear to me. The reboot is triggering a VM shutdown due to a system crash (kernel errors and memory dump files being a lead). Without a detailed stack trace (like Linux's kernel panic) and the difficulty in reproducing the issue, troubleshooting is a very hard task. One last thing I'd check is the /var/log/daemon.log at the VM shutdown time window.
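
To narrow a log down to the shutdown time window, a small filter like the following sketch can help. It assumes the log lines start with a classic syslog timestamp of the form `Mon DD HH:MM:SS` (without a year) and that month names are in English; check the actual format of `/var/log/daemon.log` on the host first. The function name `lines_near` is made up for this example.

```python
from datetime import datetime, timedelta
from typing import Iterable, List


def lines_near(lines: Iterable[str], shutdown_at: datetime, minutes: int = 5) -> List[str]:
    """Return the log lines stamped within +/- `minutes` of `shutdown_at`.

    Assumption: each line begins with a 15-character syslog timestamp such as
    'Jul 25 03:01:10'. The year is taken from `shutdown_at`, because this
    timestamp format does not carry one. Lines that do not parse are skipped.
    """
    window = timedelta(minutes=minutes)
    selected = []
    for line in lines:
        try:
            stamp = datetime.strptime(
                f"{shutdown_at.year} {line[:15]}", "%Y %b %d %H:%M:%S"
            )
        except ValueError:
            continue  # not a timestamped line (e.g. a continuation line)
        if abs(stamp - shutdown_at) <= window:
            selected.append(line.rstrip("\n"))
    return selected


# Possible usage on the host (the timestamp is a placeholder):
#   with open("/var/log/daemon.log", errors="replace") as log:
#       for hit in lines_near(log, datetime(2023, 7, 25, 3, 2, 0)):
#           print(hit)
```

The window is deliberately symmetric, so entries logged shortly before the guest went down are included as well as what the host recorded afterwards.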

                                    chrisfonte

                                    Is the instance of Windows licensed?

                                      KPS Top contributor @tuxen

                                      @tuxen
                                      I sent the MEMORY.DMP files to a Microsoft specialist and posted the result on July 25 in that thread.
                                      daemon.log is quite hard for me to read. I did not find anything that I would recognize as an error. It looks like a "shutdown", not a reboot.

                                      @chrisfonte
                                      Yes, fully licensed. It is not a "shutdown because of missing licenses after 24h".
