XCP-ng

    VMs are abruptly getting shutdown

    • lritinfra

      We have faced multiple server failures while using XCP-ng.
      We are using version 8.2.0, deployed from the ISO on HPE physical servers. The server shuts down unexpectedly, causing the virtual machines on it to crash and leading to downtime for our running application. We analyzed the log files of both the bay and the VMs but could not find anything that explains why the bay shut down. Please guide us step by step on how to analyze and resolve the issue.
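A first pass over the host logs can be automated with a simple keyword scan. The Python sketch below is a hypothetical helper (`find_suspicious_lines` and `scan_file` are illustrative names, not XCP-ng tools), and its pattern list is an assumption: common Linux phrases for kernel panics, machine-check (hardware) errors, watchdog resets, and thermal or power events. It only narrows down where to look. An abrupt power-off or hardware reset may leave nothing in the OS logs at all, so an empty result does not rule out a hardware fault.

```python
import re
import sys
from typing import Iterable, List, Tuple

# Assumed patterns: common Linux kernel/hardware failure phrases.
# Heuristics for illustration only, not an official or exhaustive list.
SUSPICIOUS_PATTERNS = [
    r"kernel panic",
    r"machine check",
    r"\bmce\b",
    r"watchdog",
    r"thermal",
    r"power (?:loss|failure)",
]

_COMPILED = [re.compile(p, re.IGNORECASE) for p in SUSPICIOUS_PATTERNS]


def find_suspicious_lines(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, line) pairs that match any suspicious pattern.

    Line numbers are 1-based so they can be located in the original file.
    """
    hits: List[Tuple[int, str]] = []
    for number, line in enumerate(lines, start=1):
        if any(rx.search(line) for rx in _COMPILED):
            hits.append((number, line.rstrip("\n")))
    return hits


def scan_file(path: str) -> List[Tuple[int, str]]:
    # errors="replace" keeps the scan going if a log contains invalid bytes.
    with open(path, encoding="utf-8", errors="replace") as handle:
        return find_suspicious_lines(handle)


if __name__ == "__main__":
    # Hypothetical usage: python scan_logs.py <log file> [<log file> ...]
    for log_path in sys.argv[1:]:
        for number, line in scan_file(log_path):
            print(f"{log_path}:{number}: {line}")
```

Matching on plain substrings keeps the helper dependency-free; extend or trim the pattern list to suit what your own logs actually contain.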

      • Danp (Pro Support Team)

        This appears to be the same issue that you posted about back in January. Why haven't these hosts been updated to 8.2.1 and fully patched? 🤔

        • lritinfra

          Will the issue be resolved after the upgrade? How can we determine the reason for virtual machines shutting down automatically?

          • Danp (Pro Support Team)

            Will the issue be resolved after the upgrade?

            Maybe... but you won't know until you try it.

            How can we determine the reason for virtual machines shutting down automatically?

            If your hosts are failing, then that would explain why the VMs are shutting down.

            • Which model of HPE servers are you running?
            • What version of the BIOS is currently installed?
            • What brand of NICs are installed?
            • lritinfra @Danp

              @Danp
              1. We use the HPE ProLiant BL460c Gen10 server model.
              2. Our BIOS version is 2.72_09-29-2022.
              3. We have deployed network interface cards (NICs) manufactured by Emulex Corporation.

              • Danp (Pro Support Team) @lritinfra

                @lritinfra said in VMs are abruptly getting shutdown:

                Our BIOS version is 2.72_09-29-2022.

                Once again, you are not current on patching your systems as there have been 6 new BIOS releases since that one. 😦
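Checking all bays against a target BIOS version can be done mechanically once the version strings are collected. The Python sketch below assumes every string follows the pattern of `2.72_09-29-2022` quoted above (`<major>.<minor>_<MM>-<DD>-<YYYY>`, with the minor number always written with the same number of digits) and that you gather the strings yourself, for example from each bay's iLO. `parse_bios_version`, `outdated_hosts`, the host names, and any target version are hypothetical, not real HPE tools or releases.

```python
from datetime import date
from typing import Dict, List, NamedTuple


class BiosVersion(NamedTuple):
    """A BIOS version of the assumed form '<major>.<minor>_<MM>-<DD>-<YYYY>'."""
    major: int
    minor: int
    released: date


def parse_bios_version(text: str) -> BiosVersion:
    # Assumed format, based only on the example string '2.72_09-29-2022'.
    version_part, date_part = text.strip().split("_", 1)
    major, minor = (int(x) for x in version_part.split(".", 1))
    month, day, year = (int(x) for x in date_part.split("-"))
    return BiosVersion(major, minor, date(year, month, day))


def outdated_hosts(inventory: Dict[str, str], minimum: str) -> List[str]:
    """Return, sorted, the hosts whose BIOS version is older than `minimum`.

    NamedTuple comparison orders by (major, minor, released), field by field.
    Comparing minor as an int assumes consistently padded minor numbers.
    """
    floor = parse_bios_version(minimum)
    return sorted(
        host for host, version in inventory.items()
        if parse_bios_version(version) < floor
    )
```

For instance, with a hypothetical target of `2.80_06-01-2023`, the inventory `{"bay-01": "2.72_09-29-2022", "bay-02": "2.90_01-15-2024"}` yields `["bay-01"]`.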

                • lritinfra @Danp

                  @Danp We're using a total of 16 bays with the same BIOS and the same XCP-ng ISO, so why did this issue occur in only two or three of the production bays?

                  • Danp (Pro Support Team)

                    Faulty memory / hardware? 🤷

                    • john.c

                      @lritinfra Are there any entries in the HPE iLO logs? Its health monitoring may give you some clues.

                      Depending on the maintenance windows for those problematic servers, is it possible to run Intelligent Provisioning and have it perform the in-depth tests of the Insight Diagnostics tools?

                      The Insight Diagnostics tools test all parts of the system hardware, including drives, memory, and storage, and report any parts that fail these tests.

                      The in-depth tests are also more thorough than the standard ones, so they are more likely to ferret out hardware issues, provided the tools are up to date enough to recognize problems when the hardware's firmware is tested.

                      • lritinfra @john.c

                        @john-c We've reviewed both the VM and bay logs and found no records related to the shutdown. Intelligent Provisioning is currently disabled on our system, and we can't enable it because the servers are in production.

                        • john.c @lritinfra

                          @lritinfra said in VMs are abruptly getting shutdown:

                          @john-c We've reviewed both the VM and bay logs and found no records related to the shutdown. Intelligent Provisioning is currently disabled on our system, and we can't enable it because the servers are in production.

                          Unfortunately, HPE Intelligent Provisioning is the most reliable way to run the hardware diagnostics, because the online version of Insight Diagnostics is only available for Windows or Linux. Although XCP-ng is Linux-based, it's not a good idea to install and run the Linux version, because XCP-ng is a custom Linux distribution dedicated to being a hypervisor host.

                          HPE Insight Diagnostics Online also needs direct access to the hardware to work, so it can't run in a VM.

                          Installing the package would likely lead to a broken XCP-ng instance, assuming the software is even compatible enough to run reliably.

                          Are there any policies or processes you could follow temporarily to run the tests?

                          These repeated abrupt shutdowns can't be doing the VMs or the servers themselves any good, because a crash at the wrong moment can really do a number on data. One such moment is when the VM, or an app running on it, is writing data: the crash interrupts the write and leaves the file incomplete, and the resulting invalid data causes corruption.

                          • john.c

                            @lritinfra Something else to consider: outside of HPE iLO, HPE SUM, or HPE SPP, HPE Intelligent Provisioning is the main way to update the server's hardware firmware if you aren't using individual RPMs or SCEXE files for the task. Both HPE Intelligent Provisioning and HPE SPP can update the firmware and the BIOS.

                            Not all firmware updates come in a format compatible with HPE iLO. I'm not sure whether it has changed, but setting (at minimum) an Administrator Password on the BIOS also locks out (disables) access to the Erase option in HPE Intelligent Provisioning. At least it does on my only HPE server, which runs an up-to-date BIOS, HPE iLO, and HPE Intelligent Provisioning.

                            So having HPE Intelligent Provisioning disabled doesn't help with keeping the servers up to date enough to fix vulnerabilities and bugs at the hardware or firmware level.
