XCP-ng

    New Xcp-Ng server Run-Away

    Scheduled Pinned Locked Moved Compute
    performance
    8 Posts 5 Posters 1.3k Views 2 Watching
    • kulmacetK Offline
      kulmacet

      Greetings, I am new to XCP-ng but not to virtualization. I have an issue that I hope you can help with:

      I recently bought an HP DL160 Gen8 in the following configuration for a home lab.

      XCP-ng: 8.2.0
      HP DL160 Gen8
      Memory: 64GB
      Processors: 32
      Storage: 1TB

      I then migrated a server to this host. After a while, both the host and the migrated guest started to run erratically and stalled the XCP-ng server and my internal LAN, which was disappointing.

      Then I created a VM on the server with the following configuration:

      Oracle Linux 8
      4 Processors
      8GB RAM

      This machine is not busy and was a very simple install.

      Installation and configuration of the server went without issue and as expected.

      However, I have a problem. After the guest has run for a while (10 minutes, 1 hour, 1 day, who knows how long), CPU and RAM usage on this guest shoot through the roof and the server starts to scream like a jet about to take off, for no apparent reason.

      I'm not sure how to resolve this or how to troubleshoot it, except to turn off the host and start the whole process over again.

      Any support appreciated,
      And Many Thanks

      • olivierlambertO Offline
        olivierlambert Vates 🪐 Co-Founder CEO

        Anything in dmesg or xl dmesg? Also, if there's anything weird, check your guest logs. Might be a problem in there too.

        And the usual suspect: hardware. Check that the BIOS and firmware are at the latest versions, run a memory check, etc.
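
As a rough illustration of the `dmesg` / `xl dmesg` check suggested above, here is a minimal Python sketch that runs both commands in dom0 and prints only the lines that look suspicious. The keyword list and the helper names (`suspicious_lines`, `run`) are assumptions made for this example, not an official set of patterns, and it assumes Python 3 is available in dom0.

```python
import re
import subprocess

# Assumption: illustrative keywords only; adjust to whatever your logs show.
SUSPECT = re.compile(
    r"\b(?:error|fail(?:ed|ure)?|mce|machine check|out of memory|"
    r"soft lockup|hung task|nmi)\b",
    re.IGNORECASE,
)

def suspicious_lines(text):
    """Return the lines of a log dump that match any suspect keyword."""
    return [line for line in text.splitlines() if SUSPECT.search(line)]

def run(cmd):
    """Run a command and return its stdout as text ('' if it cannot be run)."""
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
            universal_newlines=True,
        )
    except OSError:
        return ""
    return result.stdout

if __name__ == "__main__":
    # `dmesg` is the dom0 kernel log, `xl dmesg` is the Xen hypervisor log.
    for cmd in (["dmesg"], ["xl", "dmesg"]):
        print("== " + " ".join(cmd) + " ==")
        for line in suspicious_lines(run(cmd)):
            print(line)
```

An empty result does not prove the host is healthy; it only means none of the assumed keywords appeared.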

        • A Online
          Andrew Top contributor @kulmacet

          @kulmacet First make sure the BIOS and iLO are up to date (or for a G8, the latest/last version). There are known issues with some older versions.

          Check iLO and the IML to see if there are hardware errors listed. Check the BIOS settings. Defaults are NOT always the best choice. Normal fans suddenly running fast points to very high heat/usage or hardware issues. HP likes to spin the fans fast when the server has hardware issues. Using 4 cores on a 16 core machine should not cause high load. Disable HT, at least as a test. XCP/Xen does not like HT for some older CPUs. Check IPMI SEL for additional hardware issues (from ipmitool in dom0). You can also run the HP diags and other system tests to see if it catches any issues.
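
To see whether Hyper-Threading is currently active before and after changing the BIOS setting, one option is to read the `threads_per_core` field that `xl info` prints in dom0. The sketch below is only an illustration: the function names are hypothetical, and it assumes `xl info` prints `key : value` lines.

```python
import subprocess

def parse_xl_info(text):
    """Parse 'key : value' lines, as printed by `xl info`, into a dict."""
    info = {}
    for line in text.splitlines():
        # Split on the first colon only, since values may contain colons.
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info

def hyperthreading_enabled(info):
    """True if Xen reports more than one thread per core."""
    return int(info.get("threads_per_core", "1")) > 1

def check_host():
    """Run `xl info` in dom0 and report whether HT appears to be on."""
    text = subprocess.check_output(["xl", "info"], universal_newlines=True)
    return hyperthreading_enabled(parse_xl_info(text))
```

Calling `check_host()` in dom0 should return `False` once HT has been disabled in the BIOS and the host rebooted.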

          I have many DL360p G8 systems and they work well. The DL160 is a cheaper hardware design and known to have occasional hardware problems but XCP should work if the system is healthy.

          The IML will log hardware errors that the system (dmesg) won't see.
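
The IPMI SEL check mentioned above can be sketched as follows. This is a minimal example that assumes `ipmitool sel elist` prints one event per line with fields separated by `|`, and that the IPMI kernel modules are loaded in dom0. The function names and the keyword list are hypothetical, chosen for illustration.

```python
import subprocess

def parse_sel(text):
    """Split `ipmitool sel elist` output into lists of '|'-separated fields."""
    entries = []
    for line in text.splitlines():
        fields = [field.strip() for field in line.split("|")]
        if len(fields) >= 4:  # skip blank or malformed lines
            entries.append(fields)
    return entries

def events_matching(entries, keywords):
    """Return entries in which any field contains one of the keywords."""
    wanted = [k.lower() for k in keywords]
    return [
        entry for entry in entries
        if any(k in field.lower() for field in entry for k in wanted)
    ]

def check_sel():
    """Read the SEL in dom0 and return fan, temperature and memory events."""
    text = subprocess.check_output(
        ["ipmitool", "sel", "elist"], universal_newlines=True
    )
    # Assumption: these keywords are a starting point, not a complete list.
    return events_matching(parse_sel(text), ["fan", "temperature", "memory"])
```

Whether the SEL and the IML contain the same entries on this server is not something the sketch checks, so it is worth reading both.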

          • kulmacetK Offline
            kulmacet

            Everything in the hardware and BIOS looks correct, but I still get the runaway server. Even the logs do not report any issues. It's just not running correctly, and I am surprised at the lack of logging.
            Still not correct.
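
Since the existing logs show nothing, one way to at least capture when the runaway starts is to record load and available memory inside the guest at a fixed interval. The sketch below is a hypothetical helper, not an XCP-ng tool: it assumes a Linux guest with `/proc/loadavg` and `/proc/meminfo`, and the function names and log format are made up for this example.

```python
import time

def parse_meminfo(text):
    """Parse /proc/meminfo text into a dict of {field name: integer value}."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            values[name.strip()] = int(parts[0])
    return values

def format_sample(stamp, load1, avail_kb):
    """Build one log line from a timestamp, 1-minute load and free memory."""
    return "{} load1={:.2f} MemAvailable={}kB".format(stamp, load1, avail_kb)

def take_sample():
    """Read the current 1-minute load average and MemAvailable."""
    with open("/proc/loadavg") as f:
        load1 = float(f.read().split()[0])
    with open("/proc/meminfo") as f:
        mem = parse_meminfo(f.read())
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    return format_sample(stamp, load1, mem.get("MemAvailable"))

def log_samples(logfile, count, interval_s):
    """Append `count` samples to `logfile`, one every `interval_s` seconds."""
    with open(logfile, "a") as out:
        for _ in range(count):
            out.write(take_sample() + "\n")
            out.flush()  # keep what was written even if the guest stalls
            time.sleep(interval_s)
```

For example, `log_samples("/var/log/runaway-watch.log", count=1440, interval_s=60)` would cover roughly one day; the path is only a placeholder. The last few lines before a stall show how quickly load and memory changed.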

            • olivierlambertO Offline
              olivierlambert Vates 🪐 Co-Founder CEO

              This is really weird, I don't remember seeing this behavior in the past, like ever :thinking:

              @fohdeesha any idea?

              • TheNorthernLightT Offline
                TheNorthernLight @kulmacet

                @kulmacet I've had something like this, and it turned out to be a dying BMC in the end.

                I had the server motherboard replaced under warranty, and poof, problem stopped (same CPUs, RAM, etc).

                • kulmacetK Offline
                  kulmacet

                  Maybe I'm looking at this wrong... It's a feature!
                  [image: 25265155-0ba5-4d64-8463-5c56b8bcc46d-image.png]

                  • fohdeeshaF Offline
                    fohdeesha Vates 🪐 Pro Support Team @kulmacet

                    @kulmacet Can you recreate this with other VMs, or only with this specific Oracle Linux VM? I would shut the problematic VM off, spin up a new Debian VM for example, and see whether the issue happens with the new VM as well. Outside of that, it's really looking like a hardware issue. Also, double-check that the iLO and BIOS firmware are at the latest versions. I can almost guarantee it shipped with ancient versions, and many issues like this have been patched relatively recently.
