    High CPU temperatures on multiple XCP-ng hosts (identical hardware, low load)

    • sotero

      We're running a small cluster of physical servers on XCP-ng 8.3, all with identical hardware:
      Model: Lenovo ThinkSystem ST250 V2
      CPU: Intel Xeon E-2356G
      Each host runs only 2 VMs: one small VM for XOA, and one production VM with very low CPU usage (below 20%).
      The servers report consistently high CPU temperatures (~82–83 °C) even when idle. They are deployed at different client sites, not in the same facility.

      What we've checked
      Very low CPU load on all hosts (load average ~0.1).
      No heavy processes in top or htop.
      All report similar ambient temperature (~22 °C via IPMI sensor).

      ipmitool sensor data:
      Metric     | Affected servers
      CPU Temp   | 82–83 °C
      CPU Power  | 80 W
      Sys Power  | 140–160 W
      Fan 2 RPM  | ~950–1275 RPM

      All servers are on the same versions:
      BIOS: TQE112D-3.10
      BMC Firmware: 3.10

      What we're looking for
      Has anyone seen similar CPU thermal behavior on XCP-ng with Intel CPUs?
      Can dynamic frequency scaling be enabled on XCP-ng reliably?
      Should we be passing boot parameters like intel_pstate=enable or loading specific modules?

      • DustinB @sotero

        @sotero This may seem odd, but are the CPU fans actually working? I know there are hundreds of people who are using Intel CPUs (likely the same exact model) that aren't experiencing the same issue.

        My initial thought is that these servers have some kind of hardware or firmware issue where the CPU fan simply isn't working.

        • sotero @DustinB

          @DustinB Hi,

          I believe the fans are working correctly in all cases:
          (three screenshots attached)

          Do you know if any firmware issue or a compatibility problem between Lenovo hardware and XCP-ng could explain this?

          • DustinB @sotero

            @sotero I'm not aware of any particular issue offhand. It seems like the hardware is operating as expected...

            Were all of these units shipped from the manufacturer "ready to go" aside from having XCP-ng installed?

            • sotero @DustinB

              @DustinB Thanks for your reply!

              Yes, all of these Lenovo ThinkSystem ST250 V2 servers were delivered preassembled from the manufacturer with identical hardware configurations. The only customization done was the installation of XCP-ng 8.3, and the deployment of the same two VMs on each host.
              We didn't modify BIOS settings manually after delivery. What worries us is that we do not see this problem on identical Lenovo ThinkSystem ST250 V2 hardware running Windows Server.

              If you have any suggestions on how to verify or normalize power management behavior across identical units, that would be really helpful!

              • sotero @sotero

                We are still investigating the high CPU temperatures we're seeing on some of our XCP-ng hosts.

                Worth noting: the affected hosts are only running two VMs:

                A small XOA appliance (2 vCPU, 2 GiB RAM)

                A single Ubuntu VM acting as an application server (10 vCPU, 24 GiB RAM), running just one business application.

                Here is the output from lscpu on one of these systems:

                Architecture: x86_64
                CPU op-mode(s): 32-bit, 64-bit
                Byte Order: Little Endian
                CPU(s): 8
                Thread(s) per core: 8
                Core(s) per socket: 1
                Socket(s): 1
                Model name: Intel(R) Xeon(R) E-2356G CPU @ 3.20GHz

                We would appreciate any feedback on recommended CPU configurations for this type of setup under XCP-ng. Could the current vCPU allocation be contributing to the thermal issues?

                Thanks in advance!
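
For reference, the vCPU allocation described in this post can be put into numbers with a small sketch. The guest vCPU counts (2 and 10) come from the post; using 8 as the host CPU count is an assumption taken from the `CPU(s)` line of the lscpu output, and the `nr_cpus` field of `xl info` would be the figure to check for the host as a whole. The variable names are illustrative only.

```shell
#!/bin/sh
# Guest vCPU counts as stated in the post above.
xoa_vcpus=2
app_vcpus=10
# Assumption: 8 is the "CPU(s)" value from the lscpu output above; the
# nr_cpus field of `xl info` is the figure to trust for the whole host.
host_cpus=8

total_vcpus=$((xoa_vcpus + app_vcpus))
# awk handles the non-integer division that POSIX sh arithmetic lacks.
ratio=$(awk -v v="$total_vcpus" -v p="$host_cpus" 'BEGIN { printf "%.2f", v / p }')

echo "guest vCPUs: $total_vcpus, host CPUs: $host_cpus, ratio: $ratio"
```

With these inputs the ratio is 1.50. A ratio above 1 only says that more vCPUs are allocated than CPUs were counted; it does not by itself show that the allocation is responsible for the temperatures.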

                • tuxen @sotero

                  @sotero could you post the output of:

                  # Collect xen info
                  xl info
                  
                  # Collect C-States info
                  xenpm start 3
                  

                  ?
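
The two commands requested above can be wrapped in a small read-only script so the same data is gathered on every host. This is only a sketch, assuming it runs as root in dom0; `run_if_present` and `collect_pm_info` are hypothetical helper names, and the `get-cpuidle-states` and `get-cpufreq-para` subcommands of xenpm are an addition that was not part of the request.

```shell
#!/bin/sh
# Hypothetical helper: run a command only if its binary exists, so the
# script still completes on a machine without the Xen tools installed.
run_if_present() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "== $* =="
        "$@" || echo "(command failed: $*)"
    else
        echo "skipped: $1 not found"
    fi
}

# Hypothetical helper: gather hypervisor and power-management state.
collect_pm_info() {
    run_if_present xl info                   # Xen version, nr_cpus, boot command line
    run_if_present xenpm get-cpuidle-states  # C-state counters per CPU
    run_if_present xenpm get-cpufreq-para    # cpufreq driver, governor, frequency limits
    run_if_present xenpm start 3             # sample C-/P-state residency for 3 seconds
}

collect_pm_info
```

Redirecting the output to one file per host (for example `sh collect.sh > "$(hostname).txt"`, where `collect.sh` is a placeholder file name) makes it easy to compare the otherwise identical machines side by side.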
