XCP-ng

    Epyc Boost... not boosting?

    • tekwendell

      So I have a 1-socket Epyc 7713 and I am not sure it is boosting properly.

      I set up a fresh install using the download link the day before yesterday and then applied the outstanding updates. I am used to working on a Linux host and the power management stuff there.

      Here is what I am seeing:

      [20:17 xcp-7713 ~]# xenpm get-cpufreq-states 0
      cpu id               : 0
      total P-states       : 3
      usable P-states      : 3
      current frequency    : 2000 MHz
      *P0        [2000 MHz]: transition [                   3]
                             residency  [                 423 ms]
      P1         [1700 MHz]: transition [                   0]
                             residency  [                   0 ms]
      P2         [1500 MHz]: transition [                   2]
                             residency  [                  50 ms]
      

      I tried:

      xenpm enable-turbo-mode 0
      

      and

      xenpm enable-turbo-mode 0-127
      

      (xenpm enable-turbo-mode with no arguments produces a lot of errors on the CLI about CPUs I don't have: CPUs 128-255...)

      I also changed the governor from ondemand to performance, but cat /proc/cpuinfo always reports 2000 MHz, and virtual machines never report a clock speed higher than 2.00 GHz.

      The 7713 has a base clock of 2.00 GHz and a max boost clock of 3.675 GHz.

      I created a Windows VM, assigned it 8 cores, and ran some tests. It seems to be stuck at 2 GHz when it should be boosting to around 3.0 GHz, even with a lot going on in the background. About the only thing that runs these Epyc CPUs close to their base clocks is a really heavy workload.

      I checked the BIOS and everything looks good there. Booting from an Ubuntu live CD and setting the performance governor/boost/power profile all works as expected.

      I am not sure what I could do next that would be useful? Happy to poke it with a sharpened stick if you can point me the right way. thanks
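
For readers who want to summarize the xenpm get-cpufreq-states output quoted in this post, here is a minimal Python sketch. It only parses text in the layout shown above; the function name parse_pstates and the reading of the leading * as "current state" are assumptions made for illustration, not something the thread confirms. Note that the highest P-state listed is 2000 MHz, the base clock, so this table by itself may not be able to show whether the CPU is boosting.

```python
import re

# Sample text copied from the xenpm output quoted in the post.
SAMPLE = """\
cpu id               : 0
total P-states       : 3
usable P-states      : 3
current frequency    : 2000 MHz
*P0        [2000 MHz]: transition [                   3]
                       residency  [                 423 ms]
P1         [1700 MHz]: transition [                   0]
                       residency  [                   0 ms]
P2         [1500 MHz]: transition [                   2]
                       residency  [                  50 ms]
"""

# One P-state header line, optionally indented and optionally starred.
STATE_RE = re.compile(r"^\s*\*?(P\d+)\s+\[(\d+) MHz\]: transition \[\s*(\d+)\]")
# The residency line that follows each P-state header.
RESIDENCY_RE = re.compile(r"^\s*residency\s+\[\s*(\d+) ms\]")


def parse_pstates(text):
    """Return one dict per P-state, in the order they are listed."""
    states = []
    for line in text.splitlines():
        m = STATE_RE.match(line)
        if m:
            states.append({
                "name": m.group(1),
                "mhz": int(m.group(2)),
                "transitions": int(m.group(3)),
                "residency_ms": 0,
                # Assumption: '*' marks the state the CPU is currently in.
                "current": line.lstrip().startswith("*"),
            })
            continue
        m = RESIDENCY_RE.match(line)
        if m and states:
            states[-1]["residency_ms"] = int(m.group(1))
    return states


states = parse_pstates(SAMPLE)
total_ms = sum(s["residency_ms"] for s in states)
for s in states:
    share = 100.0 * s["residency_ms"] / total_ms if total_ms else 0.0
    print(f'{s["name"]}: {s["mhz"]} MHz, {share:.1f}% of sampled residency')
```

On a real host the text would come from running the command on dom0 rather than from the SAMPLE string.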

        • olivierlambert Vates 🪐 Co-Founder CEO

        Hi,

        What is the machine brand? You should enable perf mode in your BIOS.

          • tekwendell @olivierlambert

          @olivierlambert It's a Tyan S8030.

          Performance is set in the BIOS under SMU options, but the Xen utils still report an essentially fixed clock speed of 2 GHz.

          Are the tools expected to report the boost clocks properly? I am not used to /proc/cpuinfo not being accurate to the moment, if that is the case.

            • olivierlambert Vates 🪐 Co-Founder CEO

            Any idea, @fohdeesha?

              • tekwendell @olivierlambert

              I could run some more commands for more diagnostics if that helps?

                • tekwendell @tekwendell

                I installed Windows Server 2019 in a VM and ran CPU-Z. It scores about 475 for single thread.

                I re-installed Windows Server 2019 on bare metal and it scores about 501, give or take, for single thread.

                  • olivierlambert Vates 🪐 Co-Founder CEO

                  Hmm, is that a big gap? I would have expected more 🤔

                  Anyway, @fohdeesha will give us some leads 🙂

                    • olivierlambert Vates 🪐 Co-Founder CEO

                    @andyhhp any idea about this? I'm surprised by that claim; to me, the CPU should clock higher when needed 🤔

                    Also adding @andSmv to the loop.

                    @fohdeesha can you double check the behavior on our machines?

                      • tjkreidl Ambassador @tekwendell

                      @tekwendell Do you also have C states enabled in your BIOS settings? Maybe these articles might help:

                      https://blogs.mycugc.org/2019/03/07/a-tale-of-two-servers-how-bios-settings-can-affect-your-apps-and-gpu-performance/

                      https://blogs.mycugc.org/2019/04/30/a-tale-of-two-servers-part-2-how-not-only-bios-settings-but-also-gpu-settings-can-affect-your-apps-and-gpu-performance/

                      It certainly sounds like a BIOS setting is responsible. Also, turbo mode will not always turn on unless the load is pretty high, from what I have seen.

                        • fohdeesha Vates 🪐 Pro Support Team @tekwendell

                        @tekwendell Xen carefully manages CPU power to match VM load and vCPU count. I would not try to adjust things manually with xenpm in the meantime, as you are likely to make things worse (don't try to outsmart Xen power management unless you have a VERY specific use case). Xen is designed for parallel workloads (more than a single VM), so there are many VM tunables that are set with this in mind (like CPU affinity). By default, I'm sure the CPU affinity for your single Windows VM is still set somewhere in the "middle", so it won't be allowed to schedule the full CPU time versus what dom0 is also using.

                        I'm not an expert in AMD/Epyc power management, but I believe it's pretty typical for CPU power/clock management to boost based on overall CPU load. Running a benchmark in a single VM that uses something like 8 cores of a 64-core processor is not going to demand a lot of CPU time, so I'm not surprised that it isn't boosting very far. Spin up 6 more of those VMs and benchmark them all at the same time; I wouldn't be surprised if you see it start boosting higher.

                        475 in CPU-Z versus 501 on bare metal is very good and indicates pretty clearly that there's no issue here: you're getting about 95% of bare-metal performance on Windows under a large virtualization stack (historically the OS with the most overhead to virtualize). I would be very happy about this.
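
As a quick check of the numbers in this reply, the following Python sketch works out the VM-to-bare-metal ratio from the two CPU-Z scores reported earlier in the thread, plus the share of the CPU that an 8-vCPU VM represents. The variable names are illustrative, and treating one vCPU as one physical core is a simplifying assumption.

```python
# Scores quoted in the thread (CPU-Z single-thread benchmark).
vm_score = 475          # Windows Server 2019 guest on XCP-ng
bare_metal_score = 501  # same OS installed on bare metal

ratio = vm_score / bare_metal_score
overhead_pct = (1 - ratio) * 100
print(f"VM reaches {ratio:.1%} of bare metal ({overhead_pct:.1f}% overhead)")
# prints: VM reaches 94.8% of bare metal (5.2% overhead)

# Share of the processor exercised by the benchmark VM.
# Assumption: each of the 8 vCPUs occupies one of the 64 physical cores.
vcpus = 8
physical_cores = 64
core_share = vcpus / physical_cores
print(f"Benchmark VM covers {core_share:.1%} of the physical cores")
# prints: Benchmark VM covers 12.5% of the physical cores
```

Both scores come from single runs, so a gap of a few percent is also within what run-to-run variation could plausibly explain.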

                        If you really want to dig further, ensure your BIOS power management is set to "OS-controlled". This hands more control over turbo and C-states to the Xen power manager and is what is recommended on AMD processors. You can then use some of the commands listed here to check actual turbo status. But again, note that I won't be surprised if you can't get a 64-core processor to enter its highest turbo states when only stressing 1/10th of its cores: https://support.citrix.com/article/CTX200390/power-settings-in-citrix-hypervisor-cstates-turbo-and-cpu-frequency-scaling
