XCP-ng

    NUMA-impact - Xeon/Epyc - 1P vs 2P

    Scheduled Pinned Locked Moved Compute
    11 Posts 5 Posters 2.1k Views 3 Watching
    • olivierlambertO Offline
      olivierlambert Vates 🪐 Co-Founder CEO

      There is no universal answer, because it mostly depends on your VM load and what you expect. As usual, my advice is to keep it simple if you don't have a problem with it (i.e., you are satisfied with the performance). Even a default EPYC configuration will most likely be better than a Xeon one.

      After that, if you want to go deeper and learn the details, that's fine. Let me just ping @tjkreidl, who did a remarkable job (if I remember correctly) on this very topic.

      • ForzaF Offline
        Forza @KPS

        @KPS

        I'd say that the EPYCs would be better than the Xeons regardless of the NUMA-per-CCX setting. That said, I would suggest looking at EPYC Genoa, which is 2-3x faster for the same cost as the previous EPYCs! Absolutely amazing performance and value.

        https://www.phoronix.com/review/amd-epyc-9654-9554-benchmarks
        https://www.phoronix.com/review/amd-epyc-9374f


        Xen supports NUMA scheduling. The issue, I think, is whether the VM fits within a NUMA node or not, and whether the application and/or guest OS understands NUMA in a guest environment. The only way to know is to properly benchmark the specific application with NUMA per CCX both on and off, using the number of cores you think the VM will need.
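The "does the VM fit within a NUMA node" question can be sketched as a simple check. This is a toy model under stated assumptions: the node sizes below are made-up example values, `NumaNode` and `fits_in_one_node` are hypothetical names, and resources reserved by the hypervisor are ignored. It is not a substitute for benchmarking.

```python
from dataclasses import dataclass


@dataclass
class NumaNode:
    cores: int
    memory_gib: int


def fits_in_one_node(vcpus: int, vm_memory_gib: int, node: NumaNode) -> bool:
    """Return True if both the vCPUs and the memory fit inside a single node."""
    return vcpus <= node.cores and vm_memory_gib <= node.memory_gib


# Hypothetical layouts for the same socket (example numbers, not measured).
whole_socket = NumaNode(cores=32, memory_gib=256)  # one NUMA node per socket
per_ccx = NumaNode(cores=4, memory_gib=32)         # assumed NUMA-per-CCX split

for label, node in (("one node per socket", whole_socket), ("NUMA per CCX", per_ccx)):
    print(label, fits_in_one_node(vcpus=8, vm_memory_gib=64, node=node))
```

In this example an 8-vCPU VM fits in the single large node but not in one per-CCX node, which is the case where benchmarking both settings becomes worthwhile.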

        • K Offline
          KPS Top contributor @Forza

          @Forza
          Thank you for your answer. If I take the "EPYC path": have you ever seen a 2P system being slower than its 1P counterpart?

          • ForzaF Offline
            Forza @KPS

            @KPS I have no experience with a 2P system so far, so I cannot say :(

            • tjkreidlT Offline
              tjkreidl Ambassador @olivierlambert
              last edited by tjkreidl

              Thanks for the mention, @olivierlambert! Here's a link to part 3, which contains links back to parts 1 and 2. Note that NUMA will affect EPYC processors differently depending on the generation, because the die configuration changed at one point along with the number of cores. I'm open to any questions on this topic. 🙂 https://blogs.mycugc.org/2019/04/30/a-tale-of-two-servers-part-3-the-influence-of-numa-cpus-and-sockets-cores-persocket-plus-other-vm-settings-on-apps-and-gpu-performance/

              • olivierlambertO Offline
                olivierlambert Vates 🪐 Co-Founder CEO

                Ah yes, that was exactly the great article I had in mind!

                • K Offline
                  KPS Top contributor @tjkreidl

                  @tjkreidl
                  Hi Tobias! Nice to see your answer. We had a call about 10 years ago about XenServer 🙂

                  Thank you for your analysis. This topic seems to be much more complicated than I had hoped. In your tests, adding a second socket never led to lower performance than the 1P system.
                  In theory, your 8-vCPU test should be faster if it does not need to access, e.g., the memory of the second CPU, but in real life this does not seem to be so relevant...

                  What would be your "what to buy" recommendation today?

                  • ForzaF Offline
                    Forza @olivierlambert

                    Some more resources:

                    https://www.micron.com/about/blog/2020/february/numa-configuration-on-amd-rome-processors-and-nvme-performance-on-windows-servers

                    https://developer.amd.com/wp-content/resources/56827-1-0.pdf chapter 2.5 NUMA and CCX/CCD

                    • tjkreidlT Offline
                      tjkreidl Ambassador @KPS
                      last edited by Danp

                      Hey, @KPS! Nice to hear from you and, yes, it's a pretty complex interaction of pieces that makes tuning so hard. There are whole books on tuning I've seen, some going way back to Digital Equipment Corporation VAX machines.
                      As to recommendations, especially if you have a lot of external storage I/O, I'd opt for CPUs with no less than 3.0 GHz clock speeds and a fair amount of internal cache, as loads are also going to be potentially bottlenecked there. As to CPU-caused NUMA effects, as my tests show, how large the effect is can vary. Note also, as mentioned in one of the articles, that the order in which you start VMs can make a big difference; those that are more affected by NUMA should be launched first to better ensure they are contained on one of the physical CPU modules and its associated memory banks.
                      Generally, each system is unique enough that it may entail a lot of experimentation to find the best settings. And don't forget to check your BIOS settings as well, to see how they are configured. Hyperthreading is quite a controversial topic too, and I'd just put in my $0.02 worth to say that for us it helped a lot, since our CPUs were over-provisioned by something like a factor of six because we were running a lot of XenDesktop VMs.
                      In short, get the fastest processors and memory you can afford! 🙂
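The point about VM start order can be illustrated with a toy first-fit model. This is only a sketch under stated assumptions: it is not Xen's actual placement algorithm, it looks at cores only (not memory), and the `VM` type, the `place` helper, and all the numbers are hypothetical.

```python
from typing import Dict, List, NamedTuple, Optional


class VM(NamedTuple):
    name: str
    vcpus: int
    numa_sensitive: bool


def place(vms: List[VM], cores_per_node: List[int]) -> Dict[str, Optional[int]]:
    """First-fit placement in launch order; None means the VM spans nodes."""
    free = list(cores_per_node)
    result: Dict[str, Optional[int]] = {}
    for vm in vms:
        # First node that can hold the whole VM, if any.
        home = next((i for i, c in enumerate(free) if c >= vm.vcpus), None)
        if home is not None:
            free[home] -= vm.vcpus
        else:
            # No single node fits: take leftover cores from several nodes.
            remaining = vm.vcpus
            for i in range(len(free)):
                take = min(free[i], remaining)
                free[i] -= take
                remaining -= take
        result[vm.name] = home
    return result


vms = [VM("a", 5, False), VM("b", 5, False), VM("db", 6, True)]
naive = place(vms, [8, 8])
# Stable sort: NUMA-sensitive VMs first, the rest keep their original order.
sensitive_first = place(sorted(vms, key=lambda vm: not vm.numa_sensitive), [8, 8])
print("naive:", naive)
print("sensitive first:", sensitive_first)
```

With these example numbers, launching in the original order leaves the sensitive VM split across both nodes, while launching it first keeps it on one node and pushes the split onto a VM that cares less.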

                      • C Offline
                        cg

                        Also something to keep in mind: it's not only about NUMA (which is different since 2nd-gen EPYC, as they have all memory channels on an I/O die and only split the caches now), it's also about memory bandwidth!

                        So it adds more complexity and depends on the needs of your workload.
                        If your workload benefits from high memory bandwidth, a 2nd socket doubles it (technically)!
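As a rough back-of-the-envelope check of the "second socket doubles it" statement, theoretical peak bandwidth is the transfer rate times the bus width times the number of channels and sockets. The values below (DDR4-3200, 8 channels per socket, 64-bit channels) are assumed example inputs, and `peak_bandwidth_gbs` is a hypothetical helper. Real throughput is lower, and accesses to the other socket's memory go over the inter-socket link.

```python
def peak_bandwidth_gbs(mt_per_s: float, channels_per_socket: int,
                       sockets: int, bus_bytes: int = 8) -> float:
    """Theoretical peak memory bandwidth in GB/s (decimal gigabytes)."""
    return mt_per_s * 1e6 * bus_bytes * channels_per_socket * sockets / 1e9


# Assumed example configuration: DDR4-3200, 8 channels per socket.
one_socket = peak_bandwidth_gbs(3200, channels_per_socket=8, sockets=1)
two_sockets = peak_bandwidth_gbs(3200, channels_per_socket=8, sockets=2)
print(f"1P: {one_socket:.1f} GB/s, 2P: {two_sockets:.1f} GB/s")
```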
