NUMA-impact - Xeon/Epyc - 1P vs 2P
-
There is no universal answer, because it mostly depends on your VM load and what you expect from it. As usual, my advice is to keep it simple if you don't have a problem (i.e. you are satisfied with the performance). Even a default EPYC configuration will most likely always be better than a Xeon one.
After that, if you want to go deeper and learn the details, that's fine; let me just ping @tjkreidl, who did a remarkable job (if I remember correctly) on this very topic.
-
I'd say that the EPYCs would be better than the Xeons regardless of the NUMA per CCX setting. That said, I would suggest looking at EPYC Genoa, which is 2-3x faster than the previous EPYCs for the same cost! Absolutely amazing performance and value.
https://www.phoronix.com/review/amd-epyc-9654-9554-benchmarks
https://www.phoronix.com/review/amd-epyc-9374f
Xen supports NUMA scheduling. The issue, I think, is whether the VM fits within a NUMA node or not, and whether the application and/or guest VM understands NUMA in a guest environment. The only way to know is to properly benchmark the specific application with NUMA per CCX on and off, using the number of cores you think the VM will need.
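To make the "does the VM fit within a NUMA node" question concrete, here is a minimal Python sketch. The function name and the host numbers are hypothetical and only for illustration; real values depend on the CPU model and the BIOS NUMA setting, and this check does not replace benchmarking.

```python
def fits_in_one_node(vm_vcpus, vm_mem_gib, node_cpus, node_mem_gib):
    """Return True if the VM's vCPUs and memory both fit in a single NUMA node."""
    return vm_vcpus <= node_cpus and vm_mem_gib <= node_mem_gib


# Hypothetical host: 64 cores and 512 GiB per socket (illustrative numbers only).
SOCKET_CPUS, SOCKET_MEM_GIB = 64, 512

# The same socket exposed as 1 NUMA node, or split evenly into 4 nodes.
for nodes_per_socket in (1, 4):
    node_cpus = SOCKET_CPUS // nodes_per_socket
    node_mem = SOCKET_MEM_GIB // nodes_per_socket
    ok = fits_in_one_node(24, 96, node_cpus, node_mem)
    print(f"{nodes_per_socket} node(s)/socket: 24 vCPU / 96 GiB VM fits = {ok}")
```

With these made-up numbers, the same VM fits in a node when the socket is one NUMA domain but spans nodes once the socket is split four ways, which is the case worth benchmarking.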
-
@Forza
Thank you for your answer. If I take the "EPYC path": did you ever see a 2P system that is slower than its 1P counterpart?
-
@KPS I have no experience with a 2P system so far, so I cannot say :(
-
Thanks for the mention, @olivierlambert! Here's a link to part 3, which contains links back to parts 1 and 2. Note that NUMA will affect EPYC processors differently depending on the generation, as the die configuration changed at one point along with the number of cores. I'm open to any questions on this topic.
https://blogs.mycugc.org/2019/04/30/a-tale-of-two-servers-part-3-the-influence-of-numa-cpus-and-sockets-cores-persocket-plus-other-vm-settings-on-apps-and-gpu-performance/
-
Ah yes, that was exactly this great article I had in mind!
-
@tjkreidl
Hi Tobias! Nice to see your answer. We had a call about 10 years ago about XenServer.
Thank you for your analysis. This topic seems to be much more complicated than I had hoped. In your tests, adding a second socket never led to lower performance than the 1P system.
In theory, your 8-vCPU test should be faster if it does not need to access, e.g., the memory of the second CPU, but in real life this does not seem to be so relevant...
What would be your "what to buy" recommendation today?
-
Some more resources:
https://developer.amd.com/wp-content/resources/56827-1-0.pdf chapter 2.5 NUMA and CCX/CCD
-
Hey, @KPS! Nice to hear from you and, yes, it's a pretty complex interaction of pieces that makes tuning so hard. There are whole books on tuning that I've seen, some going way back to Digital Equipment Corporation VAX machines.
As to recommendations, especially if you have a lot of external storage I/O, I'd opt for CPUs with clock speeds of no less than 3.0 GHz and a fair amount of internal cache, as loads are also potentially going to be bottlenecked there. As to CPU-caused NUMA effects, as my tests show, how large the effect is can vary. Note also, as mentioned in one of the articles, that the order in which you start VMs can make a big difference; those that are more affected by NUMA should be launched first to better ensure they are contained on one of the physical CPU modules and its associated memory banks.
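The start-order advice can be sketched in a few lines of Python. This is only an illustration: the `VM` class, the `numa_sensitive` flag, and the VM names are hypothetical, and it assumes you already know from your own tests which VMs suffer most from NUMA effects.

```python
from dataclasses import dataclass


@dataclass
class VM:
    name: str
    numa_sensitive: bool  # assumption: determined by your own benchmarks


def start_order(vms):
    """Put NUMA-sensitive VMs first; the stable sort keeps the given order otherwise."""
    return sorted(vms, key=lambda vm: not vm.numa_sensitive)


vms = [
    VM("file-server", False),
    VM("database", True),
    VM("gpu-desktop", True),
    VM("web", False),
]
print([vm.name for vm in start_order(vms)])
```

The resulting list is only a boot sequence; whether a VM actually ends up on a single CPU and its local memory still depends on what is free on the host at that moment.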
Generally, each system is unique enough that it may entail a lot of experimentation to find the best settings. And don't forget to check your BIOS settings as well, to see how they are configured. Hyperthreading is quite a controversial topic, too, and I'd just put in my $0.02 worth to say that for us it helped a lot, since our CPUs were over-provisioned by something like a factor of six because we were running a lot of XenDesktop VMs.
In short, get the fastest processors and memory you can afford!
-
Also something to keep in mind: it's not only about NUMA (which is different since the 2nd EPYC generation, as all memory channels now sit on an I/O die and only the caches are split), it's also about memory bandwidth!
So it adds more complexity and depends on the needs of your workload.
If it benefits from high memory bandwidth, a 2nd socket doubles it (technically)!
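As a rough illustration of the "doubles it (technically)" point, theoretical peak bandwidth is channels x transfer rate x 8 bytes per transfer (a 64-bit DDR channel), per socket. The Python sketch below uses a hypothetical helper and an assumed configuration of 8 channels of DDR4-3200 per socket; measured throughput will be lower than these theoretical figures.

```python
def peak_bandwidth_gbs(channels, mega_transfers_per_s, sockets=1, bytes_per_transfer=8):
    """Theoretical peak in GB/s: channels x MT/s x bytes per transfer x sockets."""
    return channels * mega_transfers_per_s * bytes_per_transfer * sockets / 1000


# Illustrative assumption: 8 channels of DDR4-3200 per socket.
one_p = peak_bandwidth_gbs(8, 3200, sockets=1)
two_p = peak_bandwidth_gbs(8, 3200, sockets=2)
print(f"1P: {one_p:.1f} GB/s, 2P: {two_p:.1f} GB/s")
```

Note that this is the aggregate across both sockets; a workload only benefits from the doubled figure if its memory accesses are actually spread over both CPUs.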