Recommended CPU Scheduler / Topology?
-
Hello!
Looking for some advice on the CPU settings on the VM hypervisor and the VMs themselves, and whether it is recommended to fine-tune these for extra performance or if the defaults are OK.
-
In XOA we have an option at the hypervisor level called "Scheduler Granularity" that can be set to CPU, core, or socket. However, I cannot find any documentation on what each option means and how it affects performance. All I can find is here: https://xcp-ng.org/docs/release-8-2.html#core-scheduling-experimental but it does not explain how each option affects performance/security. The default is CPU, fwiw.
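For reference, my (untested) understanding from the release notes is that this corresponds to the Xen sched-gran boot parameter, which should be changeable from the host CLI with the xen-cmdline helper; please double-check the exact syntax against the docs before relying on it:
# Show the current scheduler granularity (no output should mean the default, cpu)
/opt/xensource/libexec/xen-cmdline --get-xen sched-gran
# Switch to core granularity (a host reboot is needed for it to take effect)
/opt/xensource/libexec/xen-cmdline --set-xen sched-gran=core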
-
For each VM we can also set up the topology. The default has always seemed odd to me: if I have a VM with 4 CPUs, its topology will be 4 sockets with 1 core per socket. But there is also the option of 2 sockets with 2 cores per socket, or 1 socket with 4 cores per socket.
Since the physical topology is of course static (the motherboard has 2 CPU sockets), why does it create a virtual socket for each core we allocate? Does this give extra performance? Or should we change this to align with the physical layout?
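If it helps, I believe the exposed topology is controlled by the platform:cores-per-socket parameter (the UUID below is a placeholder), e.g. to present 4 vCPUs as 1 socket with 4 cores instead of 4 single-core sockets:
# Give the VM 4 vCPUs and expose them as a single 4-core socket
xe vm-param-set uuid=<VM UUID> VCPUs-max=4 VCPUs-at-startup=4
xe vm-param-set uuid=<VM UUID> platform:cores-per-socket=4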
Fwiw, we are using AMD EPYC 7F52 CPUs. The motherboard has 2 physical sockets, so we have 32 CPU cores in total, and since SMT is enabled the hypervisor sees 64 CPU threads.
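(The physical layout can be confirmed on the host with xl info, e.g. filtered to the topology-related fields:)
# Physical CPU/NUMA layout as seen by Xen on the host
xl info | grep -E 'nr_cpus|nr_nodes|cores_per_socket|threads_per_core'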
We are only using Debian-based VMs.
I'm also curious how the CPU topology affects things like the L3 cache. The CPUs we are using have quite a lot of L3 cache, so I would guess it is good for performance if a VM is scheduled on the same CPU every time so that the cache contents are not lost. Is this done automatically, or is the cache effectively wasted in a hypervisor scenario where VMs are scheduled "randomly" across the CPUs?
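As a sanity check, the virtual topology and cache sizes can be inspected from inside a Debian guest, for example with (index3 is normally the L3 level, but that may vary):
# Sockets, cores per socket, threads per core and cache sizes as seen by the guest
lscpu
# L3 cache size exposed to the guest
cat /sys/devices/system/cpu/cpu0/cache/index3/size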
Perhaps we are overthinking it and should just leave everything on default but would like to ask here to be certain.
Thanks!
Niels
-
Hi!
You are asking interesting questions, but I really think the impact is not that huge. It's likely more of a NUMA topic, probably something for @dthenot, our PhD student at Vates who is also working on NUMA-related performance questions.
-
Hello
So, you can of course do some configuration by hand to alleviate some of the cost the architecture imposes on virtualization.
But as you can imagine, the scheduler will move vCPUs around and will sometimes break L3 locality by moving one to a remote core.
I asked someone more informed than me about this, and he said that running a vCPU somewhere is always better than waiting for it to run locally, so pinning is only useful under specific conditions (having enough resources). You can use the cpupool functionality to isolate a VM on a specific NUMA node.
But it's only interesting if you really want more performance, since it's a manual process and can be cumbersome. You can also pin a vCPU to a specific physical core to keep L3 locality, but that only works if few VMs are running on that particular core. So yes, it might be a small gain (or even a loss).
There are multiple ways to pin cores, most of them with xl, but if you want the pinning to persist across VM reboots you need to use xe. This matters especially if you want to pin a VM to a NUMA node and need its memory allocated on that node, since memory placement can only be done at boot time. Pinning vCPUs after boot with xl can create problems if you pin them to one node while the VM's memory is allocated on another node.
You can see a VM's NUMA memory information with the command: xl debug-key u; xl dmesg
With xl:
Pin a vCPU: xl vcpu-pin <Domain> <vcpu id> <cpu id>
e.g. xl vcpu-pin 1 all 2-5 to pin all the vCPUs of VM 1 to cores 2 to 5.
With cpupools:
xl cpupool-numa-split    # creates one cpupool per NUMA node
xl cpupool-migrate <VM> <Pool>
(cpupools only work for guests, not dom0)
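If it helps, xl can also show the result of pinning and pool assignments (nothing XCP-ng specific here, just standard xl subcommands):
# Each vCPU, the physical CPU it currently runs on, and its affinity
xl vcpu-list <Domain>
# The defined cpupools and the physical CPUs belonging to each
xl cpupool-list -c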
And with xe:
xe vm-param-set uuid=<UUID> VCPUs-params:mask=<mask>    # to add a pinning
xe vm-param-remove uuid=<UUID> param-name=VCPUs-params param-key=mask    # to remove the pinning
The mask above is a comma-separated list of CPU ids, e.g. 0,1,2,3
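For example, to make the earlier xl vcpu-pin 1 all 2-5 example persistent (the UUID is a placeholder, and as far as I know the mask is applied when the VM starts):
# Pin all vCPUs of this VM to physical CPUs 2 to 5 across reboots
xe vm-param-set uuid=<VM UUID> VCPUs-params:mask=2,3,4,5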
Hope this is useful; I will add it to the XCP-ng documentation soon.