Epyc VM to VM networking slow
-
I'd like to check something to see whether it's consistent with our tests, using two similar VMs (4 vCPUs/4 GB RAM):
- single-threaded iperf speed on a "fresh" Debian 10 install (4.19 kernel)
- the same benchmark with the 5.10.0 kernel from backports (add
deb http://deb.debian.org/debian buster-backports main contrib non-free
to your sources list, then apt install linux-image-5.10; don't forget to reboot to be on that kernel)
Do you see a performance difference between those?
-
@olivierlambert said in Epyc VM to VM networking slow:
I'd like to check something to see whether it's consistent with our tests, using two similar VMs (4 vCPUs/4 GB RAM):
- single-threaded iperf speed on a "fresh" Debian 10 install (4.19 kernel)
- the same benchmark with the 5.10.0 kernel from backports (add
deb http://deb.debian.org/debian buster-backports main contrib non-free
to your sources list, then apt install linux-image-5.10; don't forget to reboot to be on that kernel)
Do you see a performance difference between those?
FYI, getting Debian 10 packages, backports or not, is now going to be extremely difficult. Debian 10 LTS has reached EOL. It is currently in ELTS, from the beginning of this month until 30/06/2029, though that covers only a subset of the packages.
-
I had no issue testing it quickly. This is only for the sake of testing and trying to identify a potential regression, not for production usage or whatnot.
-
I identified a specific regression in a Debian kernel build since 5.10; we are investigating the "why" (starting from this exact build: https://snapshot.debian.org/package/linux/5.10.92-1/)
-
@olivierlambert
Would it be possible for you to either offer an ISO to download? Or maybe seed one? I really want to help test this, but I'm getting lost with how Debian provides their legacy images and this jig-boo (intentionally misspelled)
-
Maybe someone could graph their VM.
Comparing a slow VM with a full-speed one could shed some light on this.
https://www.brendangregg.com/Articles/Linux_Kernel_Performance_Flame_Graphs.pdf
-
@probain Debian 10 is available in the XOA Hub.
-
@olivierlambert
I wasn't aware. Thanks! Downloading it to run a test right away.
Test done:
Sender                Receiver              Run1    Run2    Run3
Debian10 kernel 4.19  Debian10 kernel 4.19  4.81Gb  4.81Gb  4.83Gb
Debian10 kernel 5.10  Debian10 kernel 4.19  5.13Gb  5.02Gb  5.12Gb
Debian10 kernel 5.10  Debian10 kernel 5.10  4.98Gb  5.02Gb  4.97Gb
Sender runs 'iperf -c <IP-to-receiver> -t 60'
Kernel 4.19 = 4.19.0-6-amd64
Kernel 5.10 = 5.10.0-0.deb10.24-amd64
CPU 4 cores (AMD EPYC 7302P)
RAM 4GB
Created from XOA-hub
-
Thanks @probain, now can you try
iperf -s
in the Dom0 and
iperf -c <IP dom0>
in the Debian guest?
-
@olivierlambert
vm -> dom0 results in "no route to host": firewall?
Results are shown for dom0 -> vm, listed by the kernel installed on the VM.
Just as earlier: the VM is installed via XOA Hub, with 4 CPUs and 4GB RAM. Host CPU is an AMD EPYC 7302P.
VM kernel ver.  Run1    Run2    Run3
kernel 4.19.0   8.47Gb  8.82Gb  8.43Gb
kernel 5.10.0   7.12Gb  7.07Gb  7.11Gb
-
Yes, disable the firewall first (only in a testing lab, obviously) with
iptables -F
-
@olivierlambert how do I restore the iptables again afterwards? Other than reboot ofc
Update: Tests done
vm -> dom0
VM kernel ver.  Run1    Run2    Run3
kernel 4.19.0   5.84Gb  5.77Gb  5.85Gb
kernel 5.10.0   1.25Gb  1.26Gb  1.28Gb
Specs are just as previous post.
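To put a number on the vm -> dom0 gap between the two kernels, here is a minimal sketch that averages the three runs and computes the relative drop. The helper names mean and drop_pct are hypothetical (not part of iperf or any tool mentioned in this thread), and the inputs are simply the figures reported above.

```shell
# Mean of the given run values, printed with two decimals.
mean() {
    printf '%s\n' "$@" | awk '{ sum += $1 } END { printf "%.2f\n", sum / NR }'
}

# Percentage drop from the mean of the baseline runs to the mean of the
# current runs. Each argument is a space-separated list of run values.
drop_pct() {
    awk -v base="$1" -v cur="$2" 'BEGIN {
        nb = split(base, b, " ")
        nc = split(cur, c, " ")
        for (i = 1; i <= nb; i++) sb += b[i]
        for (i = 1; i <= nc; i++) sc += c[i]
        printf "%.1f\n", (1 - (sc / nc) / (sb / nb)) * 100
    }'
}

mean 5.84 5.77 5.85                          # kernel 4.19.0 -> 5.82
mean 1.25 1.26 1.28                          # kernel 5.10.0 -> 1.26
drop_pct "5.84 5.77 5.85" "1.25 1.26 1.28"   # -> 78.3
```

So with these figures the 5.10 guest reaches roughly a fifth of the 4.19 throughput towards dom0.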
-
Thanks, so at least it confirms something we are also seeing here. We found the exact commit.
-
Here are the Opterons with the firewall dropped:
source  destination  OS         Kernel          Speed Average
vm      dom          debian 10  4.19.0-6-amd64  6.57 Gbits/sec
dom     vm           debian 10  4.19.0-6-amd64  1.79 Gbits/sec
vm      dom          truenas    6.6.20          2.01 Gbits/sec
dom     vm           truenas    6.6.20          1.82 Gbits/sec
host    vm           debian 10  4.19.0-6-amd64  5.32 Gbits/sec
host    vm           truenas    6.6.20          1.92 Gbits/sec
host    dom          debian     4.19.0+1        8.97 Gbits/sec
-
@probain said in Epyc VM to VM networking slow:
I restore the iptables again afterwards? Other than reboot
this worked for me
action   command
save     iptables-save > firewall.conf
flush    iptables -F
restore  cat firewall.conf | iptables-restore
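The save, flush and restore steps can also be wrapped into one helper so the rules are put back without a reboot. This is only a sketch for a test lab: the function name with_firewall_flushed is hypothetical, it has to run as root on the machine whose rules are flushed, and it keeps the backup in a temporary file created with mktemp rather than in firewall.conf.

```shell
# Run a command with the iptables rules flushed, then restore them.
# Lab use only. Assumes iptables, iptables-save and iptables-restore
# are on PATH and that the caller is root.
with_firewall_flushed() {
    backup=$(mktemp) || return 1
    # Save the current rule set before touching anything.
    iptables-save > "$backup" || { rm -f "$backup"; return 1; }
    # Flush the rules, as done manually earlier in the thread.
    iptables -F
    # Run the wrapped command and remember its exit status.
    "$@"
    status=$?
    # Put the saved rules back, whether or not the command succeeded.
    iptables-restore < "$backup"
    rm -f "$backup"
    return "$status"
}
```

For example, `with_firewall_flushed sleep 300` would open a five-minute window for running the iperf test from the guest. If the wrapped command is interrupted with Ctrl-C, the restore step may not run, so a command that ends on its own is the safer choice.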
-
Here's a little test I just ran between VMs over SMB on my Threadripper 7960X build on a Supermicro H13SRA-TF motherboard. Definitely not too bad; these VMs are on different SRs.
-
@sluflyer06 This test does not say anything other than that you have a 10G NIC, and we already knew that the limit for the latest-gen AMDs is just above 10G. If you put in a 25G NIC, you can likely only use half of that capacity, and for those of us running this in actual datacenters that is a pretty critical issue. Even more so when the limit seems to be shared per host: with a 12 Gbit limit, 4 VMs running on the same host get 3 Gbit per VM. And lots of us may have 20-40 VMs per server that all use a decent portion of the network, so it gets really scary when you realize that is 300-600 Mbit per VM.
It is even worse for those on earlier generations of the AMD platform, where the limit is around 2-4 Gbit: now you're looking at 100-200 Mbit per VM, which is a ceiling that even a smaller provider can easily hit during peak use times.
It is great that the issue is not triggered for you as your bottleneck is elsewhere, but it is a very serious issue for several of us.
With that said, Vates is handling it as well as anyone could ask, and I thank them for the attention given and the dedication to solving it.
It is a NASTY bug and very situational for it to have been discovered.
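The per-VM figures above come from dividing the host-wide ceiling evenly across the VMs. A minimal sketch of that arithmetic, assuming every VM pushes traffic at the same time and gets an equal share (the helper name per_vm_mbit is hypothetical, and it only handles whole Gbit values):

```shell
# Even split of a host-wide throughput ceiling across VMs, in Mbit/s.
# Arguments: ceiling in whole Gbit/s, number of VMs.
per_vm_mbit() {
    host_gbit=$1
    vm_count=$2
    echo $(( host_gbit * 1000 / vm_count ))
}

per_vm_mbit 12 4    # 3000
per_vm_mbit 12 20   # 600
per_vm_mbit 12 40   # 300
per_vm_mbit 4 20    # 200
per_vm_mbit 4 40    # 100
```

Real sharing depends on which VMs are busy at a given moment, so these are worst-case figures for simultaneous load rather than a guaranteed cap per VM.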
-
@Seneram ah well excuse my ignorance then, I thought people said the limits were much lower. I can see what you are saying and the big issue with that.
-
@olivierlambert is it already known in which update/release this problem will be solved?
-
@LennertvdBerg they are still trying to figure this one out.
An estimated full fix is not in sight just yet, from what I know. At least I haven't been informed about one in my ticket with them. But I do know they are still working very hard on this.