unreasonable netperf benchmark results

Question

I ran the netperf benchmark with the following commands:

server side: netserver -4 -v -d -N -p

client side: netperf -H -p -l 60 -T 1,1 -t TCP_RR

And I received the results:

MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.0.0.28 () port 0 AF_INET : demo : first burst 0 : cpu bind

Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec

16384  131072 1        1       60.00    9147.83
16384  131072

Then I restricted the client (same machine) to a single CPU by adding "maxcpus=1 nr_cpus=1" to the kernel command line, and ran the following command:

netperf -H -p -l 60 -t TCP_RR

I received the following results:

MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.0.0.28 () port 0 AF_INET : demo : first burst 0 : cpu bind

Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec

16384  131072 1        1       60.00    10183.33
16384  131072

Q: How can performance improve when I reduce the number of CPUs from 64 to 1?

Some technical information: I used the Azure Standard_L64s_v3 instance type; OS: sles:15:sp2

To clarify: I repeated the runs at least 10 times, and every time the single-CPU system did better than the multi-CPU system. — user1994587, Jul 19 '22 at 14:38
When you say you get the same result ten times in a row, is that on the same instantiation of either the single or 64 vCPU system? Also, how are you switching between 1 and 64 vCPUs? There can be any number of possibilities - with 64 vCPUs perhaps your workload is spanning NUMA nodes and it isn't with 1 vCPU. Perhaps you get a higher hardware CPU frequency with just the one vCPU than with 64. — Rick Jones, Jul 20 '22 at 21:25
I meant that with a single CPU the results are about 10,000 transactions per second, and with 64 CPUs they are about 9,000 transactions per second. For the single-CPU case I added "maxcpus=1 nr_cpus=1" to the kernel command line. — user1994587, Jul 21 '22 at 08:08
Hi @user1994587, if the provided answer resolved your issue, please mark it as the answer or upvote it so that it can help other community members who run into a similar issue. — Kartik Bhiwapurkar, Aug 05 '22 at 04:06

score 1 · Answer 1 · answered Jul 21 '22 at 12:20

• The netperf command you ran on the client side is shown below. You ran the same test (without -T 1,1) after reducing the number of CPUs on the client, yet performance improved with fewer vCPUs on the client VM:

netperf -H -p -l 60 -T 1,1 -t TCP_RR

This command runs a TCP request/response (TCP_RR) test between the client and the server for 60 seconds (-l 60), with netperf and netserver each bound to CPU 1 (-T 1,1).

• On Linux, netperf's CPU utilization measurement reads /proc/stat to account for the time spent during a test. The code for this mechanism is in ‘src/netcpu_procstat.c’, so you can check that source file for the details.

Also, CPU utilization measured inside a virtual guest, i.e., a virtual machine, may not reflect the actual utilization the way it would on bare metal, because much of the network processing happens outside the context of the virtual machine. The netperf documentation from Hewlett-Packard linked below says the following about this:

https://hewlettpackard.github.io/netperf/doc/netperf.html

If one is looking to measure the added overhead of a virtualization mechanism, rather than rely on CPU utilization, one can rely instead on netperf _RR tests - path-lengths and overheads can be a significant fraction of the latency, so increases in overhead should appear as decreases in transaction rate. Whatever you do, DO NOT rely on the throughput of a _STREAM test. Achieving link-rate can be done via a multitude of options that mask overhead rather than eliminate it.

As a result, I would suggest you also rely on other monitoring tools available in Azure, such as Azure Monitor and Application Insights.

score 1 · Answer 2 · answered Jul 22 '22 at 15:20

Looking more closely at your netperf command line:

netperf -H -p -l 60 -T 1,1 -t TCP_RR

The -H option expects a hostname as its argument, and the -p option expects a port number. As written, "-p" will be interpreted as the hostname, and at least when I tried it, the command fails. I assume you've omitted some of the command line?
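For illustration, a complete command line has the shape below. This is only a sketch: netperf_rr_cmd is a helper name I made up (it is not part of netperf), and it just prints the command so you can check that -H and -p each get a value before you run it. The host and port are whatever your netserver is actually using.

```shell
# Hypothetical helper (not part of netperf): print, but do not run,
# a TCP_RR command line with the host and port filled in.
netperf_rr_cmd() {
    host=$1   # address or name of the machine running netserver
    port=$2   # control port that netserver was started with (-p)
    printf 'netperf -H %s -p %s -l 60 -T 1,1 -t TCP_RR\n' "$host" "$port"
}
```

For example, `netperf_rr_cmd 10.0.0.28 12865` prints a command aimed at the address shown in your output; 12865 is netperf's default control port and is used here purely as an example value.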

The -T option will bind where netperf and netserver will run (in this case on vCPU 1 on the netperf side and vCPU 1 on the netserver side) but it will not necessarily control where at least some of the network stack processing will take place. So, in your 64-vCPU setup, the interrupts for the networking traffic and perhaps the stack will run on a different vCPU. In your 1-vCPU setup, everything will be on the one vCPU. It is quite conceivable you are seeing the effects of cache-to-cache transfers in the 64-vCPU case leading to lower transaction/s rates.
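One way to check this is to look at /proc/interrupts on the 64-vCPU client while the test runs. A minimal sketch, assuming the NIC's interrupt lines can be picked out by a name pattern; the function name and the pattern are mine, and on your VM the lines may be labelled differently, so look at the file first:

```shell
# Sum the per-CPU interrupt counts of all /proc/interrupts lines that
# match a pattern. Usage: irq_counts_by_cpu PATTERN [FILE]
# PATTERN is an assumption about how the NIC shows up (e.g. "eth0").
irq_counts_by_cpu() {
    pattern=$1
    file=${2:-/proc/interrupts}
    LC_ALL=C awk -v pat="$pattern" '
        # First line is the header: one column name per CPU.
        NR == 1 { ncpu = NF; for (i = 1; i <= NF; i++) name[i] = $i; next }
        # Matching rows: field 1 is the IRQ label, fields 2.. are per-CPU counts.
        $0 ~ pat { for (i = 1; i <= ncpu; i++) total[i] += $(i + 1) }
        END { for (i = 1; i <= ncpu; i++) printf "%s %d\n", name[i], total[i] }
    ' "$file"
}
```

Run it before and after a test and compare. If the counts grow on a vCPU other than 1, the interrupt work is landing away from where -T 1,1 put netperf.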

Going to multi-processor will increase aggregate performance, but it will not necessarily increase single thread/stream performance. And single thread/stream performance can indeed degrade.
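To put the two results in per-transaction terms: with "first burst 0" there is only one transaction in flight at a time, so the average round-trip time is roughly the inverse of the transaction rate. A small sketch of that arithmetic (rr_latency_us is a name I made up):

```shell
# Convert a TCP_RR transaction rate (transactions per second) into the
# approximate average round-trip time in microseconds.
# Only valid when one transaction is outstanding at a time (burst 0).
rr_latency_us() {
    LC_ALL=C awk -v rate="$1" 'BEGIN { printf "%.1f\n", 1000000 / rate }'
}
```

With your numbers, `rr_latency_us 9147.83` gives about 109.3 microseconds for the 64-vCPU run and `rr_latency_us 10183.33` gives about 98.2 for the 1-vCPU run. That is a difference of roughly 11 microseconds per transaction, which is small enough that it could plausibly come from the cross-vCPU effects described above.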
