We run a UDP-based app server on which we expect high traffic, so I am in the process of tuning our NICs to achieve the best possible throughput (accepting a small latency cost).
For testing, I am using two machines with the configuration below.
Machine Details:
CPU: 40 logical cores
Threads per core: 2
Cores per socket: 2
NUMA nodes: 2
Model name: Intel(R) Xeon(R) CPU E5-2630 v4
CPU MHz: 2199.841
NUMA node0 CPUs: 0-9, 20-29
NUMA node1 CPUs: 10-19, 30-39
RAM: 128 GB
NIC Details:
Driver: ixgbe
Version: 4.4.0-k-rh7.3
Speed: 10G
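For reference, the details above were gathered roughly like this (eth0 is a placeholder for our actual interface name):

    lscpu             # CPU topology and NUMA layout
    ethtool -i eth0   # NIC driver name and version
    ethtool eth0      # link speed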
While doing a simple load test, I observed that the receive rate (packets/sec) on the receiver does not match the send rate on the sender.
I am observing the stats with sar (sar -n UDP 1).

Sender (odgm/s):

    1098825.00
    1097993.00
    1098103.00

Receiver (idgm/s):

    622742.00
    616321.00
    649075.00
From the above, you can see that the receiver handles far fewer datagrams per second than the sender transmits.
Packet loss observation:
I checked the statistics reported by ethtool, netstat, and sar, and none of them show any packet drops.
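These are roughly the checks I ran (again, eth0 is a placeholder for the actual interface):

    netstat -su                                    # UDP receive errors / buffer errors
    sar -n EDEV 1                                  # per-interface error and drop counters
    ethtool -S eth0 | grep -iE 'drop|miss|error'   # NIC-level drop counters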
Current tuning (with this tuning I was able to reach a maximum of about 630k datagrams/sec on average at the receiver):

- irqbalance: disabled
- CPU affinity: manually distributed (one RX/TX queue per CPU)
- Interrupt coalescing: rx-usecs 15
- Flow control: on
- Ring buffer: rx 512
- RX queues: 40
- rmem_default = 16777216
- rmem_max = 16777216

Everything else is at its default value.

Edit 1: I changed busy poll to 50 and was able to achieve better throughput, but it is not consistent.
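To make the setup concrete, these are roughly the commands behind the tuning above (eth0 and the IRQ numbers are placeholders for our actual interface and its queue interrupts):

    systemctl stop irqbalance
    ethtool -L eth0 combined 40         # 40 RX/TX queue pairs
    ethtool -G eth0 rx 512              # RX ring buffer size
    ethtool -C eth0 rx-usecs 15         # interrupt coalescing
    ethtool -A eth0 rx on tx on         # flow control
    sysctl -w net.core.rmem_default=16777216
    sysctl -w net.core.rmem_max=16777216
    sysctl -w net.core.busy_poll=50     # the Edit 1 change
    # plus a loop writing one CPU per queue IRQ into /proc/irq/<N>/smp_affinity_list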
Why is there a difference in rate between the sender and the receiver? What other NIC/OS parameters can be tuned to achieve the same throughput rate as the sender?
One strange thing I observe through ethtool is that the "rx_no_dma_resources" counter keeps incrementing rapidly during the test. Does that ring a bell?
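For reference, I am watching that counter like this (eth0 is a placeholder):

    watch -n 1 "ethtool -S eth0 | grep rx_no_dma_resources"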
- Even though I disabled irqbalance, /proc/interrupts and /proc/net/softnet_stat do not show an even distribution of interrupts across CPUs.
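This is how I am checking the distribution (eth0 is a placeholder; the first column of /proc/net/softnet_stat is the per-CPU processed-packet count in hex):

    grep eth0 /proc/interrupts    # per-queue interrupt counts per CPU
    cat /proc/net/softnet_stat    # per-CPU softirq packet counters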
- Overall, my ultimate goal is to achieve the best possible throughput with minimal packet loss.