
Our clients are running CentOS 7.1 and 7.3. Each has two 10Gb NICs bonded to make a 20Gb link, with each NIC connected to one of a pair of Cisco N5000 switches. While running cp or iperf, I can see that sometimes the traffic only flows through one interface, so we can only achieve 10Gb/s. Other times it flows through both interfaces and we get 20Gb/s.

It's not a faulty NIC or cable because I've tested a few servers and they all exhibit the same behaviour.
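
One way to confirm which slave is actually carrying the traffic during a test (a sketch: sar comes from the sysstat package, and eth17 stands in for the second slave's real name, which isn't shown in the configs below):

# watch per-second throughput on each bond member while iperf/cp runs
sar -n DEV 1 | egrep 'eth16|eth17'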

Here's iperf between two clients on the same switch/subnet.

# for i in {1..10}; do iperf -c 172.16.15.19 -l 1M -P 16 -t 300 | grep SUM; done
[SUM]  0.0-300.0 sec   344 GBytes  9.84 Gbits/sec
[SUM]  0.0-300.0 sec   344 GBytes  9.84 Gbits/sec
[SUM]  0.0-300.0 sec   344 GBytes  9.84 Gbits/sec
[SUM]  0.0-300.0 sec   344 GBytes  9.84 Gbits/sec
[SUM]  0.0-300.0 sec   344 GBytes  9.83 Gbits/sec
[SUM]  0.0-300.0 sec   688 GBytes  19.7 Gbits/sec
[SUM]  0.0-300.0 sec   344 GBytes  9.85 Gbits/sec
[SUM]  0.0-300.0 sec   344 GBytes  9.84 Gbits/sec
[SUM]  0.0-300.0 sec   344 GBytes  9.83 Gbits/sec
[SUM]  0.0-300.0 sec   688 GBytes  19.7 Gbits/sec

Here are the bond settings.

DEVICE=bond1
ONBOOT=yes
NETBOOT=yes
TYPE=Ethernet
HOSTNAME=removed
BOOTPROTO=static
IPADDR=172.16.15.18
NETMASK=255.255.255.0
GATEWAY=172.16.15.1
DNS1=10.1.1.71
DNS2=10.1.1.70
MTU=9000
BONDING_MASTER=yes
BONDING_OPTS="mode=4 miimon=100 xmit_hash_policy=layer3+4"
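
Since mode=4 is 802.3ad/LACP, it's also worth confirming that both slaves negotiated into the same aggregator; this uses the standard kernel bonding status file, nothing specific to this setup:

# both slaves should report the same Aggregator ID as the active aggregator
grep -i aggregator /proc/net/bonding/bond1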

Here are the eth settings. I've only posted one as they are identical apart from the interface name.

BOOTPROTO=none
TYPE=Ethernet
NAME=eth16
DEVICE=eth16
ONBOOT=yes
MTU=9000
MASTER=bond1
SLAVE=yes

Here are the settings from both Cisco N5000 switches.

interface port-channel3008
  description <removed>
  switchport access vlan 3900
  vpc 3008

interface Ethernet103/1/31
  description <removed>
  switchport access vlan 3900
  channel-group 3008 mode active
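
For completeness, the hashing algorithm the switches currently use can be checked with a standard NX-OS command (this is the setting Peter's comment below refers to):

show port-channel load-balance

If this reports a source-destination MAC scheme, every flow between the same two hosts hashes to the same member link, capping any single host pair at 10Gb/s.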
Comment: I think you should add load balancing to the Nexus configuration; in your case the most appropriate stanza should be `port-channel load-balance ethernet source-dest-port`. – Peter Zhabin Mar 24 '17 at 20:48

1 Answer


What you're really seeing is just a "quantization" effect of the port-channel load-balancing algorithm. A single flow always goes through the same physical interface, and in your case there are two points in the test path where a decision to use interface 1 or 2 must be made, each based on a configurable hashing algorithm: the bond on the sending host and the port channel on the switch. As Peter suggested, check the load-balancing algorithm on the switch side as well: recent Nexus models default to layer 3+4 hashing, while older ones defaulted to source-destination MAC (in which case you would never see 20Gbit/s between the same pair of nodes). Effects like this are most visible when you're testing with a limited number of flows.
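
To make the "quantization" concrete, here is a rough sketch of the layer3+4 transmit hash as described for older kernels in Documentation/networking/bonding.txt (the exact formula varies by kernel version; the IPs are from the question, the source ports are hypothetical, and 5001 is iperf's default server port):

#!/bin/bash
# Sketch: layer3+4 xmit hash per older bonding.txt:
#   ((sport XOR dport) XOR ((src_ip XOR dst_ip) AND 0xffff)) mod slave_count
ip2int() { local IFS=.; set -- $1; echo $(( ($1<<24)+($2<<16)+($3<<8)+$4 )); }
src=$(ip2int 172.16.15.18); dst=$(ip2int 172.16.15.19)
for sport in 50000 50001 50002 50003; do   # hypothetical ephemeral client ports
  slave=$(( ((sport ^ 5001) ^ ((src ^ dst) & 0xffff)) % 2 ))
  echo "source port $sport -> slave $slave"
done

Note this is only the host-side decision: the switch makes an independent choice per flow using its own configured algorithm, which is why checking `show port-channel load-balance` on the Nexus side matters just as much.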