
I am trying to connect a Linux server with two 1 Gbps NICs to a Netgear ProSafe GSM7248V2 switch using bonding, specifically 802.3ad mode. The results are very confusing, and I would be grateful for any hints on what to try next.

On the server side, this is my /etc/network/interfaces:

auto bond0
iface bond0 inet static
        address 192.168.1.15/24
        gateway 192.168.1.254
        dns-nameservers 8.8.8.8
        dns-search my-domain.org
        bond-slaves eno1 eno2
        bond-mode 4
        bond-miimon 100
        bond-lacp-rate 1
        bond-xmit_hash_policy layer3+4
        hwaddress aa:bb:cc:dd:ee:ff

The configuration of the switch is as follows:

(GSM7248V2) #show port-channel 3/2          


Local Interface................................ 3/2
Channel Name................................... fubarlg
Link State..................................... Up
Admin Mode..................................... Enabled
Type........................................... Dynamic
Load Balance Option............................ 6
(Src/Dest IP and TCP/UDP Port fields)

Mbr    Device/       Port      Port
Ports  Timeout       Speed     Active
------ ------------- --------- -------
0/7    actor/long    Auto      True   
       partner/long  
0/8    actor/long    Auto      True   
       partner/long  

(GSM7248V2) #show lacp actor 0/7    

         Sys    Admin   Port      Admin
 Intf  Priority  Key  Priority    State  
------ -------- ----- -------- ----------- 
0/7    1        55    128      ACT|AGG|LTO 

(GSM7248V2) #show lacp actor 0/8

         Sys    Admin   Port      Admin
 Intf  Priority  Key  Priority    State  
------ -------- ----- -------- ----------- 
0/8    1        55    128      ACT|AGG|LTO 

(GSM7248V2) #show lacp partner 0/7 

       Sys      System       Admin Prt Prt     Admin
 Intf  Pri       ID          Key   Pri Id      State
------ --- ----------------- ----- --- ----- ----------- 
0/7    0   00:00:00:00:00:00 0     0   0     ACT|AGG|LTO 

(GSM7248V2) #show lacp partner 0/8

       Sys      System       Admin Prt Prt     Admin
 Intf  Pri       ID          Key   Pri Id      State
------ --- ----------------- ----- --- ----- ----------- 
0/8    0   00:00:00:00:00:00 0     0   0     ACT|AGG|LTO 

I believe the "layer3+4" transmit hash policy is the closest match to the switch's Load Balance Option 6 (Src/Dest IP and TCP/UDP Port fields). The first surprising thing is that the switch does not see the MAC address of its LACP partner.
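For reference, the policy the bonding driver actually uses can be read back from sysfs (assuming the usual bonding sysfs attributes are present); on this server it reports layer3+4, matching the configuration above:

    cat /sys/class/net/bond0/bonding/xmit_hash_policy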

On the server side, this is the content of /proc/net/bonding/bond0:

Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer3+4 (1)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

802.3ad info
LACP rate: fast
Min links: 0
Aggregator selection policy (ad_select): stable
System priority: 65535
System MAC address: ac:1f:6b:dc:2e:88
Active Aggregator Info:
    Aggregator ID: 15
    Number of ports: 2
    Actor Key: 9
    Partner Key: 55
    Partner Mac Address: a0:21:b7:9d:83:6a

Slave Interface: eno1
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: ac:1f:6b:dc:2e:88
Slave queue ID: 0
Aggregator ID: 15
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu:
    system priority: 65535
    system mac address: ac:1f:6b:dc:2e:88
    port key: 9
    port priority: 255
    port number: 1
    port state: 63
details partner lacp pdu:
    system priority: 1
    system mac address: a0:21:b7:9d:83:6a
    oper key: 55
    port priority: 128
    port number: 8
    port state: 61

Slave Interface: eno2
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: ac:1f:6b:dc:2e:89
Slave queue ID: 0
Aggregator ID: 15
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu:
    system priority: 65535
    system mac address: ac:1f:6b:dc:2e:88
    port key: 9
    port priority: 255
    port number: 2
    port state: 63
details partner lacp pdu:
    system priority: 1
    system mac address: a0:21:b7:9d:83:6a
    oper key: 55
    port priority: 128
    port number: 7
    port state: 61

If I understand this correctly, it means that the Linux bonding driver correctly determined all the aggregator details (key, port numbers, system priority, port priority, etc.). Despite that, I get the following in dmesg after restarting the networking service:

[Dec14 20:40] bond0: Releasing backup interface eno1
[  +0.000004] bond0: first active interface up!
[  +0.090621] bond0: Removing an active aggregator
[  +0.000004] bond0: Releasing backup interface eno2
[  +0.118446] bond0: Enslaving eno1 as a backup interface with a down link
[  +0.027888] bond0: Enslaving eno2 as a backup interface with a down link
[  +0.008805] IPv6: ADDRCONF(NETDEV_UP): bond0: link is not ready
[  +3.546823] igb 0000:04:00.0 eno1: igb: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[  +0.160003] igb 0000:05:00.0 eno2: igb: eno2 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[  +0.035608] bond0: link status definitely up for interface eno1, 1000 Mbps full duplex
[  +0.000004] bond0: Warning: No 802.3ad response from the link partner for any adapters in the bond
[  +0.000008] bond0: first active interface up!
[  +0.000166] IPv6: ADDRCONF(NETDEV_CHANGE): bond0: link becomes ready
[  +0.103821] bond0: link status definitely up for interface eno2, 1000 Mbps full duplex

Both interfaces are alive and the network connection seems normal; I just get that strange warning that there is no 802.3ad-compatible partner.

In addition, when I simultaneously copy two large binary files (10 GB each) from two different machines connected to the very same switch, each with a 1 Gbps link, the overall throughput of the bond0 interface on the server stays well below 1 Gbps, although I would expect something closer to 2 Gbps (read speed is not a limiting factor here: all SSDs, well cached, etc.). When I copy the same files sequentially, one after the other, from the same machines, I easily reach throughput close to 1 Gbps.

Do you have any idea what could be wrong here? Regarding diagnostics, the confusing warning appears in dmesg (no 802.3ad-compatible partner) and in the show lacp output of the switch (no MAC address of the partner, although the regular port record shows the correct MAC address of the connected NIC). Regarding network performance, I cannot really see any aggregation when using two different connections. I would be very thankful for any hint.

michalt

1 Answer


The switch is configured for the long LACP timeout: one LACPDU every 30 seconds.

The Linux system is configured with bond-lacp-rate 1.

I can't find what this actually does in Debian, but if it passes the lacp_rate=1 module option to bonding (reference), then that is the fast timeout: one LACPDU every second.
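You can confirm which rate the driver is actually running with something like this (the standard bonding sysfs attribute):

    cat /sys/class/net/bond0/bonding/lacp_rate

It prints either slow 0 or fast 1. Your /proc/net/bonding/bond0 output above already shows LACP rate: fast, so the Linux side is definitely on the 1-second timeout.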

This mismatch between slow/fast LACP rate is a misconfiguration.

All the example documentation I can find says that Debian accepts bond-lacp-rate slow, which should hopefully correct this for you.
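For example, the bonding part of your /etc/network/interfaces stanza would then look something like this (only the lacp-rate line changes, the rest is copied from your config):

        bond-slaves eno1 eno2
        bond-mode 4
        bond-miimon 100
        bond-lacp-rate slow
        bond-xmit_hash_policy layer3+4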

You could probably also remove the bond-lacp-rate line from your config file, as the default is the slow rate, then unload the bonding module or reboot to apply the change.
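A rough sketch of applying it without a full reboot (assuming ifupdown is managing the interface and nothing else uses the bonding module):

    ifdown bond0
    ip link delete bond0    # remove the bond device, if it still exists, so the module can unload
    rmmod bonding
    ifup bond0

If anything goes wrong here, a reboot achieves the same thing.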

Don't test throughput with just two streams. The layer3+4 policy does not guarantee that any two streams each get a separate NIC, only that, given enough streams, traffic should balance somewhat evenly.

Test with, say, 16 or 32 concurrent iperf3 TCP streams. The total throughput of all streams should then be close to 2 Gbps.
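A rough sketch, assuming iperf3 is installed on the server and on both client machines (192.168.1.15 is the bond address from your config, and the ports are arbitrary; one listener per client, because an iperf3 server handles only one test at a time):

    # on the server
    iperf3 -s -p 5201 &
    iperf3 -s -p 5202 &

    # on client A
    iperf3 -c 192.168.1.15 -p 5201 -P 16 -t 30

    # on client B, at the same time
    iperf3 -c 192.168.1.15 -p 5202 -P 16 -t 30

Add up the totals from both clients; with the LACP rate fixed and enough streams, the sum should approach 2 Gbps.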

suprjami