
My setup is two Dell R720 servers, each connected via 4 gigabit ports to a Cisco WS-C2960S-24TS-L switch, which in turn is connected to the Internet via 100 Mbit.

The servers are running Debian Wheezy with an OpenVZ (Red Hat based) kernel: 2.6.32-openvz-042stab081.3-amd64

What I want is faster file transfer between the two servers and some level of fault tolerance.

I managed to set up bonding and tried the bonding modes balance-rr, 802.3ad and balance-alb. All of them worked in terms of me being able to connect to the servers, but I don't get any speedup in data transfer between them.
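
For reference, a minimal sketch of the kind of config I'm using on Debian Wheezy (the interface names, address and exact options here are placeholders, not my literal setup; it needs the ifenslave package):

    # /etc/network/interfaces (sketch)
    auto bond0
    iface bond0 inet static
        address 10.0.0.2
        netmask 255.255.255.0
        bond-slaves eth0 eth1 eth2 eth3
        bond-mode 802.3ad                # or balance-rr / balance-alb for the other tests
        bond-miimon 100                  # link monitoring interval in ms
        bond-xmit-hash-policy layer3+4   # lets different outgoing flows use different slaves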

(Deleted: I do understand that balance-rr only works with crossover cabling.)

Looking at the traffic counters from ifconfig for the individual interfaces, I see:

  • 802.3ad: outgoing traffic uses only the first interface. This is even true when transferring to another host with a different MAC address.
  • balance-alb: outgoing traffic is "somehow" unevenly distributed between the interfaces, but incoming traffic arrives on only one interface.

The kernel docs tell me that balance-rr needs switch support: "The balance-rr, balance-xor and broadcast modes generally require that the switch have the appropriate ports grouped together. The nomenclature for such a group differs between switches; it may be called an 'etherchannel'."

So the question is:

  • What is the right mode for me to use and how do I set it up so that it works?

  • If this is not generally possible, it would help to have a setup that uses different interfaces for the server-to-server and server-to-Internet connections. But this has to use bonding, not separate internal/external IP addresses, because that in turn would make the OpenVZ setup unnecessarily difficult.

Thanks in advance!

UPDATE: Having played with the switch, I have set up two etherchannels for the two servers in "active" mode (is this correct?). But using 802.3ad as the bonding mode on the Linux side I didn't see any change in behaviour/speed.
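
Roughly what I configured on the switch (a sketch; the port numbers are made up and may not match my actual cabling):

    ! Cisco IOS, LACP "active" - this pairs with Linux bonding mode 802.3ad
    configure terminal
     interface range GigabitEthernet1/0/1 - 4    ! server 1 ports
      channel-group 1 mode active
     exit
     interface range GigabitEthernet1/0/5 - 8    ! server 2 ports
      channel-group 2 mode active
     exit
     port-channel load-balance src-dst-ip        ! hash choice; available keywords vary by model
    end

A balance-rr bond, by contrast, would need a static group ("channel-group N mode on") rather than LACP.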

UPDATE2: Sorry. It seems like outgoing traffic now uses different interfaces, probably depending on the destination's MAC address. Is this the best I can do?

UPDATE3: Just to show what I'm talking about:

root@warp2:/ssd/test# iperf -c 10.0.0.1
------------------------------------------------------------
Client connecting to 10.0.0.1, TCP port 5001
TCP window size: 23.8 KByte (default)
------------------------------------------------------------
[  3] local 10.0.0.2 port 55759 connected with 10.0.0.1 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  2.16 GBytes  1.85 Gbits/sec

root@warp2:/ssd/test# iperf -c x.x.x.x
------------------------------------------------------------
Client connecting to warp1, TCP port 5001
TCP window size: 23.8 KByte (default)
------------------------------------------------------------
[  3] local 80.190.169.17 port 54861 connected with x.x.x.x port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  1.10 GBytes   944 Mbits/sec

The first test is with 2 NICs using balance-rr, currently in two VLANs (one for each NIC pair, simulating crossover cables).

The second test is with 2 NICs using 802.3ad and an EtherChannel.

Scheintod
  • `But I don't get any speedup in datatransfer between them.` - How are you testing? Hopefully you are using a purely **network-based test like iperf**? Testing by doing a file transfer brings the speed of your drive sub-systems into the equation. – Zoredache Nov 15 '13 at 18:53
  • I'm indeed using file transfers. But I did some tests before, using crossover cables and `balance-rr`, where I could see the speed nearly doubling. That isn't the case with this setup, so disk I/O is not the problem. – Scheintod Nov 15 '13 at 18:58
  • Do you actually have the switch configured here? You'd need to set up LAGs for 802.3ad to work properly. – devicenull Nov 16 '13 at 02:08
  • Actually I'm doing this for the first time and have never configured a switch before, nor do I have access to this particular switch at the moment. I'll try to get access. What do I have to configure? – Scheintod Nov 16 '13 at 23:52
  • The Cisco guide can help you ;) http://www.cisco.com/en/US/docs/switches/lan/catalyst2960/software/release/12.2_58_se/configuration/guide/swethchl.html - be sure to set up LACP mode, not PAgP. – Veniamin Nov 18 '13 at 07:59
  • Hi Veniamin: Thanks. That is helpful for a start. Do you know which Linux bonding mode I want to use with LACP? Only 802.3ad? – Scheintod Nov 18 '13 at 12:26
  • All: Now I have configured the switch to use etherchannels. But this seems to change nothing?!? – Scheintod Nov 18 '13 at 15:59
  • Why do you need this? – ewwhite Nov 18 '13 at 18:14
  • ewwhite: What do you mean? The fast interconnect? The stupid answer: because it's faster. The not so stupid answer: I'm making backups between the two servers for "warm failover", have one database on each in replication mode, and generally move some stuff around. Using bonded crossover cables this takes (roughly - I have tested this) half the time. But in order to work, this would require an internal set of IP addresses. This in turn would require changing my OpenVZ setup to use switched networking. And this would complicate setup and maintenance. – Scheintod Nov 18 '13 at 19:44
  • The correct mode is mode=4 (which is LACP). This requires switch support. – hookenz Nov 19 '13 at 20:29

3 Answers


Using link aggregation (etherchannel) will not speed up a single file transfer. Different connections can use different interfaces in the same etherchannel to increase maximum throughput for simultaneous transfers, but a single transfer will only ever be able to use a single interface, which is the behavior that you're describing.
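
One way to see the difference (a sketch, reusing the addresses from the question) is to run several parallel streams; whether they actually spread across links still depends on the hash policies of the bond and the switch:

    # on the receiving server (10.0.0.1)
    iperf -s
    # on the sending server: 4 parallel TCP streams for 30 seconds instead of a single one
    iperf -c 10.0.0.1 -P 4 -t 30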

MDMarra
  • +1 That makes sense. *I need to reevaluate my life choices*. – ewwhite Nov 18 '13 at 17:05
  • So, how do I make different connections use different interfaces? What I know right now: using 2 crossover cables and balance-rr I get roughly doubled speed. Using the switch I don't. The kernel docs tell me: `"The balance-rr, balance-xor and broadcast modes generally require that the switch have the appropriate ports grouped together. The nomenclature for such a group differs between switches, it may be called an 'etherchannel'"`. But the switch tells me "not connected" if I use balance-rr with etherchannels. Other docs tell me other things. I'm getting a little frustrated right now. – Scheintod Nov 18 '13 at 17:47
  • @Scheintod Honestly, I'm not very familiar with Linux bonding, so I don't think I can answer that. Perhaps you should open another question about that specifically? You might get better feedback. – MDMarra Nov 18 '13 at 18:19
  • Hi MDMarra. Thanks for your answer anyway. I'm no native speaker so perhaps this is the problem. So how would you suggest that I formulate my question? – Scheintod Nov 19 '13 at 09:36

I am afraid you cannot utilize several links of your bond in such a simple setup for traffic between these two servers. The reason: this Cisco switch performs load balancing based on IPs and MACs, so even several parallel file transfers will map to the same physical path.

You could use direct crossover cables instead. The setup should not be as complicated as you fear. I believe the switched (veth) OpenVZ setup is not needed here; venet and simple static routes should be sufficient, I suppose.

The network setup may look as follows:

10.10.0.0/24 - subnet for direct interconnect.
10.20.1.0/24 - range for VEs on Server1
10.20.2.0/24 - range for VEs on Server2

Server1: 
   bond1: IP=10.10.0.1/24
   VE1:   IP=10.20.1.1/24
   VE2:   IP=10.20.1.2/24
   ...
   route  10.20.2.0/24 -> 10.10.0.2

Server2: 
   bond1: IP=10.10.0.2/24
   VE1:   IP=10.20.2.1/24
   VE2:   IP=10.20.2.2/24
   ...
   route  10.20.1.0/24 -> 10.10.0.1

And iptables, of course, configured to allow all this stuff and not to attempt any NAT/masquerading tricks.
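
A sketch of the corresponding commands, using only the addresses and the bond1 interface from the layout above:

    # Server1: address the interconnect bond and route Server2's VE range over it
    ip addr add 10.10.0.1/24 dev bond1
    ip route add 10.20.2.0/24 via 10.10.0.2

    # Server2: the mirror image
    ip addr add 10.10.0.2/24 dev bond1
    ip route add 10.20.1.0/24 via 10.10.0.1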

UPD:
With container migration it is better to use a veth setup … much better indeed ;)

Veniamin
  • Hi Veniamin. Thanks for your answer. With veth the devil is in the details. I tried this and finally failed because Linux was not able to distinguish between the virtual interfaces (venet0:0, venet0:1) in the routing. (I had another question on this here: http://serverfault.com/questions/548448/why-is-linux-choosing-the-wrong-source-ip-address) The proposed solution worked but would have involved "patching" every single VZ. – Scheintod Nov 20 '13 at 11:49
  • Right now I've given up and am using a compromise which is somewhat similar to what you suggest: I have one 802.3ad bond for external and inter-VZ traffic and one balance-rr bond for backups and moving VMs. The balance-rr bond is only used from the host and so avoids the routing problems. This is not ideal, but I think I have to live with it until some genius kernel dev decides that switched fast interconnects would be a nice idea. I mark this as the correct solution because - as far as I know now - the "why it doesn't work" part is correct and the proposed solution is very similar to what I've come up with. – Scheintod Nov 20 '13 at 11:55
  • @Scheintod Concerning your related issue - just configure each of your VEs with a single private address and make them globally available with static NAT rules. You can pre-configure iptables translation records for your whole address pool once and forget about it. – Veniamin Nov 20 '13 at 13:14
  • @Scheintod One note to the above: if you want VE migration and failover to work properly in this case, NAT should be done on some external device. – Veniamin Nov 20 '13 at 13:25

I've been experimenting with a similar setup and think I understand the solution. In my case I have two servers, each with dual gigabit NICs connected through 3Com 3824 layer-2 switches.

After experimenting with various options I found that I needed to create a VLAN for each NIC pair between the servers (e.g. one VLAN containing the switch ports for server1:eth0 and server2:eth0, and another VLAN for server1:eth1 and server2:eth1). This required some extra configuration and routing; however, it resulted in the nearly 2x throughput gain I expected.
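
A sketch of that idea in Cisco IOS terms (the asker's switch; my 3Com config looks different, and the port numbers and VLAN IDs here are invented):

    ! Give each NIC pair its own access VLAN so the switch behaves like two crossover cables
    configure terminal
     vlan 101
     vlan 102
     exit
     interface GigabitEthernet1/0/1    ! server1 eth0
      switchport mode access
      switchport access vlan 101
     interface GigabitEthernet1/0/3    ! server2 eth0
      switchport mode access
      switchport access vlan 101
     interface GigabitEthernet1/0/2    ! server1 eth1
      switchport mode access
      switchport access vlan 102
     interface GigabitEthernet1/0/4    ! server2 eth1
      switchport mode access
      switchport access vlan 102
    end

On the Linux side the two NICs are then bonded with balance-rr, as in the question's first iperf test.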

So far I haven't fully digested the details; however, since 802.3ad uses the same MAC for all NICs in the aggregated link, I presume the switch favors sending traffic for that destination MAC through a single port rather than spreading the load across all ports in the aggregated link. By segregating each link into its own VLAN, the switch forwards data the same way it is sent: if the sender balances the transfer across two ports, the switch forwards it across the two VLANs to the two receiving ports.

Off the top of my head the only tradeoff is the extra configuration needed to let non-aggregated hosts on the network reach these servers. I still need to do redundancy testing; there's a chance that if eth0 on one server and eth1 on the other both fail, they'll lose communication. However, I consider that unlikely to occur: traffic should route through the same path as other hosts and cross the VLANs (also, since I only have two NICs per server, losing any one NIC defeats the purpose of the setup and it becomes a moot point).

Take all of this with a grain of salt. I've tinkered with a lot of options and settings, and I believe your switch is also layer-3 aware, but I'm fairly confident that pairing interfaces into segregated VLANs is the ticket to utilizing all available throughput.

STW
  • I want to do this. Please tell me what additional config and routing was needed on the servers. – swami Oct 19 '19 at 11:09