
I have a cluster of 32 machines. The first 25 machines are on the first rack and the remaining 7 are on the second rack. Each rack has a 1 Gbps Ethernet switch. Communication between the two racks will certainly carry a performance penalty, although I don't know exactly how large.

I used a network benchmark tool, iperf, to measure the network speed between the machines and found no problem: every point-to-point connection among the 32 machines can use the full bandwidth.

However, in my application, which is latency-sensitive and uses a request/response communication architecture, the inter-rack network speed is 4 to 5 times slower than the intra-rack speed.

Is there anything I can do here? Any well-known strategy to apply?

syko

1 Answer


Well, I think you've identified your problem: link contention between the two switches.

Look, each of your switches has a multi-gigabit backplane, meaning that, depending on its capabilities, the switch can sustain multiple full-duplex gigabit transfers concurrently. However, the link between your switches is only a single full-duplex gigabit link. When several inter-rack transfers run at once, that link gets saturated and things slow down.
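To put rough numbers on that, here is a minimal sketch of the arithmetic. It assumes an idealized equal split of the single 1 Gbps uplink among concurrent inter-rack flows; real traffic will not divide this evenly, and the function name is made up for illustration.

```python
def per_flow_share_gbps(link_gbps: float, concurrent_flows: int) -> float:
    """Idealized equal split of one link among concurrent flows (an assumption)."""
    if concurrent_flows < 1:
        raise ValueError("need at least one flow")
    return link_gbps / concurrent_flows


UPLINK_GBPS = 1.0  # the single inter-switch link described above

# Show how the per-flow share shrinks as more flows cross the uplink at once.
for flows in (1, 2, 5, 7):
    share = per_flow_share_gbps(UPLINK_GBPS, flows)
    print(f"{flows} concurrent inter-rack flows -> ~{share:.2f} Gbps each")
```

Under that assumption, a 4 to 5 times slowdown would be consistent with roughly four or five flows sharing the uplink at the same time, while a single iperf pair would still see the full gigabit.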

To confirm this is what's happening, add monitoring to your switches and inspect the stats for your uplink ports during your speed testing.
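If your switches expose per-port octet counters (for example the IF-MIB ifHCInOctets/ifHCOutOctets counters over SNMP, assuming the switches are managed and support them), utilization can be derived from two samples. This is only a sketch of the calculation; the sample values are hypothetical and how you fetch the counters depends on your switch.

```python
def link_utilization(octets_start: int, octets_end: int,
                     interval_s: float, link_bps: float = 1e9,
                     counter_bits: int = 64) -> float:
    """Fraction of link capacity used between two octet-counter samples."""
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    # Counters wrap at 2**counter_bits; the modulo copes with a single wrap.
    delta_octets = (octets_end - octets_start) % (2 ** counter_bits)
    return (delta_octets * 8) / (interval_s * link_bps)


# Hypothetical samples taken 10 seconds apart on the uplink port.
util = link_utilization(1_000_000_000, 2_200_000_000, 10.0)
print(f"uplink utilization: {util:.0%}")
```

A value that sits close to 100% on the uplink port while your application is running, with the host-facing ports well below that, would point at the inter-switch link.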

Once you've confirmed, you have a couple of options. First, consider using an 802.3ad LAG uplink between the switches. This will not allow any one flow to exceed 1 Gbit; however, you'll be able to support multiple concurrent 1 Gbit streams, the number depending on how many LAG member ports you're using.
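The reason a single flow stays capped is that a LAG typically pins each flow to one member port by hashing some of its header fields. The sketch below illustrates the idea only: the CRC32 hash over addresses and ports is an assumption chosen for the example, the IP addresses and port numbers are invented, and real switches use their own vendor-specific hash policies.

```python
import zlib
from collections import Counter


def lag_member(src_ip: str, dst_ip: str, src_port: int, dst_port: int,
               n_members: int) -> int:
    """Pick a LAG member port for a flow (illustrative hash, not a real switch's)."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}".encode()
    return zlib.crc32(key) % n_members


# Hypothetical flows: 25 hosts on rack 1 talking to one service on rack 2.
flows = [(f"10.0.1.{h}", "10.0.2.5", 40000 + h, 5001) for h in range(1, 26)]

# Count how many flows land on each of 4 member links.
usage = Counter(lag_member(*flow, n_members=4) for flow in flows)
for member in sorted(usage):
    print(f"member link {member}: {usage[member]} flows")
```

Every packet of a given flow hashes to the same member, so that flow never gets more than one link's worth of bandwidth, while different flows can spread across the members. The spread is not guaranteed to be even.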

Another option is to upgrade to switches that can support 10Gb uplinks.

EEAA
  • Thanks, perfect answer.... You helped me figure out the cause and presented the possible solutions. Thanks – syko Jan 08 '17 at 05:37
  • I don't know how to measure the speed of a specific uplink port, so I tested by saturating the uplink with traffic. (The maximum bandwidth is the same as that of a single gigabit link.) – syko Jan 08 '17 at 05:39
  • I see there are many unused ports on each switch. Can I increase the bandwidth between the switches just by adding a few more cables between them (for example, plugging Cat 6 LAN cables between Switch A and Switch B)? What kind of manuals should I read for this? – syko Jan 08 '17 at 05:42
  • No, you cannot just add cables. You will need to first determine if your switches support LAG functionality, and then read the manual to see what configuration needs to be put in place. I should mention that this will only be possible with managed switches. – EEAA Jan 08 '17 at 13:25