
I recently had a couple of 10Gbit Ethernet cards installed on a few machines connected to a LAN of about 80 commodity nodes running a distributed file system (Lustre). The 10Gbit cards achieve good performance on file operations and are functioning as they should.

However, I wrote a custom client app in C that asynchronously sends large blocks of data to multiple nodes in the network. The machine the client app runs on has a 10Gbit Ethernet card, and all the destination nodes have 1Gbit Ethernet cards - so in aggregate I should be able to reach a theoretical maximum send rate of 10Gbit/s.

If I run the client app on a machine with a 1Gbit card, it easily maxes out the card for sustained periods. But strangely, if I run the same app on a machine with a 10Gbit card, it gets horrible performance (around 20-30Mbit/s).

The program is written in C using normal TCP sockets. Is there some special setting required for 10Gbit? Because it's very odd that it gets maximum performance on a 1Gbit card, but horrible performance on a 10Gbit card. Again, the problem is not the 10Gbit card itself, since the distributed file system (Lustre) gets good performance with the 10Gbit card.

Any ideas/suggestions?

  • What transfer rate are you getting on 1Gbit using the custom app, and what transfer rate are you getting on 10Gbit doing normal file operations? – Pyrolistical Jun 15 '09 at 17:43

3 Answers


One thing I've noticed as a problem between 10Gb and 1Gb LAN segments is that the MTU is often different. 10Gb Ethernet is commonly run with a jumbo-frame MTU of 9000, as opposed to the standard 1500 for 1Gb Ethernet. You can either change the MTU on the 10Gb side down to match, or set up your router to handle breaking down the jumbo packets for you.

This has caused me some headaches, because without one of those two things configured, there is a lot of packet fragmentation.

Keith
  • If fragmenting is happening, performance invariably goes down the drain. The router (or switch) doing the fragmenting just doesn't have the CPU to do it fast. I think Keith got it bang on. – Alexandre Carmel-Veilleux Jun 24 '09 at 00:43

It is very likely that the application you wrote is not on par with the I/O optimizations of a mature application like Lustre.

The performance bottlenecks in your code may not surface on a machine with a 1Gbps card, but when the card's throughput capacity increases to 10Gbps, with all other parameters (hardware and OS) held constant, your code's limitations become prominent.


This is quoted from the Wikipedia Lustre Implementation section.

In a typical Lustre installation on a Linux client, a Lustre filesystem driver module is loaded into the kernel and the filesystem is mounted like any other local or network filesystem. Client applications see a single, unified filesystem even though it may be composed of tens to thousands of individual servers and MDT/OST filesystems.

On some massively parallel processor (MPP) installations, computational processors can access a Lustre file system by redirecting their I/O requests to a dedicated I/O node configured as a Lustre client. This approach was used in the LLNL Blue Gene installation.

Are you using this part?

Another approach uses the liblustre library to provide userspace applications with direct filesystem access.

Liblustre allows data movement directly between application space and the Lustre OSSs without requiring an intervening data copy through the kernel, thus providing low latency, high bandwidth access from computational processors to the Lustre file system directly.

nik

It will be important to keep the MTU consistent across the interfaces.

Make sure you have hardware TCP offload turned on (if it works), and that the firmware on all your NICs is up to date, as the TCP offload engine (TOE) is badly broken on some NICs as shipped. I would test with TOE, TSO, etc. turned both on and off and see if it makes any difference.

Are you using Broadcom 10G Ethernet cards? We have found real issues with those.

Have you tested your network with the LNet tester?

How many OSSs do you have, and what kind of throughput do you get through the filesystem?

James