
In my research group, we recently upgraded the OS on our machines from Red Hat 6.2 to Debian 8.3 and observed that the TCP round trip time through the integrated Intel 1G NICs between our machines had doubled from about 110µs to 220µs.

At first, I thought it was a configuration issue, so I copied all the sysctl settings (such as tcp_low_latency=1) from the un-upgraded Red Hat machines to the Debian machines, but that did not fix the issue. Next, I thought it might be a Linux distribution issue and installed Red Hat 7.2 on the machines, but the round trip times remained around 220µs.
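For example, the setting can be inspected on one machine and applied on another with sysctl (the tunable lives under net.ipv4):

sysctl net.ipv4.tcp_low_latency
sysctl -w net.ipv4.tcp_low_latency=1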

Finally, I figured that maybe the problem was with the Linux kernel version, since Debian 8.3 and Red Hat 7.2 both use a 3.x kernel while Red Hat 6.2 used kernel 2.6. So to test this out, I installed Debian 6.0 with Linux kernel 2.6 and bingo! The times were fast again at 110µs.
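(The running kernel on each host can be confirmed with uname -r.)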

Have others also experienced these higher latencies in the latest versions of Linux, and are there known workarounds?


Minimum Working Example

Below is a C++ application that can be used to benchmark the latency. It measures latency by sending a message, waiting for a response, and then sending the next message. It does this 100,000 times with 100-byte messages. Thus, we can divide the execution time of the client by 100,000 to get the round trip latency. To use it, first compile the program:

g++ -o socketpingpong -O3 -std=c++0x Server.cpp

Next, run the server side of the application on a host (say 192.168.0.101). We specify the IP to ensure that we are listening on a well-known interface.

socketpingpong 192.168.0.101

Then use the Unix utility time on the other host to measure the execution time of the client.

time socketpingpong 192.168.0.101 client

Running this experiment between two Debian 8.3 hosts with identical hardware gives the following results.

real    0m22.743s
user    0m0.124s
sys     0m1.992s

Debian 6.0 results are

real    0m11.448s 
user    0m0.716s  
sys     0m0.312s  
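Dividing by the 100,000 round trips, this works out to roughly 22.743 s / 100,000 ≈ 227 µs per round trip on Debian 8.3 versus 11.448 s / 100,000 ≈ 114 µs on Debian 6.0, which matches the 220 µs and 110 µs figures above.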

Code:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>
#include <unistd.h>
#include <limits.h>

#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

using namespace std;

static const int PORT = 2444;
static const int COUNT = 100000;

// Message sizes are 100 bytes
static const int SEND_SIZE = 100;
static const int RESP_SIZE = 100;

void serverLoop(const char* srd_addr) {
    printf("Creating server via regular sockets\r\n");
    int sockfd, newsockfd;
    socklen_t clilen;
    char buffer[SEND_SIZE];
    char bufferOut[RESP_SIZE];
    struct sockaddr_in serv_addr, cli_addr;

    sockfd = socket(AF_INET, SOCK_STREAM, 0);
    if (sockfd < 0)
       perror("ERROR opening socket");

    bzero((char *) &serv_addr, sizeof(serv_addr));
    serv_addr.sin_family = AF_INET;
    serv_addr.sin_addr.s_addr = inet_addr(srd_addr);
    serv_addr.sin_port = htons(PORT);

    fflush(stdout);
    if (bind(sockfd, (struct sockaddr *) &serv_addr,
             sizeof(serv_addr)) < 0) {
             perror("ERROR on binding");
    }

    listen(sockfd, INT_MAX);
    clilen = sizeof(cli_addr);
    printf("Started listening on %s port %d\r\n", srd_addr, PORT);
    fflush(stdout);

    while (true) {
        newsockfd = accept(sockfd, (struct sockaddr *) &cli_addr, &clilen);
        if (newsockfd < 0)
             perror("ERROR on accept");
        printf("New connection\r\n");

        int status = 1;
        while (status > 0) {
            // Read
            status = read(newsockfd, buffer, SEND_SIZE);
            if (status < 0) {
                perror("read");
                break;
            }

            if (status == 0) {
                printf("connection closed");
                break;
            }

            // Respond
            status = write(newsockfd, bufferOut, RESP_SIZE);
            if (status < 0) {
                perror("write");
                break;
            }
        }

        close(newsockfd);
    }


    close(sockfd);
}

int clientLoop(const char* srd_addr) {
    // This example is copied from http://www.binarytides.com/server-client-example-c-sockets-linux/
    int sock;
    struct sockaddr_in server;
    char message[SEND_SIZE] , server_reply[RESP_SIZE];

    //Create socket
    sock = socket(AF_INET , SOCK_STREAM , 0);
    if (sock == -1)
    {
        printf("Could not create socket");
    }
    puts("Socket created");

    server.sin_addr.s_addr = inet_addr(srd_addr);
    server.sin_family = AF_INET;
    server.sin_port = htons( PORT );

    //Connect to remote server
    if (connect(sock , (struct sockaddr *)&server , sizeof(server)) < 0)
    {
        perror("connect failed. Error");
        return 1;
    }

    printf("Connected to %s on port %d\n", srd_addr, PORT);

    // Fill buffer
    for (int i = 0; i < SEND_SIZE; ++i) {
        message[i] = 'a' + (i % 26);
    }

    for (int i = 0; i < COUNT; ++i) {
        if (send(sock, message, SEND_SIZE, 0) < 0) {
            perror("send");
            return 1;
        }

        if ( recv(sock, server_reply, RESP_SIZE, 0) < 0) {
            perror("recv");
            return 1;
        }
    }

    close(sock);

    printf("Sending %d messages of size %d bytes with response sizes of %d bytes\r\n",
            COUNT, SEND_SIZE, RESP_SIZE);
    return 0;
}

int main(int argc, char** argv) {
    if (argc < 2) {
        printf("\r\nUsage: socketpingpong <ipaddress> [client]\r\n");
        exit(-1);
    }
    if (argc == 2)
        serverLoop(argv[1]);
    else
        clientLoop(argv[1]);
    return 0;
}
Stephen
  • What prompted the move from Redhat to _Debian_? On the Redhat side, there are more tools and utilities to help work through issues like this. – ewwhite Jul 23 '16 at 01:47
  • I would contact the Linux Kernel mailing list or (if you have it) Red Hat support. They might know, and if they don't there will be people who are all set up to "bisect" kernel code changes to find out where bugs come from. – Law29 Jul 23 '16 at 22:06
  • I think you should use some tool (gprof, Valgrind or gperftools) to profile your code. – Jose Raul Barreras Sep 05 '16 at 20:30
  • What happens if you disable Nagle's algorithm on both client and server? int flag = 1; setsockopt(sock, IPPROTO_TCP, TCP_NODELAY, &flag, sizeof(int)); - does the difference persist? Also - is this just for TCP? i.e. for icmp/ping do you observe the same? – Kjetil Joergensen Sep 09 '16 at 01:29
  • Also - is there any difference in coalesce or offload settings between "fast" and "slow"? ethtool -c and ethtool -k. Driver defaults may have changed. – Kjetil Joergensen Sep 09 '16 at 01:37
  • What is the name of Intel 1G driver kernel module? – John Greene Sep 15 '16 at 17:35
  • And the version of the Intel kernel driver module as reported by `dmesg` output. – John Greene Sep 16 '16 at 00:51
  • And if the MTU has been changed to anything other than 1500, note any change in `tcpdump -e -x -I eth0` like IP fragmentation, TCP SACK, or IP options. – John Greene Sep 16 '16 at 00:55

2 Answers


This is not an answer, but it is important to calibrate latency/throughput issues rigorously. It might help you get closer to the answer and even help others here give you better suggestions on the root-causing process.

Try getting more accurate data with a Wireshark/tshark capture on the interface (a sample capture command follows this list) to:

  1. Confirm that the throughput is actually halved, and
  2. Identify how the latency is distributed (between tx and rx):
    a. Is it uniform over the test?
    b. Is there a lumped stall somewhere?
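For example (assuming the interface is eth0 and the benchmark's port 2444), a capture could be taken on each host with something like:

tshark -i eth0 -f "tcp port 2444" -w pingpong.pcap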
nik

I looked through the changelogs; it could possibly be the introduction of QFQ (Quick Fair Queueing).

Kernel 3.0 Networking Changelog https://kernelnewbies.org/Linux_3.0#head-96d40fb6f9c48e789386dbe59fd5b5acc9a9059d

QFQ committer's page http://info.iet.unipi.it/~luigi/qfq/

It provides tight service guarantees at an extremely low per-packet cost.
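To check whether QFQ (or any non-default qdisc) is actually attached to the interface, something like this should show the queueing discipline in use (assuming the interface is eth0):

tc qdisc show dev eth0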

m_krsic