I have two programs that pass data to each other via Linux pipes (named or otherwise). I need to hit a transfer rate of ~2600 MB/s between the two programs, but am currently seeing only ~2200 MB/s. However, I found that if I replace my second process with 'dd', the transfer rate jumps to over 3000 MB/s. Is there something about the way my program reads from the pipe that is less efficient than the way 'dd' does it? What can I do to improve the throughput? Is 'ifstream' inherently slower than other methods of reading binary data from a pipe?

To summarize the two scenarios:

Scenario 1:

Program 1 -> [named pipe] -> Program 2

Yields ~2200 MB/s transfer rate

Scenario 2:

Program 1 -> [named pipe] -> 'dd if=pipename of=/dev/null bs=8M'

Yields ~3000 MB/s transfer rate.

Here is the way my Program 2 currently reads from pipe:

ifstream inputFile;
inputFile.open(inputFileName.c_str(), ios::in | ios::binary);
while (keepLooping)
{
    inputFile.read(&buffer[0], 8*1024*1024);
    bytesRead = inputFile.gcount();
    //Do something with data
}

Update:

I have now tried using 'read(fd, &buffer[0], 8*1024*1024)' instead of the istream, which seemed to show a mild improvement (but not as much as dd).

I also tried using stream->rdbuf()->sgetn(&buffer[0], 8*1024*1024) instead of stream->read(), which did not help.

KyleL
  • If it helps, here's `dd`'s source code: http://lingrok.org/xref/coreutils/src/dd.c – jason Mar 27 '13 at 18:24
  • I believe `fstream`s have some overhead dealing with locales, but `dd` appears to be using `read` which has none of the associated locale-checking. Even in `ios::binary` mode, you still pay some of that penalty. How does your perf change if you use a `FILE*` instead? It's not as C++, but if perf is your concern... – Dan Lecocq Mar 27 '13 at 18:27
  • Is it better to use FILE* or just a straight read using file descriptors? – KyleL Mar 27 '13 at 18:28
  • What is 'do something with data'? Since dd simply outputs it, in your case to dev null. – Dave S Mar 27 '13 at 18:37
  • I'm sure that you tried all this stuff in `Release` mode, but if not, there could be worse results in `Debug` – borisbn Mar 27 '13 at 18:38
  • Try using `fread/fwrite` from standard C library, `read/write` from POSIX and `sendfile` (Linux specific) and benchmark each method. – el.pescado - нет войне Mar 27 '13 at 18:43
  • I tried using 'open()' and 'read()' with straight file descriptors like dd, but did not see any improvement. – KyleL Mar 27 '13 at 19:02
  • The most obvious answer is that `//Do something with data` is actually a non-negligible expense and slowing down your program. What happens if you don't do the work? – Mark B Mar 27 '13 at 20:17
  • IIRC, on linux/GCC `FILE*` is a wrapper around `std::streambuf` - not the other way around. – MSalters Mar 28 '13 at 12:35

2 Answers

The difference appears to be due to using an array instead of std::vector, which I still have a hard time believing. My two sets of code are shown below for comparison. The first can ingest from Program 1 at a rate of about 2500 MB/s. The second can ingest at a rate of 3100 MB/s.

Program 1 (2500 MB/s)

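#include <fcntl.h>   // open, O_RDONLY
#include <unistd.h>  // read
#include <vector>

int main(int argc, char **argv)
{
    int fd = open("/tmp/fifo2", O_RDONLY);

    // Heap-allocated 8 MiB buffer
    std::vector<char> buf(8*1024*1024);

    while(1)
    {
       read(fd, &buf[0], 8*1024*1024);
    }
}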

Program 2 (3100 MB/s)

#include <fcntl.h>   // open, O_RDONLY
#include <unistd.h>  // read

int main(int argc, char **argv)
{
    int fd = open("/tmp/fifo2", O_RDONLY);

    // Stack-allocated 8 MiB buffer
    char buf[8*1024*1024];

    while(1)
    {
       read(fd, &buf[0], 8*1024*1024);
    }
}

Both are compiled with -O3 using gcc version 4.4.6. If anyone can explain the reason for this I'd be very interested (since I understand std::vector to basically be a wrapper around an array).

Edit: I just tested Program 3, below, which uses ifstream and runs at 3000 MB/s. So it appears that using ifstream instead of 'read()' incurs only a very slight performance degradation, much less than the hit taken from using std::vector.

Program 3 (3000 MB/s)

#include <fstream>
using namespace std;

int main(int argc, char **argv)
{
    ifstream file("/tmp/fifo2", ios::in | ios::binary);

    // Stack-allocated 8 MiB buffer
    char buf[8*1024*1024];

    while(1)
    {
       file.read(&buf[0], 32*1024);
    }
}

Edit 2:

I modded Program 2's code to use malloc'd memory instead of memory on the stack and the performance dropped to match the vector performance. Thanks, ipc, for keying me onto this.
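
For anyone repeating the stack vs. heap comparison, here is a minimal sketch of a read loop that can be pointed at either kind of buffer, so that the buffer is the only thing that changes between runs. `drain_fd` is a hypothetical helper name, not part of the programs above. Unlike those programs it stops at end-of-file and retries on `EINTR`, which is an addition on my part.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstdlib>
#include <unistd.h>   // read, pipe, write, close, ssize_t

// Hypothetical helper: read from fd into buf (capacity cap) until the writer
// closes its end. Returns the total number of bytes read, or -1 on error.
long long drain_fd(int fd, char *buf, std::size_t cap)
{
    assert(buf && cap > 0);
    long long total = 0;
    for (;;) {
        ssize_t n = read(fd, buf, cap);   // may return fewer than cap bytes
        if (n > 0) {
            total += n;
            continue;
        }
        if (n == 0) {
            return total;                 // end of file: writer closed the pipe
        }
        if (errno == EINTR) {
            continue;                     // interrupted by a signal, retry
        }
        return -1;                        // any other error
    }
}
```

Calling it once with a stack array, once with a malloc'd block, and once with `&vec[0]` of a std::vector (timing each call and dividing the returned byte count by the elapsed time) should give like-for-like numbers.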

KyleL
  • I would expect the raw interface version to run faster. Why does it surprise you? – Randy Howard Mar 27 '13 at 21:06
  • The difference between vector and a plain static char array is that the char array is on the stack while the vector allocates the data on the heap. I'm a little surprised that you don't get a stack overflow. – ipc Mar 27 '13 at 22:08
  • @ipc Heap memory isn't slower, and since a vector stores its data contiguously, shouldn't the difference only be an additional pointer indirection? – s3rius Mar 27 '13 at 23:14
  • @ipc I ran a test and you're right about the heap. Changing Program 2 to use malloc'd memory cuts the performance to the same as program 1. – KyleL Mar 28 '13 at 00:11

This code compiled with g++ -Ofast:

#include <fstream>
#include <iostream>
#include <vector>

int main(int argc, char *argv[])
{
  if (argc != 2) return -1;
  std::ifstream in(argv[1]);
  std::vector<char> buf(8*1024*1024);
  in.rdbuf()->pubsetbuf(&buf[0], buf.size());
  std::ios_base::sync_with_stdio(false);
  std::cout << in.rdbuf();
}

does not perform badly at all.

$ time <<this program>> <<big input file>> >/dev/null
0.20s user 3.50s system 9% cpu 40.548 total
$ time dd if=<<big input file>> bs=8M > /dev/null
0.01s user 3.84s system 9% cpu 40.786 total

Keep in mind that std::cout is synchronized with C's stdout by default, which is really time consuming unless switched off. So call std::ios_base::sync_with_stdio(false); before doing any I/O if you want speed and do not intend to use C's input/output functions (which are slower anyway).

Also, for raw and fast input/output in C++, use the methods from streambuf, obtained by rdbuf().
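
As a rough sketch of what reading through the streambuf can look like: the function below pulls blocks with `sgetn()` from `rdbuf()` and just counts the bytes. `count_bytes_via_streambuf` and its parameters are hypothetical names for illustration. One assumption to flag: the standard leaves the effect of `pubsetbuf()` implementation-defined, so this version calls it before `open()`, which is the conservative order; whether it changes anything on a given standard library is something to measure.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <vector>

// Hypothetical helper: count the bytes in `path` by reading blocks directly
// from the stream buffer with sgetn(). Returns -1 if the file cannot be opened.
std::streamsize count_bytes_via_streambuf(const char *path, std::size_t chunk)
{
    assert(chunk > 0);
    // Declared before the stream so they outlive it.
    std::vector<char> io_buf(chunk);   // backing store offered to the filebuf
    std::vector<char> data(chunk);     // destination for each sgetn() call

    std::ifstream in;
    // Assumption: offering the buffer before open() is the safe order, since
    // the effect of pubsetbuf() is implementation-defined.
    in.rdbuf()->pubsetbuf(&io_buf[0], static_cast<std::streamsize>(io_buf.size()));
    in.open(path, std::ios::in | std::ios::binary);
    if (!in) {
        return -1;
    }

    std::streamsize total = 0;
    for (;;) {
        std::streamsize n = in.rdbuf()->sgetn(
            &data[0], static_cast<std::streamsize>(data.size()));
        if (n <= 0) {
            break;                     // nothing more to read
        }
        total += n;                    // "do something with data" would go here
    }
    return total;
}
```

The same loop works on a named pipe path; it returns once the writer closes its end.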

ipc
  • If I'm using named pipes instead of unnamed pipes, then is that std::ios_base::sync_with_stdio(false) going to help anything? Also, what effect does pubsetbuf() have? Does that just increase the buffer size of the streambuf? – KyleL Mar 27 '13 at 19:08
  • `pubsetbuf()` sets the internal buffer size. If you call `read()` as you do in your code, this still uses the much smaller default buffer size. `std::ios_base::sync_with_stdio(false)` always helps if you are using `std::cin` or `std::cout`. – ipc Mar 27 '13 at 19:46
  • Whether the pipes are named or unnamed should not make a big difference. You can check that out yourself since I've shown you my code. – ipc Mar 27 '13 at 19:47