Why can 'dd' read from a pipe faster than my own program using ifstream?

Question

I have two programs that pass data to each other via Linux pipes (named or otherwise). I need to hit a transfer rate of ~2600 MB/s between the two programs, but am currently seeing only about 2200 MB/s. However, I found that if I replace my second process with 'dd', the transfer rate jumps to over 3000 MB/s. Is there something about the way my program reads from the pipe that is less efficient than the way 'dd' does it? What can I do to improve this throughput? Is 'ifstream' inherently slower than other methods of reading binary data from a pipe?

To summarize the two scenarios:

Scenario 1:

Program 1 -> [named pipe] -> Program 2

Yields ~2200 MB/s transfer rate

Scenario 2:

Program 1 -> [named pipe] -> 'dd if=pipename of=/dev/null bs=8M'

Yields ~3000 MB/s transfer rate.

Here is the way my Program 2 currently reads from pipe:

ifstream inputFile;
inputFile.open(inputFileName.c_str(), ios::in | ios::binary);
while (keepLooping)
{
    inputFile.read(&buffer[0], 8*1024*1024);
    bytesRead = inputFile.gcount();
    //Do something with data
}

Update:

I have now tried using 'read(fd, &buffer[0], 8*1024*1024)' instead of istream, which seemed to show a mild improvement (but not as much as dd).

I also tried using stream->rdbuf()->sgetn(&buffer[0], 8*1024*1024) instead of stream->read(), which did not help.
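For reference, a raw read() loop of the kind described in this update might look like the sketch below. The helper name drain_fd is hypothetical and not taken from the original program. One thing it makes explicit is that read() on a pipe may return fewer bytes than requested, so the return value, not the requested 8 MiB, tells you how much data actually arrived.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <unistd.h>   // read, ssize_t

// Hypothetical helper: reads from fd into buf until EOF.
// Returns the total number of bytes read, or -1 on a read error.
long long drain_fd(int fd, char* buf, std::size_t buf_size)
{
    assert(buf != NULL && buf_size > 0);
    long long total = 0;
    for (;;) {
        ssize_t n = read(fd, buf, buf_size);
        if (n > 0) {
            total += n;          // may be less than buf_size (short read)
            continue;
        }
        if (n == 0) {
            return total;        // EOF: every writer has closed the pipe
        }
        if (errno == EINTR) {
            continue;            // interrupted by a signal, retry
        }
        return -1;               // any other error
    }
}
```

It could be called as drain_fd(fd, &buffer[0], 8*1024*1024) after opening the named pipe with open().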

If it helps, here's `dd`'s source code: http://lingrok.org/xref/coreutils/src/dd.c — jason, Mar 27 '13 at 18:24
I believe `fstream`s have some overhead dealing with locales, but `dd` appears to be using `read` which has none of the associated locale-checking. Even in `ios::binary` mode, you still pay some of that penalty. How does your perf change if you use a `FILE*` instead? It's not as C++, but if perf is your concern... — Dan Lecocq, Mar 27 '13 at 18:27
Is it better to use FILE* or just a straight read using file descriptors? — KyleL, Mar 27 '13 at 18:28
What is 'do something with data'? Since dd simply outputs it, in your case to dev null. — Dave S, Mar 27 '13 at 18:37
I'm sure that you tried all this in `Release` mode, but if not, the results could be worse in `Debug` — borisbn, Mar 27 '13 at 18:38
Try using `fread/fwrite` from standard C library, `read/write` from POSIX and `sendfile` (Linux specific) and benchmark each method. — el.pescado - нет войне, Mar 27 '13 at 18:43
I tried using 'open()' and 'read()' with straight file descriptors like dd, but did not see any improvement. — KyleL, Mar 27 '13 at 19:02
The most obvious answer is that `//Do something with data` is actually a non-negligible expense and slowing down your program. What happens if you don't do the work? — Mark B, Mar 27 '13 at 20:17
IIRC, on linux/GCC `FILE*` is a wrapper around `std::streambuf` - not the other way around. — MSalters, Mar 28 '13 at 12:35

KyleL · Accepted Answer · 2013-03-28T00:13:30.447

The difference appears to be due to using an array instead of std::vector, which I still have a hard time believing. My two sets of code are shown below for comparison. The first can ingest from Program 1 at a rate of about 2500 MB/s. The second can ingest at a rate of 3100 MB/s.

Program 1 (2500 MB/s)

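#include <fcntl.h>    // open, O_RDONLY
#include <unistd.h>   // read
#include <vector>

int main(int argc, char **argv)
{
    int fd = open("/tmp/fifo2", O_RDONLY);

    // 8 MiB buffer allocated on the heap by std::vector
    std::vector<char> buf(8*1024*1024);

    while(1)
    {
       read(fd, &buf[0], 8*1024*1024);
    }
}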

Program 2 (3100 MB/s)

#include <fcntl.h>    // open, O_RDONLY
#include <unistd.h>   // read

int main(int argc, char **argv)
{
    int fd = open("/tmp/fifo2", O_RDONLY);

    // 8 MiB buffer on the stack
    char buf[8*1024*1024];

    while(1)
    {
       read(fd, &buf[0], 8*1024*1024);
    }
}

Both are compiled with -O3 using gcc version 4.4.6. If anyone can explain the reason for this I'd be very interested (since I understand std::vector to basically be a wrapper around an array).

Edit: I just tested Program 3, below, which uses ifstream and runs at 3000 MB/s. So it appears that using ifstream instead of 'read()' incurs only a very slight performance degradation, much less than the hit taken from using std::vector.

Program 3 (3000 MB/s)

#include <fstream>
using namespace std;

int main(int argc, char **argv)
{
    ifstream file("/tmp/fifo2", ios::in | ios::binary);

    // 8 MiB buffer on the stack
    char buf[8*1024*1024];

    while(1)
    {
       // reads 32 KiB per call
       file.read(&buf[0], 32*1024);
    }
}

Edit 2:

I modded Program 2's code to use malloc'd memory instead of memory on the stack and the performance dropped to match the vector performance. Thanks, ipc, for keying me onto this.
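To repeat this comparison with a single loop, one option is a small timing helper that takes the buffer as a parameter, so the same code can be run against a stack array, a std::vector, or malloc'd memory. This is only a sketch: the names ReadStats and timed_read are hypothetical, and it uses std::chrono, which needs a newer compiler than the gcc 4.4.6 mentioned above. It does not explain the difference, it only helps measure it.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <unistd.h>   // read, ssize_t

// Hypothetical result type: bytes transferred and elapsed wall-clock time.
struct ReadStats {
    long long bytes;
    double seconds;
};

// Reads from fd into the caller-supplied buffer until EOF, an error,
// or until at least `limit` bytes have been read, and times the loop.
ReadStats timed_read(int fd, char* buf, std::size_t buf_size, long long limit)
{
    assert(buf != NULL && buf_size > 0);
    ReadStats stats = {0, 0.0};
    std::chrono::steady_clock::time_point start = std::chrono::steady_clock::now();
    while (stats.bytes < limit) {
        ssize_t n = read(fd, buf, buf_size);
        if (n <= 0) {
            break;               // EOF or error ends the measurement
        }
        stats.bytes += n;
    }
    std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
    stats.seconds = elapsed.count();
    return stats;
}
```

Throughput in MB/s is then stats.bytes / stats.seconds / 1e6. Passing &vec[0], a stack array, or a pointer from malloc as buf keeps everything else identical between runs.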

I would expect the raw interface version to run faster. Why does it surprise you? — Randy Howard, Mar 27 '13 at 21:06
The difference between vector and a plain static char array is, that the char array is on the stack while the vector allocates the data on the heap. I'm a little surprised that you don't get a stack overflow. — ipc, Mar 27 '13 at 22:08
@ipc Heap memory isn't slower, and since a vector stores its data contiguously, shouldn't the difference only be an additional pointer indirection? — s3rius, Mar 27 '13 at 23:14
@ipc I ran a test and you're right about the heap. Changing Program 2 to use malloc'd memory cuts the performance to the same as program 1. — KyleL, Mar 28 '13 at 00:11

score 1 · Answer 2 · answered Mar 27 '13 at 18:53

This code compiled with g++ -Ofast:

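#include <fstream>
#include <iostream>
#include <vector>

int main(int argc, char *argv[])
{
  if (argc != 2) return -1;
  std::ifstream in(argv[1]);
  std::vector<char> buf(8*1024*1024);
  // ask the stream buffer to use the 8 MiB vector as its buffer
  in.rdbuf()->pubsetbuf(&buf[0], buf.size());
  std::ios_base::sync_with_stdio(false);
  // copy the whole input to standard output via the stream buffer
  std::cout << in.rdbuf();
}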

does not perform badly at all:

$ time <<this program>> <<big input file>> >/dev/null
0.20s user 3.50s system 9% cpu 40.548 total
$ time dd if=<<big input file>> bs=8M > /dev/null
0.01s user 3.84s system 9% cpu 40.786 total

Keep in mind that std::cout is synchronized with C's stdout by default, which is really time-consuming unless it is switched off. So call std::ios_base::sync_with_stdio(false); if you want speed and do not intend to use C's input/output functions (which are slower anyway).

Also, for raw and fast input/output in C++, use the methods from streambuf, obtained by rdbuf().
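As an illustration of the streambuf route, here is a minimal sketch that reads through a std::filebuf with sgetn(). The function name count_bytes_via_streambuf and its parameters are hypothetical. The effect of pubsetbuf() is implementation-defined, and as far as I know libstdc++ ignores it once the file is open, so this sketch sets the buffer before calling open(); that ordering is my assumption, not something stated above.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <vector>

// Hypothetical helper: reads the whole file (or pipe) at `path` through
// a std::filebuf and returns the byte count, or -1 if it cannot be opened.
long long count_bytes_via_streambuf(const char* path,
                                    std::size_t io_buf_size,
                                    std::size_t chunk_size)
{
    assert(io_buf_size > 0 && chunk_size > 0);
    std::vector<char> io_buf(io_buf_size);   // declared first, so it outlives fb
    std::vector<char> chunk(chunk_size);

    std::filebuf fb;
    // Assumption: set the buffer before open() so the request is honoured.
    fb.pubsetbuf(&io_buf[0], static_cast<std::streamsize>(io_buf.size()));
    if (!fb.open(path, std::ios::in | std::ios::binary)) {
        return -1;
    }

    long long total = 0;
    for (;;) {
        std::streamsize n = fb.sgetn(&chunk[0],
                                     static_cast<std::streamsize>(chunk.size()));
        if (n <= 0) {
            break;               // nothing more to read
        }
        total += n;              // the last chunk may be shorter
    }
    return total;
}
```

For the setup in the question, path would be the named pipe and both sizes something like 8*1024*1024.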

ipc

If I'm using named pipes instead of unnamed pipes, is that std::ios_base::sync_with_stdio(false) going to help anything? Also, what effect does pubsetbuf() have? Does that just increase the buffer size of the streambuf? – KyleL Mar 27 '13 at 19:08
`pubsetbuf()` sets the internal buffer size. If you call `read()` as you do in your code, this still uses the much smaller default buffer size. `std::ios_base::sync_with_stdio(false)` always helps if you are using `std::cin` or `std::cout`. – ipc Mar 27 '13 at 19:46
Whether the pipes are named or unnamed should not make a big difference. You can check that out yourself since I've shown you my code. – ipc Mar 27 '13 at 19:47
