You've already got a good answer, but I wasn't satisfied with my guess, so I decided to test my assumptions.
I made a simple C++ program called streamstream
that just takes STDIN and writes it to STDOUT in 1024-byte chunks. It looks like this:
#include <stdio.h>

int main()
{
    const int BUF_SIZE = 1024;
    unsigned char* buf = new unsigned char[BUF_SIZE];

    // Copy STDIN to STDOUT, one BUF_SIZE-byte chunk at a time.
    size_t read = fread(buf, 1, BUF_SIZE, stdin);
    while (read > 0)
    {
        fwrite(buf, 1, read, stdout);
        read = fread(buf, 1, BUF_SIZE, stdin);
    }

    delete[] buf; // array delete to match new[]
}
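If you'd like to reproduce this, a plain compile is all it needs; something like:

g++ -o streamstream streamstream.cpp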
To test how the program uses memory, I ran it with valgrind
while piping the output from one instance to the next, as follows:
cat onetwoeightk | valgrind --tool=massif ./streamstream | valgrind --tool=massif ./streamstream | valgrind --tool=massif ./streamstream | hexdump
...where onetwoeightk
is just a 128KB file of random bytes. Then I used the ms_print
tool on the massif output to aid in interpretation. There is of course the overhead of the program itself and its heap, but each instance's footprint starts at about 80KB and never grows beyond that, because it's sipping STDIN just one kilobyte at a time.
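(For reference: massif writes one massif.out.<pid> file per process, and ms_print takes that file as its argument; the pid below is just an example.)

ms_print massif.out.12345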
The data is passed from process to process one kilobyte at a time, so overall memory usage peaks at roughly 1 kilobyte times the number of instances handling the stream, plus each instance's fixed overhead.
Now let's do what your Perl program is doing: read the whole stream (growing the buffer each time) and only then write it all to STDOUT. Then I'll check the valgrind output again.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main()
{
    const int BUF_INCREMENT = 1024;
    unsigned char* inbuf = (unsigned char*)malloc(BUF_INCREMENT);
    unsigned char* buf = NULL;
    size_t bufsize = 0;

    // Accumulate all of STDIN into one ever-growing buffer.
    size_t read = fread(inbuf, 1, BUF_INCREMENT, stdin);
    while (read > 0)
    {
        bufsize += read;
        buf = (unsigned char*)realloc(buf, bufsize);
        memcpy(buf + bufsize - read, inbuf, read);
        read = fread(inbuf, 1, BUF_INCREMENT, stdin);
    }

    // Only once the whole stream is in memory does anything get written out.
    fwrite(buf, 1, bufsize, stdout);
    free(inbuf);
    free(buf);
}
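The graph below came from running this version through massif the same way; a single instance is enough to show the growth (I'm calling the binary slurpstream here purely for the sake of the example):

cat onetwoeightk | valgrind --tool=massif ./slurpstream | hexdump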
Unsurprisingly, memory usage climbs past 128 kilobytes over the execution of the program. Here's the ms_print graph:
KB
137.0^ :#
| ::#
| ::::#
| :@:::#
| :::@:::#
| :::::@:::#
| :@:::::@:::#
| :@:@:::::@:::#
| ::@:@:::::@:::#
| :@::@:@:::::@:::#
| :@:@::@:@:::::@:::#
| @::@:@::@:@:::::@:::#
| ::@::@:@::@:@:::::@:::#
| :@::@::@:@::@:@:::::@:::#
| @:@::@::@:@::@:@:::::@:::#
| ::@:@::@::@:@::@:@:::::@:::#
| :@::@:@::@::@:@::@:@:::::@:::#
| @::@::@:@::@::@:@::@:@:::::@:::#
| ::@::@::@:@::@::@:@::@:@:::::@:::#
| ::::@::@::@:@::@::@:@::@:@:::::@:::#
0 +----------------------------------------------------------------------->ki
0 210.9
But the question is: what is the total memory usage of this approach? I can't find a good tool for measuring the memory footprint over time of a set of interacting processes. ps doesn't seem accurate enough here, even when I insert a bunch of sleeps. But we can work it out: the 128KB buffer is freed only at the end of execution, after the stream has been written out. While one instance is writing its stream, though, the next instance downstream is building up its own 128KB buffer, so our peak memory usage climbs to 2 × 128KB. It won't climb to 3× or 4× 128KB as more instances are chained, because each instance frees its memory and exits as soon as it's done writing to STDOUT.
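If you only need per-process peaks rather than the combined footprint over time, one rough check is to wrap each stage in GNU time (this assumes /usr/bin/time is the GNU version, and again uses the hypothetical slurpstream binary):

cat onetwoeightk | /usr/bin/time -v ./slurpstream | /usr/bin/time -v ./slurpstream > /dev/null

Each invocation reports a "Maximum resident set size" line, though this still won't show you how those peaks overlap in time, which is what the 2 × 128KB figure is about.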