I have a program that generates lots (terabytes) of output and sends it to stdout.
I want to split that output and process it in parallel with a bunch of instances of another program. The output can be distributed among them in any way, as long as each line is kept intact.
GNU parallel can do this, but it takes a fixed number of lines per chunk and restarts the filter process for every chunk:
./relgen | parallel -l 100000 -j 32 --spreadstdin ./filter
Is there a way to keep a constant number of processes running and distribute data among them?
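From my reading of the man page (so I may be misunderstanding it), --pipe is the newer name for --spreadstdin, and combining it with --round-robin is supposed to keep one filter running per jobslot and spread incoming blocks among the already-running jobs instead of spawning a new process per chunk, something like:

./relgen | parallel --pipe --round-robin -j 32 ./filter

Since --pipe splits on newlines by default, lines should stay intact, and --block could presumably be tuned for throughput. But I'm not sure this is the intended use, or whether it actually avoids the restarts.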