I know how to do this:
commandGeneratingLotsOfSTDOUT | bzip2 -z -c > compressed.bz2
I also know how to do this:
commandGeneratingLotsOfSTDOUT | split -l 1000000
But I don't know how to do this:
commandGeneratingLotsOfSTDOUT | split -l 1000000 -compressCommand "bzip2 -z -c"
To spell it out: the -compressCommand option above is made up to illustrate what I want. I am running a command that generates a terabyte or two of output, and I want that output split into chunks of N lines (1 million in this case), with each chunk bzip2-compressed into its own file.
Right now what I do is this:
commandGeneratingLotsOfSTDOUT | split -l 1000000
for f in x??; do bzip2 -z "$f"; done    # split's default chunk names: xaa, xab, ...
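Here is that two-pass workaround end to end, with a tiny stand-in generator so it actually runs (seq 10 and -l 4 are stand-ins for the real command and -l 1000000):

```shell
#!/bin/sh
set -e

# Stand-in for commandGeneratingLotsOfSTDOUT: 10 numbered lines.
# First pass: split writes uncompressed chunks xaa, xab, xac to disk.
seq 10 | split -l 4

# Second pass: re-read every chunk from disk and compress it in place.
for f in x??; do
    bzip2 -z "$f"        # xaa -> xaa.bz2, uncompressed original removed
done
```

Every chunk hits the disk uncompressed first and is then read back in full just to be compressed, which is exactly the extra I/O I am trying to avoid.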
This adds an extra uncompressed write to disk and read back from disk (plus the final compressed write) for every single chunk! Since the data is far bigger than RAM, the page cache can't absorb it, so all of that is real disk I/O.