This is pretty straightforward:
Say I have many files in the folder data/ to process via some executable ./proc. What is the simplest way to maximize efficiency? I have been doing this to gain some speedup:
ls --sort=size data/* | tac | parallel ./proc
which lists the files by size (largest first), then tac (cat reversed) flips that output so the smallest files are processed first. Is this the most efficient solution? If not, how can the efficiency be improved (simple solutions preferred)?
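For reference, here is a sketch of the same idea written without parsing ls output, which breaks on filenames containing whitespace or newlines. It assumes GNU find, sort, cut, and parallel, and that the files sit directly under data/:

```shell
# Emit "size<TAB>path" records terminated by NUL, sort them
# numerically by the leading size field (smallest first), strip
# the size column, and hand the NUL-delimited paths to parallel.
find data/ -maxdepth 1 -type f -printf '%s\t%p\0' |
    sort -z -n |
    cut -z -f 2- |
    parallel -0 ./proc
```

The -z/-0 flags keep every stage NUL-delimited end to end, so the ordering logic is identical to the ls | tac version but safe for arbitrary filenames.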
I remember reading that sorting like this leads to better efficiency, since large jobs then don't clog up the pipeline, but apart from examples I can't find or recall any theory behind it, so any references would be greatly appreciated!