So I have this function in Bash that I'm trying to understand, and it uses parallelism:
function get_cache_files() {
    ## The maximum number of parallel processes. 16 since the cache
    ## naming scheme is hex based.
    local max_parallel=${3-16}
    ## Get the cache files running grep in parallel for each top level
    ## cache dir.
    find $2 -maxdepth 1 -type d | xargs -P $max_parallel -n 1 grep -Rl "KEY:.*$1" | sort -u
} # get_cache_files
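To make the inputs concrete, here is a minimal sketch of how I think the function is meant to be called: $1 looks like the pattern searched for after "KEY:", $2 the cache directory, and $3 an optional override for the process count. The temporary directory, the file contents, and the second file name are made up for illustration; only the first file name comes from the real cache.

```shell
#!/usr/bin/env bash
# Copy of the function from the question, behaviour unchanged.
get_cache_files() {
    local max_parallel=${3-16}
    find $2 -maxdepth 1 -type d | xargs -P $max_parallel -n 1 grep -Rl "KEY:.*$1" | sort -u
}

# Hypothetical cache tree with two top-level directories (assumption, not real data).
cache_dir=$(mktemp -d)
mkdir -p "$cache_dir/5c/c6" "$cache_dir/a1/0f"
printf 'KEY: httpGETexample.com/index.html\n' > "$cache_dir/5c/c6/348e9a5b0e11fb6cd5948155c02cc65c"
printf 'KEY: httpGETother.org/\n' > "$cache_dir/a1/0f/00000000000000000000000000000fa1"

# Search for cache entries whose KEY line mentions example.com.
matches=$(get_cache_files 'example.com' "$cache_dir")
echo "$matches"

rm -rf "$cache_dir"
```

One thing I noticed while trying this: find also prints the cache directory itself (there is no -mindepth 1), so one grep walks the whole tree while the others walk each top-level directory. Every matching file is therefore reported twice, which seems to be why the sort -u is there.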
So my questions:
- The comment says "16 since the cache naming scheme is hex based". An example of the naming scheme:
php2-mindaugasb.c9.io/5c/c6/348e9a5b0e11fb6cd5948155c02cc65c
Why is it important to use 16 processes when the naming scheme is hexadecimal?
- The -P option for xargs is max-procs:
Run up to max-procs processes at a time; the default is 1. If max-procs is 0, xargs will run as many processes as possible at a time. Use the -n option with -P; otherwise chances are that only one exec will be done.
OK, so is "xargs -P $max_parallel -n 1" correct, and will 16 processes be started? Or should -n also be equal to $max_parallel?
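A quick experiment that seems to separate the two options, as a minimal sketch: the items a b c d and the `sh -c` argument counter are made up for the test and are not part of the original function. Each invocation prints how many arguments it received, so counting the output lines gives the number of commands xargs started.

```shell
#!/usr/bin/env bash
# With -n 1, xargs starts one command per input item; -P 4 lets up to 4 run at once.
# Each invocation prints "1", so the line count equals the number of invocations.
per_item=$(printf '%s\n' a b c d | xargs -P 4 -n 1 sh -c 'echo "$#"' _ | wc -l | tr -d ' ')

# Without -n, xargs packs all four items into a single command,
# so -P has nothing to run in parallel. The single invocation prints "4".
batched=$(printf '%s\n' a b c d | xargs -P 4 sh -c 'echo "$#"' _)

echo "invocations with -n 1: $per_item"
echo "arguments passed to the single invocation without -n: $batched"
```

If I read this right, -n controls how many arguments each command gets and -P only caps how many commands run at the same time, so with -n 1 every directory printed by find gets its own grep and at most $max_parallel of them run concurrently.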
As I understand it, the conditions for parallelising are:
- The resources the operations act on are independent (for example, separate files that are all processed in the same way);
- The operations are performed on independent computers.
What other conditions or circumstances allow you to parallelise?