I have a command:

python test.py -i input_file -p parameter_1 -o output_file

I want to run this for multiple input files, 4 input files at a time (in parallel) and once they finish run the next 4 and so on using a bash script.

I am not able to find the right answer.

I am trying the following, but I am not sure if it is right:

for (( n=0; n<12; n++ ))
do
    ((j=j%4)); ((j++==0)) && wait
    python test.py -i input_list[n] -p parameter_1 -o output_$n &
done

Thanks in advance!

4 Answers

In a bash script, you can try something like:

processes_max=4
counter=0

for f in input_files/*.txt
do
    python test.py -i "$f" -p parameter_1 -o "output_files/$(basename "$f")" &
    counter=$((counter+1))
    # once processes_max jobs have been started, wait for all of them to finish
    if [ "$counter" -eq "$processes_max" ]; then
        wait
        counter=0
    fi
done

wait
vmicrobio

The wait needs to come after the jobs have been started, not before.

for (( n = 0; n < 12; ++n )); do
    python test.py -i "${input_list[n]}" -p parameter_1 -o "output_$n" &
    (( j = (j + 1) % 4 )) || wait
done

Optionally add another wait after the loop to catch a final batch of fewer than 4 processes.
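
To illustrate why the trailing wait matters, here is a minimal sketch with 10 inputs, so the last batch holds only 2 jobs. The run_job function is a hypothetical stand-in for the python test.py call, and input_list, workdir and batch_size are names assumed for this example only.

```shell
#!/usr/bin/env bash

# Hypothetical stand-in for `python test.py -i INPUT -p parameter_1 -o OUTPUT`.
run_job() {
    sleep 0.1
    printf 'processed %s\n' "$1" > "$2"
}

workdir=$(mktemp -d)
input_list=(a b c d e f g h i j)   # 10 items: two full batches plus a partial one
batch_size=4
started=0

for n in "${!input_list[@]}"; do
    run_job "${input_list[n]}" "$workdir/output_$n" &
    # After every fourth job has been started, block until the whole batch is done.
    (( ++started % batch_size == 0 )) && wait
done

# Without this, the script could continue while the last 2 jobs are still running.
wait

outputs=("$workdir"/output_*)
echo "outputs written: ${#outputs[@]}"
```

Removing the final wait would let the script reach the last two lines before the jobs for i and j have written their files.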

konsolebox

If you have GNU Parallel:

parallel -j4 python test.py -i {} -p parameter_1 -o {}.output ::: inputfiles*

By default GNU Parallel runs one job per CPU thread, so if your system has 4 CPU threads, you can remove -j4.

It will start 4 jobs and when one job finishes, it will start the next.

This gives better CPU utilization than waiting for all 4 jobs to finish, especially if one of the jobs takes longer than the others.
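
If GNU Parallel is not available, the same "start the next job as soon as a slot frees up" behaviour can be approximated in plain bash with wait -n, which needs bash 4.3 or later. This is only a rough sketch: run_job is a hypothetical stand-in for the python test.py call, and max_jobs, running and workdir are names assumed for the example.

```shell
#!/usr/bin/env bash
# Requires bash 4.3+ for `wait -n`.

# Hypothetical stand-in for `python test.py -i INPUT -p parameter_1 -o OUTPUT`;
# the sleep varies between 0.1 and 0.3 s to mimic jobs of uneven length.
run_job() {
    sleep "0.$(( $1 % 3 + 1 ))"
    printf 'processed %s\n' "$1" > "$2"
}

workdir=$(mktemp -d)
max_jobs=4
running=0

for n in {0..11}; do
    # All slots busy: wait for any ONE job to exit, then reuse its slot.
    if (( running >= max_jobs )); then
        wait -n
        running=$((running - 1))
    fi
    run_job "$n" "$workdir/output_$n" &
    running=$((running + 1))
done

# Wait for the jobs still running after the loop ends.
wait

outputs=("$workdir"/output_*)
echo "outputs written: ${#outputs[@]}"
```

Unlike the batch approach, a slow job here only occupies one slot; the other three keep picking up new inputs.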

Ole Tange

If you simply want at most 4 jobs running at a time, GNU Parallel is the tool for the job:

printf '%s\n' "${input_list[@]}" |
  parallel -j4 'python test.py -i {} -p parameter_1 -o output_$(({#}-1))'

But since you want to wait until all four finish before starting the next batch (which is less efficient), you could group the 4 jobs in a single bash function and then run it with parallel -j1 -N4:

foo() {
  ((n=4*("$1"-1))); shift
  for ((i=0;i<4;i++)); do
    python test.py -i "$1" -p parameter_1 -o output_"$((n+i))" & shift
  done
  wait
}
export -f foo
printf '%s\n' "${input_list[@]}" | parallel -j1 -N4 foo {#} {1} {2} {3} {4}
Renaud Pacalet