1

I am trying to use a bash script to do several jobs in parallel. The jobs are memory intensive so I need to control the number that launch at a time. What I have is below, and it broadly works, but sometimes the delay loop is unaware of a job that has just been started, so several extra jobs get launched, causing the system to run out of memory.

Adding a sleep before the while statement in the delay loop reduces this problem but does not eliminate it. Does anyone know of a way to cure this? I'm running on Solaris, if that's relevant.

#!/bin/bash
delay(){
while [ 8 -le $(ps -ef |grep  myjob |wc -l) ]
do
sleep 1
done
}

./myjob -params1 &
delay
./myjob -params2 &
delay
./myjob -params3 &
delay
./myjob -params4 &
delay
.
.
.
camelccc
  • possible duplicate of [Running a limited number of child processes in parallel in bash?](http://stackoverflow.com/questions/6593531/running-a-limited-number-of-child-processes-in-parallel-in-bash) – David Schwartz Aug 20 '12 at 10:38

4 Answers

2

The GNU parallel utility (http://www.gnu.org/software/parallel/) might be the right tool here; it is arguably easier to use than xargs.
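A minimal sketch of what that could look like, assuming GNU parallel is installed (it is not part of a stock Solaris system). The run_jobs wrapper and the limit of 8 are illustrative choices, not anything from the question; -j sets the maximum number of simultaneous jobs and ::: introduces the argument list.

```shell
#!/bin/bash
# Sketch: cap concurrency with GNU parallel (assumed to be installed).
max_jobs=8

run_jobs() {
  # -j: at most max_jobs commands run at once.
  # The caller supplies the command, then ":::", then one argument per job.
  parallel -j "$max_jobs" "$@"
}

# Hypothetical usage, with ./myjob from the question:
# run_jobs ./myjob ::: -params1 -params2 -params3 -params4
```

parallel starts a new job as soon as a slot frees up, so no polling loop is needed in the script itself.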

Stephane Rouberol
0

Use xargs to do this. Pass it -n 1 to indicate one parameter per job, and use the -P option (--max-procs in GNU xargs) to specify the number of concurrent processes. Check that the xargs on your Solaris system supports -P; older versions may not.
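A rough sketch of the xargs approach, assuming an xargs that supports -P (GNU and BSD xargs do). The run_limited function name and the limit of 8 are made up for illustration.

```shell
#!/bin/bash
# Sketch: run one job per parameter, at most max_jobs at a time.
# Assumes an xargs that supports -P.
max_jobs=8

run_limited() {
  # $1 is the command; the remaining arguments are the per-job parameters.
  local cmd=$1
  shift
  # -n 1: one parameter per invocation; -P: cap on concurrent processes.
  printf '%s\n' "$@" | xargs -n 1 -P "$max_jobs" "$cmd"
}

# Hypothetical usage, with ./myjob from the question:
# run_limited ./myjob -params1 -params2 -params3 -params4
```

Note that xargs splits its input on whitespace, so this simple form only suits parameters without spaces or quotes.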

David Schwartz
0

Express the jobs as targets in a makefile and let make -j N run at most N of them at once (see "Parallel Execution" in the make manual).

Volker Stolz
0

First, I'll give you a stripped-down example of something I do in a couple of my Linux scripts. This should work on Solaris, but I don't currently have any systems to test on. I modified a couple of things that used /proc, so if anything doesn't work, let me know.

#!/bin/bash

# set the max # of threads
max_threads=4
# set the max system load
max_load=4

print_jobs(){
# flush finished jobs messages
  jobs > /dev/null
  for x in $(jobs -p) ; do
   # print all jobs
    echo "$x"
  done
}

job_count(){
  cnt=$(print_jobs $1)
  if [ -n "$cnt" ]; then
    wc -l <<< "$cnt"
  else
    echo 0
  fi
}

cur_load(){
  # get the 1 minute load average integer
  uptime |sed 's/.*load average[s]*:[[:space:]]*\([^.]*\)\..*/\1/g'
}


main_function(){
 # get current job count and load
  jcnow=$(job_count)
  loadnow=$(cur_load)

 # first, enter a loop waiting for load/threads to be below thresholds
  while [ $loadnow -ge $max_load ] || [ $jcnow -ge $max_threads ]; do
    if ! [ $firstout ]; then
      echo "entering sleep loop. load: $loadnow, threads: $jcnow"
      st=$(date +%s)
      local firstout=true
    else
      now=$(date +%s)
     # if it's been 5 minutes, echo again:
      if [ $(($now - $st)) -ge 300 ]; then
        echo "still sleeping. load: $loadnow, threads: $jcnow"
        st=$(date +%s)
      fi
    fi
    sleep 5

   # refresh these variables for loop
    loadnow=$(cur_load)
    jcnow=$(job_count)
  done
 # reset the flag only after the loop exits, so the 5-minute reminder can fire
  unset firstout

  ( ./myjob "$@" ) &
}

# do some actual work
for jobparams in "params1" "params2" "params3" "params4" "params5" "params6" "params7" ; do
   main_function $jobparams
done

wait

A couple of caveats:

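  • you should trap signals so you can kill child processes. I have not tested this on Solaris, but this works on Linux: trap 'echo "exiting" ; rm -f $lockfile ; kill 0 ; exit' INT TERM EXIT ($lockfile is not defined in this stripped-down example; drop the rm if you have no lock file). trap is a shell builtin, so it should behave the same under bash on Solaris.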
  • if load climbs while jobs are already running there's no facility to throttle down
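As a sketch of the first caveat, here is one way the cleanup could be written. It is a variant of the trap line above, not the same command: it kills only the shell's own background jobs (via jobs -p) rather than using kill 0, and the cleanup function name is made up for illustration.

```shell
#!/bin/bash
# Sketch: kill any still-running background jobs when the script is
# interrupted or exits.

cleanup() {
  # jobs -p lists the PIDs of this shell's background jobs.
  local pids
  pids=$(jobs -p)
  if [ -n "$pids" ]; then
    # Word splitting is intended here: one PID per word.
    kill $pids 2>/dev/null
  fi
  # Reap the children so none are left as zombies.
  wait 2>/dev/null
}

# Run cleanup on Ctrl-C or TERM and then exit; also run it on normal exit.
trap 'cleanup; exit 1' INT TERM
trap cleanup EXIT
```

Put the trap lines near the top of the script, before the first job is started, so an early Ctrl-C is also covered.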

If you're not concerned about load at all, this can be a bit simpler:

#!/bin/bash

# set the max # of threads
max_threads=4

print_jobs(){
# flush finished jobs messages
  jobs > /dev/null
  for x in $(jobs -p) ; do
   # print all jobs
    echo "$x"
  done
}

job_count(){
  cnt=$(print_jobs $1)
  if [ -n "$cnt" ]; then
    wc -l <<< "$cnt"
  else
    echo 0
  fi
}

main_function(){
 # get current job count
  jcnow=$(job_count)

 # first, enter a loop waiting for threads to be below thresholds
  while [ $jcnow -ge $max_threads ]; do
    if ! [ $firstout ]; then
      echo "entering sleep loop. threads: $jcnow"
      st=$(date +%s)
      local firstout=true
    else
      now=$(date +%s)
     # if it's been 5 minutes, echo again:
      if [ $(($now - $st)) -ge 300 ]; then
        echo "still sleeping. threads: $jcnow"
        st=$(date +%s)
      fi
    fi
    sleep 5

   # refresh these variables for loop
    jcnow=$(job_count)
  done
 # reset the flag only after the loop exits, so the 5-minute reminder can fire
  unset firstout


  ( ./myjob "$@" ) &
}

# do some actual work
for jobparams in "params1" "params2" "params3" "params4" "params5" "params6" "params7" ; do
   main_function $jobparams
done

wait
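For comparison, a shorter sketch of the same idea is possible if your bash is 4.3 or newer. This is a different technique from the script above: it uses wait -n to block until any one job finishes instead of sleeping and re-checking. As in the script above, jobs are counted from the shell's own job table rather than from ps, which is what avoids missing a job that has only just been launched. The run_limited name is made up for illustration.

```shell
#!/bin/bash
# Sketch: keep at most max_jobs background jobs running, without polling ps.
# Assumes bash 4.3 or newer for "wait -n".
max_jobs=4

run_limited() {
  # Block while the shell's own job table is full. "jobs -pr" lists only
  # running jobs, and a job is in the table as soon as it is started with "&".
  while [ $(jobs -pr | wc -l) -ge "$max_jobs" ]; do
    wait -n   # returns as soon as any one background job finishes
  done
  "$@" &
}

# Hypothetical usage, with ./myjob from the question:
# for p in -params1 -params2 -params3 -params4; do
#   run_limited ./myjob "$p"
# done
# wait   # let the remaining jobs finish
```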
ben