In a previous question I asked how to queue a job B to start after job A, which is done with

sbatch --dependency=after:123456:+5 jobB.slurm

where 123456 is the ID of job A, and :+5 means that job B may start no sooner than five minutes after job A starts. I now need to do this for several jobs: job B should depend on job A, job C on B, and job D on C.

sbatch jobA.slurm prints Submitted batch job 123456, and I need to pass that job ID to the dependency option of every call but the first. Because I am using a busy cluster, I can't rely on the job IDs incrementing by one, as someone else might queue a job in between.

As such I want to write a script that takes the job scripts (*.slurm) I want to run as arguments, e.g.

./run_jobs.sh jobA.slurm jobB.slurm jobC.slurm jobD.slurm

The script should then run, for all job scripts passed to it,

sbatch jobA.slurm # Submitted batch job 123456
sbatch --dependency=after:123456:+5 jobB.slurm # Submitted batch job 123457
sbatch --dependency=after:123457:+5 jobC.slurm # Submitted batch job 123458
sbatch --dependency=after:123458:+5 jobD.slurm # Submitted batch job 123459

What is an optimal way to do this with bash?

oguz ismail
mhovd
  • What is the nature of the dependency? And how similar are the job scripts? Just because, depending on the answers to those questions, the goal might be more easily achieved by simply putting `sbatch ` at the end of each script or similar. – Biggsy Jan 12 '21 at 13:48
  • That would be a solution, but I need them to be spaced apart in time by ~ 5 minutes, which would not be the case if I were to invoke `jobB` from `jobA`. To answer your question, I cannot have two jobs starting at the same time, as that will cause them to fail (with no option to ameliorate that condition). – mhovd Jan 12 '21 at 18:49
  • Would inserting a `sleep 300` at the beginning or the end of the submission script be practical? If so, you can do as suggested, or submit them all with the same job name and the `--dependency=singleton` option – damienfrancois Jan 13 '21 at 12:25
  • `dependency=singleton` would not be feasible, as I want the jobs to run in parallel, and some may take several days to complete. – mhovd Jan 13 '21 at 14:08
  • As for `sleep` for delayed submission, the jobs are placed into the queue, and will sometimes start at the same time even though they were submitted at different times. – mhovd Jan 13 '21 at 14:20
  • OK, so job B needs to start >= 5 mins after job A starts or else they fail. Why does this cause them to fail? Is it something to do with restrictions of the workload manager? Or something to do with concurrent read of the input data? Or something else? Just because, understanding that will help to come up with a suitable solution. – Biggsy Jan 13 '21 at 16:11
  • The exact reason is difficult to explain, but I tried to find a solution that would allow two jobs to start at the same time without luck. Five minutes is probably excessive, two would be just fine. – mhovd Jan 13 '21 at 18:18

1 Answer

You can use the --parsable option to get the jobid of the previously submitted job:

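#!/bin/bash

# Submit the first script without a dependency and capture its job ID.
ID=$(sbatch --parsable "$1")
shift
# Chain each remaining script to the job submitted just before it.
for script in "$@"; do
  ID=$(sbatch --parsable --dependency=after:${ID}:+5 "$script")
done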
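
If you want the script to stop when a submission fails and to report which job ID each script received, the same loop can be wrapped in a function. This is only a sketch: the function name `submit_chain` and the `DELAY` variable are hypothetical names chosen for illustration, and the dependency string is kept exactly as in the question. On multi-cluster setups `--parsable` prints the job ID followed by a semicolon and the cluster name, so the sketch keeps only the part before the semicolon.

```shell
#!/bin/bash
# Sketch: submit job scripts as a chain, each one depending on the job
# submitted just before it. submit_chain and DELAY are hypothetical names.

DELAY=${DELAY:-5}   # minutes passed to the "after" dependency

submit_chain() {
  if (( $# == 0 )); then
    echo "usage: submit_chain job1.slurm [job2.slurm ...]" >&2
    return 1
  fi

  local id script

  # The first job has no dependency.
  id=$(sbatch --parsable "$1") || return 1
  id=${id%%;*}   # --parsable may print "jobid;cluster"; keep the job ID
  echo "$1 -> $id"
  shift

  # Every later job waits on the job submitted in the previous step.
  for script in "$@"; do
    id=$(sbatch --parsable --dependency="after:${id}:+${DELAY}" "$script") || return 1
    id=${id%%;*}
    echo "$script -> $id"
  done
}
```

Adding `submit_chain "$@"` as the last line of run_jobs.sh would let you call it as in the question: ./run_jobs.sh jobA.slurm jobB.slurm jobC.slurm jobD.slurm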
Marcus Boden