I'm running a numerical model whose parameters are read from a "parameter.input" file. I use sbatch to submit multiple runs of the model, changing one parameter in the parameter file each time. Here is the loop I use:

#!/bin/bash -l
for a in {01..30}
do
  sed -i "s/control_[0-9][0-9]/control_${a}/g" parameter.input
  sbatch --time=21-00:00:00 run_model.sh
  sleep 60
done

The sed line changes a parameter in the parameter file. The run_model.sh file runs the model.

The problem: depending on the resources available, a job might run immediately or stay pending for a few hours. With this loop, if 60 seconds is not enough for job n to start, the parameter file gets modified while job n is still pending, so job n runs with the wrong parameters. I can't wait for job n to complete before submitting job n+1, because each job takes several days to complete.

How can I force sbatch to wait to submit job n+1 until job n is running?

I am not sure how to create an until loop that would grab the status of job n and wait until it changes to 'running' before submitting job n+1. I have experimented with a few things, but the server I use also hosts another 150 people's jobs, and I'm afraid too much experimenting might create some issues...

user222552
  • You already got an answer to your question, so you are good to go if you want. But your method is a bad and limiting practice that will lead to extra work when problems arise. There are several other ways to do your calculations without having to wait for the previous job to start. – Poshi May 10 '19 at 08:07
  • One way would be to parameterize the `run_model.sh` with the variable that you need, so you don't have to modify the `parameter.input` every time. – Poshi May 10 '19 at 08:10
  • Another way would be to run each execution in a different folder, so each one has its own `parameter.input`. This approach also has the advantage that the folders won't be overpopulated with files. – Poshi May 10 '19 at 08:11
  • A third approach (the best one from my point of view) is to use a job array and use the `SLURM_ARRAY_TASK_ID` environment variable to set the desired parameter in each execution. – Poshi May 10 '19 at 08:13
  • All of them require you to modify the `run_model.sh` script. If you can't do that for whatever reason, create a new script that encapsulates all the logic needed. That way you will be able to launch all the jobs at once, quickly and without interference between them. – Poshi May 10 '19 at 08:14
  • Agreed! My solution is far from elegant. I had to add a few checks to make sure it works. If one waits until an output file is created by job n before submitting job n+1, things work pretty smoothly. In my case, adding the parameters of interest inside the model is inconvenient because the model takes a long time to compile. Creating directories for each run is also a bit of a burden because of the sheer number of jobs I submit and the large size of the model executable. I will look into job arrays, it looks like it might be a good solution! – user222552 May 10 '19 at 17:32
  • If you expect a high number of folders, I expect that number multiplied by some factor in files. That many files in a single folder will be difficult to manage and will slow down the system. That's a good reason to use a folder per calculation. – Poshi May 10 '19 at 19:37
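
The job-array and folder-per-run suggestions from the comments could be combined roughly as follows. This is only a sketch under assumptions that are not in the question: the model reads parameter.input from the current directory, the executable is called model (a hypothetical name), and prepare_run_dir is a helper made up for this example. Each array task gets its own small copy of parameter.input, while the large executable stays in one place.

```shell
#!/bin/bash -l
#SBATCH --time=21-00:00:00
#SBATCH --array=1-30

# Create a private run directory for one array task and write that task's
# own copy of the parameter file there. The template file is not modified.
# $1 = array task ID, $2 = template parameter file.
prepare_run_dir() {
  local id rundir
  id=$(printf '%02d' "$((10#$1))")   # 7 -> 07, to match control_[0-9][0-9]
  rundir="run_${id}"
  mkdir -p "$rundir"
  sed "s/control_[0-9][0-9]/control_${id}/g" "$2" > "$rundir/parameter.input"
  echo "$rundir"
}

# Slurm sets SLURM_ARRAY_TASK_ID inside each array task; do nothing otherwise.
if [ -n "${SLURM_ARRAY_TASK_ID:-}" ]; then
  rundir=$(prepare_run_dir "$SLURM_ARRAY_TASK_ID" parameter.input)
  cd "$rundir" || exit 1
  ../model   # ASSUMPTION: hypothetical executable name, referenced rather than copied
fi
```

Saved in place of run_model.sh, this would be submitted once with a single sbatch call instead of a loop, and no task can see another task's parameters.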

1 Answer

Use the following to grab the last submitted job's ID and its status, and wait until it isn't pending anymore to start the next job:

sentence=$(sbatch --time=21-00:00:00 run_model.sh)  # sbatch prints "Submitted batch job <ID>"
stringarray=($sentence)                             # split the output into words
jobid=${stringarray[3]}                             # the fourth word is the job ID
sentence=$(squeue -j "$jobid")                      # header line plus one line for the job
stringarray=($sentence)
jobstatus=${stringarray[12]}                        # 8 header words, then the ST column (default squeue format)

Check that the job status is 'running' before submitting the next job with:

if [ "$jobstatus" = "R" ]; then
  # insert here relevant code to run next job
fi

You can insert that last snippet in an until loop that checks the job's status every few seconds.
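
A minimal sketch of such an until loop is below. It assumes your Slurm installation supports sbatch --parsable (prints only the job ID) and squeue --noheader --format=%t (prints only the compact state code, e.g. PD or R), which avoids counting words in the output. The function names job_state, wait_until_running and submit_all are made up for this example.

```shell
#!/bin/bash -l

# Print the compact Slurm state code of a job (PD, R, CG, ...).
# Prints nothing once the job is no longer known to squeue.
job_state() {
  squeue --noheader --jobs="$1" --format=%t 2>/dev/null
}

# Block until the job is running, or has already left the queue
# (finished, failed or cancelled), so the loop cannot hang forever.
# $1 = job ID, $2 = seconds between polls (default 10).
wait_until_running() {
  local jobid=$1 interval=${2:-10} state
  until state=$(job_state "$jobid"); [ "$state" = "R" ] || [ -z "$state" ]; do
    sleep "$interval"
  done
}

# The submission loop from the question, with the fixed sleep replaced
# by a wait on the job that was just submitted.
submit_all() {
  local a jobid
  for a in {01..30}; do
    sed -i "s/control_[0-9][0-9]/control_${a}/g" parameter.input
    jobid=$(sbatch --parsable --time=21-00:00:00 run_model.sh)
    jobid=${jobid%%;*}              # --parsable may append ";cluster"
    wait_until_running "$jobid" 30
  done
}
```

Call submit_all at the end of the script to start submitting. Keep the poll interval at several seconds or more so the scheduler is not flooded with squeue calls. Note that state R only means the job has started; if the model reads parameter.input some time after start-up, waiting for an output file, as mentioned in the comments, is the safer signal.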

user222552