I have an sbatch script that submits a job array to Slurm and runs several steps:

#!/bin/bash
#SBATCH --ntasks 1
#SBATCH --nodes 1
#SBATCH --time 00-01:00:00
#SBATCH --array=0-15

dir="TEST_$SLURM_ARRAY_JOB_ID"
org=base-case
dst=$dir/case-$SLURM_ARRAY_TASK_ID

#step 0 -> I'd like this step to be executed by only one task!
srun mkdir $dir

#step 1
srun cp -r $org $dst

#step 2
srun python createParamsFile.py $dst $SLURM_ARRAY_TASK_ID

#step 3
srun python simulation.py $dst

I'd like to run step 0 just once, since the rest of the jobs in the array share the directory it creates. It is not a big deal, because once the directory exists the remaining attempts simply fail with an error. But it is always better to avoid error messages in the logs and aborted Slurm steps. For example, in this case:

/usr/bin/mkdir: cannot create directory 'TEST_111224': File exists
srun: error: s02r3b83: task 0: Exited with exit code 1
srun: Terminating job step 111226.0

It is true that if I run the mkdir command without srun, no step 0 is created, so nothing is terminated abruptly. But I still get the error.

Bub Espinja

1 Answer

Use the -p option of mkdir so that mkdir only creates the directory if it is not already present; the errors will then no longer appear in the log.

srun mkdir -p $dir

Note that removing srun in your case will not change anything as only one task per job is requested (--ntasks=1). The error is not because many tasks in a job create the same directory, but because many jobs in an array create the same directory.
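To see why -p is enough here, the following is a minimal sketch that can be run outside Slurm. It uses a temporary directory and a made-up directory name (TEST_111224 is only borrowed from the log above); the variable names first, second and plain are illustrative.

```shell
#!/bin/bash
# Sketch only: shows that repeated "mkdir -p" calls succeed, while a plain
# "mkdir" on an existing directory fails, which is what the array jobs hit.

workdir=$(mktemp -d)          # scratch area so nothing real is touched
dir="$workdir/TEST_111224"    # hypothetical name, mimics TEST_$SLURM_ARRAY_JOB_ID

mkdir -p "$dir"; first=$?     # creates the directory
mkdir -p "$dir"; second=$?    # directory exists: still exits with 0
mkdir "$dir" 2>/dev/null; plain=$?   # without -p: non-zero exit, "File exists"

echo "first=$first second=$second plain=$plain"

rm -rf "$workdir"             # clean up the scratch area
```

Every job in the array can therefore run the same `mkdir -p` line, whichever one starts first.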

damienfrancois
  • Thanks, I wanted to use a more elegant solution for this issue, but if it does not exist, it is fine. – Bub Espinja May 06 '20 at 11:57
  • It depends what you find elegant. It is easy to get that command to be executed by one of the scripts only, based on the value of the `SLURM_ARRAY_TASK_ID` env var, but you never know which job of the array will start first. Otherwise you can create the directory when you submit the job rather than in the submission script – damienfrancois May 06 '20 at 17:11
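The last suggestion in the comment above (creating the directory at submission time) could look roughly like the sketch below. It assumes a small wrapper around sbatch; the function name submit_array and the variable RUN_DIR are hypothetical and not part of the original script. Since the array job ID is not known before submission, the directory name has to come from somewhere else, for instance a timestamp.

```shell
#!/bin/bash
# Sketch only: create the shared directory once, before submitting, and pass
# its name to every array task through the environment.
# submit_array and RUN_DIR are illustrative names, not from the original post.

submit_array() {
    local script="$1"     # the sbatch script to submit
    local run_dir="$2"    # shared directory for all array tasks

    # Executed exactly once, on the submission host.
    mkdir -p "$run_dir" || return 1

    # --export=ALL keeps the caller's environment and adds RUN_DIR to it.
    sbatch --export=ALL,RUN_DIR="$run_dir" "$script"
}

# Example call (needs a working Slurm installation):
#   submit_array job.sh "TEST_$(date +%Y%m%d_%H%M%S)"
```

Inside the submission script, `dir="$RUN_DIR"` would then replace the line that builds the name from `$SLURM_ARRAY_JOB_ID`, and step 0 can be dropped entirely.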