
I'm still exploring how to work with the Slurm scheduler, and this time I got really stuck. The following batch script doesn't do what I expect:

#!/usr/bin/env bash

#SBATCH --job-name=parallel-plink
#SBATCH --mem=400GB
#SBATCH --ntasks=4

cd ~/RS1
for n in {1..4};
do
  echo "Starting ${n}"
  srun --input none --exclusive --ntasks=1 -c 1 --mem-per-cpu=100G plink --memory 100000 --bfile RS1 --distance triangle bin --parallel ${n} 4 --out dt-output &
done

Since most of the SBATCH options are set inside the batch script, the invocation is simply: 'sbatch script.sh'

The output file slurm-20466.out contains only the four echoed lines:

$ cat slurm-20466.out

Starting 1
Starting 2
Starting 3
Starting 4

I double-checked the plink command without srun, and it runs without errors.

I must confess I am also responsible for the Slurm scheduler configuration itself. Let me know if there is anything I could try changing, or if more information is needed.

mve

1 Answer


You start your srun commands in the background so that they run in parallel, but you never wait for them to finish.

So the loop finishes very quickly: it echoes the "Starting ..." lines, starts each srun command in the background and then ends. At that point your sbatch script is done and exits successfully, which means your job is done. Slurm then revokes the allocation, and the srun job steps that are still running are terminated along with it. You may be able to confirm with sacct that the steps did start.
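To see the effect locally without a cluster, here is a minimal sketch in plain bash. It assumes sleep 2 as a stand-in for each backgrounded srun/plink call, and the names pids and running are illustrative, not part of the original script. It records the PID of every background step and, at the point where the batch script would end, counts how many of them are still alive:

```shell
#!/usr/bin/env bash
# Local illustration, no Slurm required: `sleep 2` is a stand-in for each
# backgrounded `srun ... plink ...` call from the question.
pids=()
for n in {1..4}; do
  echo "Starting ${n}"
  sleep 2 &            # runs in the background, like the srun job steps
  pids+=("$!")         # remember the PID of each background step
done

# No `wait`: execution falls straight through to the end of the script.
running=0
for pid in "${pids[@]}"; do
  # kill -0 sends no signal; it only checks whether the process still exists
  kill -0 "$pid" 2>/dev/null && running=$((running + 1))
done
echo "Reached end of script with ${running} of ${#pids[@]} steps still running"
```

Run as a plain bash script, this should print the four "Starting" lines followed by a report that all four steps are still running. Under sbatch, that is the moment the allocation is released and the steps are killed. On the cluster, something like sacct -j 20466 --format=JobID,JobName,State,ExitCode should list the individual job steps (e.g. 20466.0) next to the batch step, which is one way to check that they were started.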

You need to make the batch script wait for the background processes to finish before it terminates. To do that, simply add a wait command at the end of your script:

#!/usr/bin/env bash

#SBATCH --job-name=parallel-plink
#SBATCH --mem=400GB
#SBATCH --ntasks=4

cd ~/RS1
for n in {1..4};
do
  echo "Starting ${n}"
  srun --input none --exclusive --ntasks=1 -c 1 --mem-per-cpu=100G plink --memory 100000 --bfile RS1 --distance triangle bin --parallel ${n} 4 --out dt-output &
done
wait  # block until all background srun steps have finished
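A bare wait like the one above returns 0 once all children have finished, whether or not any of them failed. If you also want to notice failing steps, one option (a sketch, not part of the original answer) is to keep each background PID and call wait on it individually, since wait PID returns that child's exit status. Here run_step is a hypothetical helper: on the cluster its body would be the srun/plink line from the script, while the local stand-in below just sleeps and pretends that step 3 fails:

```shell
#!/usr/bin/env bash
# Sketch: same pattern as the fixed script, but waiting on each step's PID so
# that a failing step is reported instead of silently ignored.
# run_step is a hypothetical helper with a local stand-in body.
run_step() {
  local n=$1
  # On the cluster this would be, e.g.:
  # srun --input none --exclusive --ntasks=1 -c 1 --mem-per-cpu=100G \
  #   plink --memory 100000 --bfile RS1 --distance triangle bin --parallel "$n" 4 --out dt-output
  sleep 1
  [ "$n" -ne 3 ]       # stand-in result: step 3 "fails", the others succeed
}

pids=()
for n in {1..4}; do
  echo "Starting ${n}"
  run_step "$n" &      # launch each step in the background
  pids+=("$!")
done

failed=0
for i in "${!pids[@]}"; do
  # `wait PID` blocks until that step ends and returns its exit status
  if ! wait "${pids[$i]}"; then
    echo "Step $((i + 1)) failed" >&2
    failed=$((failed + 1))
  fi
done
echo "${failed} of ${#pids[@]} steps failed"
```

Locally this should report one failed step out of four. In a real job script you could finish with exit $(( failed > 0 )) so that the batch step ends with a non-zero exit code whenever any step failed.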
Marcus Boden
Thanks, very helpful! For the record: I was only interested in the problem of using srun inside salloc or sbatch. I wouldn't recommend using this particular tool in this way! – mve Jul 06 '22 at 13:07