
I am new to HPC, and to SLURM in particular, and I ran into some trouble.

I was given access to an HPC cluster with 32 CPUs on each node. For the calculations I need, I wrote 12 Python multiprocessing scripts, each of which uses exactly 32 CPUs. Now, instead of starting each script manually in interactive mode (which is also an option, by the way, but it takes a lot of time), I decided to write a batch script to start all 12 scripts automatically.

//SCRIPT//

#!/bin/bash

#SBATCH --job-name=job_name
#SBATCH --partition=partition
#SBATCH --nodes=1
#SBATCH --time=47:59:59
#SBATCH --export=NONE
#SBATCH --array=1-12

module switch env env/system-gcc
module load python/3.8.5

source /home/user/env/bin/activate

python3.8 $HOME/Script_directory/Script$SLURM_ARRAY_TASK_ID.py

exit

//UNSCRIPT//

But as far as I understand, this script would start all of the jobs from the array on the same node, so the underlying Python scripts might end up fighting over the available CPUs and slow each other down.

How should I modify my batch file so that each task from the array is started on a separate node?

Thanks in advance!

FuzzyData
UPDATE: I also added . /sw/batch/init.sh as the first command of the script to make module load work. – FuzzyData Jun 04 '21 at 18:51

1 Answer


This script will start 12 independent jobs, possibly on 12 distinct nodes at the same time, or all 12 in sequence on the same node, or any other combination, depending on the load of the cluster.

Each job will run the corresponding Script$SLURM_ARRAY_TASK_ID.py script. There will be no competition for resources.
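
If you want to see this for yourself, one option (just a suggested addition, not required) is to have each array task print where it ended up running, for example by adding a line like this to the script before the python3.8 call:

# Optional sanity check: report which node each array task landed on
# and how many CPUs Slurm allocated there.
echo "Array task $SLURM_ARRAY_TASK_ID running on $(hostname) with $SLURM_CPUS_ON_NODE CPUs"

The output ends up in the job's slurm-<jobid>_<taskid>.out file, so you can verify that the tasks are not piling onto one node.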

Note that if nodes are shared in the cluster, you would add the --exclusive parameter to request whole nodes with their 32 CPUs.
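
For reference, a sketch of the submission script with that one change added (everything else is taken unchanged from your original; the partition name, module names and paths are site-specific):

#!/bin/bash
#SBATCH --job-name=job_name
#SBATCH --partition=partition
#SBATCH --nodes=1
#SBATCH --exclusive          # request a whole node (all 32 CPUs) for each array task
#SBATCH --time=47:59:59
#SBATCH --export=NONE
#SBATCH --array=1-12

module switch env env/system-gcc
module load python/3.8.5

source /home/user/env/bin/activate

python3.8 $HOME/Script_directory/Script$SLURM_ARRAY_TASK_ID.py

You submit it once with sbatch, and Slurm creates the 12 array tasks for you.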

damienfrancois