I am quite new to Slurm and this community, so please correct me in any way if I am doing anything wrong! :)
I need to run my executable (a Python script) many times in parallel on an HPC cluster. The executable takes the Slurm array task ID as input. Inside the Python script, this ID is mapped onto several parameters, which in turn determine which data is imported. Note that the executable itself is not internally parallelised; I think each invocation should be able to run on a single CPU.
My aim: run as many invocations of my executable in parallel as possible! I was thinking of at least 50 concurrent invocations.
In principle, my scripts are working as intended on the cluster. I use this Slurm submission script:
#!/bin/bash -l
#SBATCH --job-name=NAME
#SBATCH --chdir=/my/dir
#SBATCH --output=.job/NAME%A_%a.out
#SBATCH --error=.job/NAME%A_%a.err
#SBATCH --mail-type=END
#SBATCH --mail-user=USER
# --- resource specification ---
#SBATCH --partition=general
#SBATCH --array=1-130
#SBATCH --ntasks-per-node=1
#SBATCH --mem=16G
#SBATCH --time=13:00:00
# --- start from a clean state and load necessary environment modules ---
module purge
module load anaconda/3
# --- instruct OpenMP to use the number of cpus requested per task ---
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}
# --- run executable via srun ---
srun ./path/to/executable.py $SLURM_ARRAY_TASK_ID
However, this way, somehow only 8 jobs (that is, 'executable.py 1', 'executable.py 2', ...) get executed in parallel, each on a different node. (Note: I don't quite know what 'export OMP_NUM_THREADS' does; I was told to include it by IT support.) When 'executable.py 1' ends, 'executable.py 9' starts. However, I want more than just 8 concurrently running invocations. So I thought I need to specify that each invocation only needs one CPU; maybe then many more of my jobs could run in parallel on the 8 nodes I somehow seem to receive. My new submission script looks like this (for readability I only show the 'resource specification' part; the rest is unchanged):
# --- resource specification ---
#SBATCH --partition=general
#SBATCH --array=1-130
#SBATCH --ntasks-per-node=10
#SBATCH --cpus-per-task=1
#SBATCH --mem=16G
#SBATCH --time=13:00:00
This way, though, it seems that my executable gets run ten times for each Slurm array task ID, that is, 'executable.py 1' is run ten times, as is 'executable.py 2' and so on. This is not what I intended.
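My current guess (and I am really not sure this is right, which is part of my question) is that each array task should request exactly one task with one CPU, and that the number of simultaneously running array tasks could perhaps be capped with the '%' throttle of --array. Roughly something like this:
# --- resource specification (just my guess, possibly wrong) ---
#SBATCH --partition=general
#SBATCH --array=1-130%50       # my guess: allow at most 50 array tasks to run at the same time
#SBATCH --ntasks=1             # my guess: one task per array element
#SBATCH --cpus-per-task=1      # one CPU for that task
#SBATCH --mem=16G
#SBATCH --time=13:00:00
Is this the right direction, or am I misunderstanding what these options do?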
I think at the bottom of my problem is that (i) I am seriously confused by the SBATCH options --ntasks-per-node, --ntasks, --cpus-per-task, --nodes, etc., and (ii) I don't really understand conceptually what a 'job', 'job step', or 'task' is meant to be (both for my case and as used in the sbatch man page).
If anyone knows which combination of SBATCH options gives me what I want, I would be very grateful for a hint. Also, if you have general knowledge (in plain English) of how jobs, job steps, tasks, etc. are defined, that would be great too.
Please note that I have stared extensively at the man pages and some online documentation. I also asked my local IT support, but sadly they were not awfully helpful. I really need my script to run in parallel on a large scale, and I also want to understand the workings of Slurm a bit better. I should like to add that I am not a computer scientist by training; this is not my usual playing field.
Thanks so much for your time everyone!