
I am asking for help with an issue I cannot figure out. I am using a Slurm cluster and have a Python script 'SOLVER.py', which calls mpirun internally (it launches a numerical spectral-element simulation). Each node on the cluster has 40 processors. As an example, I would like to allocate 5 nodes and run 'SOLVER.py' on each node (5 times, in parallel) with 40 processors each.


#!/bin/bash
#
#SBATCH --job-name=solv
#SBATCH --comment="example"
#SBATCH --exclusive
#SBATCH --workdir=.  ### directory where the job should be executed from
#SBATCH --mem=150000        ### RAM in MB required for the job (this is mandatory!!!)
#SBATCH --nodes=1               ### Node count required for the job
#SBATCH --output=slurm.%j.out       ### output file for console output
#SBATCH --partition=long    ### partition where the job should run (short|medium|long|fatnodes)
# ...

#export OMP_NUM_THREADS=160
python SOLVER.py

.. works fine. Now what is the correct method to run the script 5 times in parallel, once on each of 5 nodes? I have tried many different things (varying --ntasks, different srun combinations, and a plugin called jug), but I always run into different problems.

Could someone help me? :)

Best regards,

Max

  • Why not submit 5 jobs, each to 1 node? – Stefan Nov 16 '20 at 09:38
  • Plus one, you can use job arrays for that – Gilles Gouaillardet Nov 16 '20 at 10:26
  • @Stefan: Thanks for your answer. You mean calling sbatch ./myscript.sh 5 times? The problem is that the system restricts the number of jobs (but not the number of nodes, which I find a bit odd) :( Any idea how I can put that into one script / one job? – Max T Nov 16 '20 at 15:55
  • @GillesGouaillardet: Thank you too for your answer! I think this would have the same effect as Stefan's proposal above, wouldn't it? – Max T Nov 16 '20 at 15:57
  • This is site-specific, and you **might** be able to harvest more nodes with one job array of n (sub)jobs than with n independent single-node jobs. So I recommend you give it a try and see how it goes. – Gilles Gouaillardet Nov 16 '20 at 23:52
  • @GillesGouaillardet I tried it now; unfortunately it is the same problem: the number of array (sub)jobs is limited as well. Is there some other way? I think it should also work with a combination of --ntasks and srun, but I could not figure it out yet. – Max T Nov 17 '20 at 09:14
  • You can manually split your list of nodes into 5 non-overlapping `machinefile`s and then run 5 instances of `mpirun -machinefile machinefile.x ...` in parallel, giving each `mpirun` its own unique `machinefile`. – Gilles Gouaillardet Nov 19 '20 at 00:40
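For reference, the job-array suggestion from the comments could look like the following minimal sketch, reusing the single-node script from the question. Slurm sets SLURM_ARRAY_TASK_ID for each array task, which could be used to select a per-run case directory (the directory naming here is purely an example):

```shell
#!/bin/bash
#SBATCH --job-name=solv
#SBATCH --array=0-4             ### 5 array tasks, submitted as one job
#SBATCH --nodes=1               ### each array task gets its own node
#SBATCH --exclusive
#SBATCH --mem=150000
#SBATCH --output=slurm.%A_%a.out   ### %A = array job ID, %a = task index
#SBATCH --partition=long

# Each array task (SLURM_ARRAY_TASK_ID = 0..4) runs one solver instance;
# the ID could select per-run input, e.g. a directory run_$SLURM_ARRAY_TASK_ID.
python SOLVER.py
```

Note that array tasks may still count against a per-user job limit, as discussed above, so whether this helps is site-specific.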
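The machinefile approach from the last comment could be sketched as a single 5-node job like this. It assumes an mpirun that accepts -machinefile (e.g. Open MPI or MPICH), and that SOLVER.py can be told which machinefile to use when it builds its internal mpirun call; the MACHINEFILE environment variable here is a hypothetical hand-off, not an existing option of the script:

```shell
#!/bin/bash
#SBATCH --job-name=solv5
#SBATCH --nodes=5
#SBATCH --exclusive
#SBATCH --mem=150000
#SBATCH --output=slurm.%j.out
#SBATCH --partition=long

# Expand the compact Slurm node list into one hostname per line,
# then split it into 5 one-host machinefiles: machinefile.00 .. machinefile.04
scontrol show hostnames "$SLURM_JOB_NODELIST" > all_nodes.txt
split -l 1 -d all_nodes.txt machinefile.

# Launch one solver instance per machinefile in the background, then wait.
# MACHINEFILE is a hypothetical variable SOLVER.py would read when it
# assembles its internal 'mpirun -np 40 -machinefile ...' command.
for f in machinefile.0*; do
    MACHINEFILE="$f" python SOLVER.py &
done
wait
```

Each mpirun then sees only its own node, so the 5 instances run side by side without overlapping, which matches the non-overlapping-machinefile idea in the comment above.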

0 Answers