
How do I run one program with different parameters in parallel on multiple nodes with SLURM? For example, I want to run:

prog a1.txt
prog a2.txt
prog a3.txt
...
prog an.txt

on m cluster nodes (m < n), with only one instance per node at a time: while prog ai.txt is running on node j, no other instance of prog may run on node j until it has finished. Each instance of prog will use k cores on its node during some part of its execution. So initially:

prog a1.txt runs on node 1
prog a2.txt runs on node 2
...
prog am.txt runs on node m

Once prog a1.txt finishes on node 1, prog am+1.txt will start on node 1, and so on.
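
To make the intended behaviour concrete, here is a small Python simulation of the schedule I am after. This is not a SLURM script, only a sketch of the desired outcome: the function name simulate_schedule and the run times are made up for illustration, and it assumes each job simply goes to whichever node becomes free first.

```python
import heapq


def simulate_schedule(durations, m):
    """Simulate running len(durations) jobs on m nodes, one job per node at a time.

    durations[i] is a hypothetical run time for job i+1 (prog a{i+1}.txt).
    Returns a list of (job, node, start, end) tuples in start order.
    """
    # Each heap entry is (time at which the node becomes free, node number).
    free_at = [(0, node) for node in range(1, m + 1)]
    heapq.heapify(free_at)

    schedule = []
    for job, duration in enumerate(durations, start=1):
        # Take the node that frees up first; ties go to the lowest node number.
        start, node = heapq.heappop(free_at)
        end = start + duration
        schedule.append((job, node, start, end))
        # The node stays busy until this job ends, so no second instance can overlap.
        heapq.heappush(free_at, (end, node))
    return schedule


if __name__ == "__main__":
    # n = 5 jobs on m = 3 nodes, with made-up run times.
    for job, node, start, end in simulate_schedule([4, 6, 5, 3, 2], m=3):
        print(f"prog a{job}.txt: node {node}, t={start}..{end}")
```

With these made-up run times, prog a1.txt is the first to finish, so prog a4.txt (that is, prog am+1.txt) starts on node 1 as soon as node 1 is free, and no node ever runs two instances at once.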

Ideally, I would like to be able to achieve this with a SLURM script.

I have already asked a similar question here, but did not get an answer that I could understand, and the documentation does not provide a "SLURM guide for idiots". To pre-empt comments such as "why do you want to run it on m nodes?": this is what I have been allocated and this is what I want. Running on fewer nodes would not use all the resources, and I cannot run on more. The important thing is that SLURM MUST NOT assign two or more of these instances to a given node at the same time, even if the prog on that node is using only one core at that moment. I cannot stress this enough; this is what I want to achieve.

Gilles Gouaillardet
atapaka

0 Answers