How do I run one program with different parameters in parallel on multiple nodes with SLURM?
For example, I want to run:
prog a1.txt
prog a2.txt
prog a3.txt
...
prog an.txt
on m cluster nodes (m < n), but with only one instance per node at a time. That is, while prog ai.txt is running on node j, no other instance of prog (for any other input file) is executed on node j until this one has finished. Each instance of prog will use k cores on the given node during some part of its execution. So initially:
prog a1.txt runs on node 1
prog a2.txt runs on node 2
...
prog am.txt runs on node m
and once prog a1.txt finishes on node 1, prog a(m+1).txt will run on node 1, and so on.
Ideally, I would like to achieve this with a SLURM script.
I have already asked a similar question here, but did not get an answer that I could understand, and the documentation does not provide a "SLURM guide for idiots". To pre-empt comments such as "why do you want to run it on m nodes?": this is what I have been allocated and this is what I want. Running on fewer nodes would not use all of the resources, and I cannot run on more. The important thing is that SLURM MUST NOT assign two or more of these instances to the same node at the same time, even if the prog on that node is using only one core at that moment. I cannot stress this enough; this is what I want to achieve.
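For concreteness, here is a minimal sketch of the kind of script in question, written as a job array. It is an untested assumption, not a confirmed solution: the values 10 (for n), 4 (for m) and 8 (for k) are placeholders, the input_for_task helper is hypothetical, and prog is assumed to be on the PATH. The idea is that each array task is its own job, --exclusive keeps any other job off that job's node, and the %4 suffix caps how many tasks run at once.

```shell
#!/bin/bash
# Hypothetical values: n = 10 input files, m = 4 nodes, k = 8 cores.
# Array tasks 1..n, with at most m of them running at the same time.
#SBATCH --array=1-10%4
# Each array task is a separate job that gets exactly one node.
#SBATCH --nodes=1
# No other job (including other array tasks) may share that node.
#SBATCH --exclusive
# Request k cores for prog on the node.
#SBATCH --cpus-per-task=8

# Hypothetical helper: map an array task ID to its input file name.
input_for_task() {
    printf 'a%s.txt\n' "$1"
}

# SLURM sets SLURM_ARRAY_TASK_ID for each array task; the guard lets the
# script be sourced or dry-run outside SLURM without launching prog.
if [ -n "${SLURM_ARRAY_TASK_ID:-}" ]; then
    prog "$(input_for_task "$SLURM_ARRAY_TASK_ID")"
fi
```

This would be submitted once with sbatch. Note that a sketch like this does not pin inputs to particular node numbers; the scheduler picks whichever allocated node is free, which still matches the one-instance-per-node requirement.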