
I am looking for a simple way to run a program with various input files using SLURM. I want to run each instance of the program on a single node so that it can make use of OpenMP. I found that job arrays are probably the best way to do this, but I am not sure how to tell SLURM to use 10 nodes such that on each node only one instance of the process runs, using multiple OpenMP threads. I was thinking of something like the following in the batch file I would submit to SLURM. Is that correct, and will it do what I need? Is there perhaps some other way?

#SBATCH --nodes=10
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=12
#SBATCH --exclusive
#SBATCH --array=1-100
./prog --input file${SLURM_ARRAY_TASK_ID}.in
atapaka
  • Please fix "use 10 nodes and on each node". Also, is this an MPI question? – Gilles Gouaillardet Jun 30 '20 at 01:25
  • @GillesGouaillardet Sorry, I removed the MPI tag, that was indeed probably incorrect. The sentence I think is correct, it reads: "how to tell SLURM to use 10 nodes and on each node, only one instance of the process will be running using multiple openMP threads". – atapaka Jun 30 '20 at 01:55
  • 1
    you have to ask one node in your job script (and probably get rid of `--ntasks-per-node` and --cpus-per-task` since you ask exclusive access to the node). SLURM will by default try to run as much (sub)jobs in parallel as possible, so I am not sure of what you mean by "use 10 nodes". You can **limit up to** 10 subjobs in parallel with `#SBATCH --array=1-100%10` but I do not think you can require a minimum number of nodes. – Gilles Gouaillardet Jun 30 '20 at 02:13
  • @GillesGouaillardet I want to execute the commands that are made by that array on 10 nodes. On each node the command will be running in OpenMP. As an example, if I wanted to use only 2 nodes, I would be running `./prog --input file1` on node 1 and at the same time slurm should run `./prog --input file2` on node 2, when one of the nodes finished, say node 2, it would run `./prog --input file3` on node 2 and when node 1 finishes slurm would run `./prog --input file4` on node 1 and so forth until all part of array are done. – atapaka Jun 30 '20 at 02:49
  • So I would like to use in general n nodes concurrently, each running 1 instance of the `prog` which itself uses as many cores as i tell it to on the given node. And when it finishes new instance of the `prog` is started until the `prog` is run with all input files i need. It is basically `mpi` like code without `MPI` since i have no need for mpi communications and all the individual runs are independent. – atapaka Jun 30 '20 at 02:51
  • my point is why do you want to use **10** instead of **any** number of nodes? – Gilles Gouaillardet Jun 30 '20 at 03:18
  • 2
    What is the point of running on exactly 10 nodes in parallel if you have 100 input files to process? As Gilles points out, `--array=1-100%10` will start an array job of 100 tasks and SLURM will not run more than 10 of them simultaneously. If you just say `--array=1-100`, then SLURM will run any number of tasks simultaneously. The end result in both cases will be exactly the same - your program will run 100 times with different input files. – Hristo Iliev Jun 30 '20 at 11:31
  • 2
    It is possible to achieve exactly what you want by submitting a 10-node job and askng `srun` to launch one process per node. That process can be a wrapper shell script that uses the environment variables set by SLURM to figure out which input file to pass to `prog`. – Hristo Iliev Jun 30 '20 at 11:36
  • @HristoIliev I apologize, but I do not think I understand you. I have little experience with cluster computing. The cluster has many nodes and I want to be able to specify how many to use at the same time. The more I would use, the longer I need to wait for that amount to become available. 10 was just an example; it can be n. What I need to make sure of, though, is that SLURM does not run `prog` twice on a given node, since on a node I want to use OpenMP shared memory. – atapaka Jul 02 '20 at 00:26
  • If you do `--array=1-100%10`, what effect will the other parameters have? `#SBATCH --nodes=10; #SBATCH --ntasks-per-node=1; #SBATCH --cpus-per-task=12` – atapaka Jul 02 '20 at 00:26
  • 2
    @leosenko I think you misunderstand job arrays. a job array of 100 subjobs is basically a convenient way of submitting 100 individual and single node jobs. From a scheduler point of view, this is **not** submitting one job on 100 nodes. – Gilles Gouaillardet Jul 02 '20 at 00:31
  • 1
    When you submit an array job, the scheduler will start as many simultaneously running tasks as possible, but no more than the limit specified after the `%` sign. At moments, the running count may be lower, if there aren't enough resources. Yes, you lose some predictability as to what time it will take to run all the tasks, but you win simplicity of the whole procedure and play nice with the rest of the cluster users. – Hristo Iliev Jul 02 '20 at 07:03
  • @GillesGouaillardet And so what would you suggest instead? I could rewrite the FORTRAN to hardcode the variation of input parameters, then make the parameter-variation part MPI so it would run as MPI with OpenMP, but I feel there is a simpler way (the `prog` runs with different input files/parameters do not interact, and I am also lazy). However, I also need to be able to control resources (the number of nodes, for scheduling purposes, and processes per node, to make sure that OpenMP can use all available cores and is not throttled by another instance of `prog` on the same node). – atapaka Jul 02 '20 at 14:28
  • And I probably even have to select specific nodes, since the cluster has nodes with various physical characteristics, specifically memory size. – atapaka Jul 02 '20 at 14:29
  • Unless you can provide a strong rationale for running on exactly 10 nodes, I suggest you do what I wrote earlier. If you really want to run on 10 nodes, you do not need MPI, do what Hristo said, `srun` and a wrapper will do the trick. – Gilles Gouaillardet Jul 02 '20 at 15:15
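Pulling the comment thread together, a minimal sketch of the job-array approach (the `%10` throttle, the 12-core count, and the `fileN.in` naming are illustrative assumptions, not a tested recipe for any particular cluster):

```shell
#!/bin/bash
#SBATCH --ntasks=1            # each array element is an independent single-node job
#SBATCH --cpus-per-task=12    # cores reserved for OpenMP threads on that node
#SBATCH --array=1-100%10      # 100 sub-jobs, at most 10 running concurrently

# Let OpenMP use exactly the cores SLURM allocated to this sub-job.
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}

./prog --input "file${SLURM_ARRAY_TASK_ID}.in"
```

Note there is no `--nodes=10` here: as Gilles points out, each array element is scheduled as its own one-node job, and `%10` only caps how many run at once.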
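And a sketch of the `srun` + wrapper alternative Hristo describes, for the case where exactly 10 nodes must be held for the whole run. The script name `wrapper.sh`, the static rank-based partitioning, and the file naming are assumptions for illustration:

```shell
#!/bin/bash
# wrapper.sh - intended to be launched once per node, e.g. from a job script
# submitted with --nodes=10 --ntasks-per-node=1 --cpus-per-task=12 that runs:
#   srun ./wrapper.sh
# Each copy then works through every 10th input file, offset by its rank.

# Input indices for a given (rank, stride, total): rank r handles r+1, r+1+stride, ...
my_inputs() {
    seq $(( $1 + 1 )) "$2" "$3"
}

export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-12}

# SLURM_PROCID is set by srun for each launched task; outside SLURM the loop is skipped.
if [ -n "${SLURM_PROCID:-}" ]; then
    for i in $(my_inputs "$SLURM_PROCID" "${SLURM_NTASKS:-10}" 100); do
        ./prog --input "file${i}.in"
    done
fi
```

This partitions the 100 inputs statically, so a node that finishes its share early sits idle; the dynamic "next free node takes the next file" behaviour described in the comments would need the workers to pull indices from a shared queue instead.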

0 Answers