
I'm used to submitting sbatch scripts on a cluster where the nodes have 32 CPUs and where my code needs a power-of-2 number of processors.

For example, I do this:

#SBATCH -N 1
#SBATCH -n 16
#SBATCH --ntasks-per-node=16

or

#SBATCH -N 2
#SBATCH -n 64
#SBATCH --ntasks-per-node=32

However, I now need to use a different cluster where each node has 40 CPUs. For the moment I'm using only one node and 32 processes for testing:

#SBATCH --ntasks=32
#SBATCH --ntasks-per-node=32

(I took this latter script from the cluster's documentation. They don't use the #SBATCH -N line in this example; I don't know why, but maybe because it is just an example.)

However, I will now need to run larger simulations with 512 processors. The closest number of nodes I would need is 13 (i.e. 40*13 = 520 processors). The problem is that the number of tasks per node would then not (technically) be an integer.

I think one solution would be to ask for 13 nodes, fully use 12 of them, and only partially use the last one.
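For concreteness, I imagine something like the following, though I'm not sure whether these two directives are valid together:

#SBATCH -N 13
#SBATCH -n 512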

My question is: how do I do this? Is there another way of doing it without changing the code? (Changing the code is not possible; it is a huge code.)

A simulation with 512 processors will take at least 10 hours, so running a larger simulation with only 32 processors would take a week. And I don't need just one simulation, but at least 20 for the moment.

Another solution would be to ask for 16 nodes (32*16 = 512) and use only 32 processors per node. However, this would waste processors and the CPU hours I'm allowed on the cluster.
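For reference, I imagine that request would look like the earlier examples:

#SBATCH -N 16
#SBATCH -n 512
#SBATCH --ntasks-per-node=32

leaving 8 of the 40 processors idle on every node.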

Gundro
  • do not specify `--ntasks-per-node`. It is either redundant or incompatible with `-N` and `-n` – Gilles Gouaillardet Oct 29 '20 at 11:04
  • Just use solely `#SBATCH -n 512` and Slurm will allocate the minimum number of nodes needed to accommodate your job, and will also load-balance the processes between nodes so that each node gets as close as possible to the same number of processes. To avoid the risk of sharing nodes with other users, you can add `#SBATCH --exclusive` – Gilles Oct 29 '20 at 15:03
  • Ok, I will try using only `-n 512`. That sounds logical, but since I'm new to Slurm I didn't know. Because the examples for both clusters use `--ntasks-per-node`, I thought it was a must – Gundro Oct 30 '20 at 17:13
  • @Gilles It works, thank you. I erased the `--ntasks-per-node` line and left only `-n 512`, and it worked like a charm. How do I post your comment as an answer? – Gundro Nov 05 '20 at 12:10
  • Just write it as an answer yourself and accept it. I'm happy for you to get the corresponding rep :) – Gilles Nov 05 '20 at 14:30

1 Answer


Ok, the answer is simple, but it depends on the machine you are working on. I think it should work every time, though.

In the case of the second cluster, I don't need to specify the `--ntasks-per-node` line at all. I just need to tell the machine how many tasks I need in total with `--ntasks=512` (or `-n 512`), and it automatically allocates the number of nodes necessary to run those tasks.
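So a minimal script for my case would look something like this (the executable name is a placeholder for my real one):

#!/bin/bash
# Request only the total number of tasks; Slurm chooses how many nodes to allocate
#SBATCH --ntasks=512
# Optional, as suggested in the comments: avoid sharing nodes with other users
#SBATCH --exclusive

srun ./my_simulation   # placeholder for the actual executable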

Important: if your ntasks is not a multiple of the number of processors per node, the last node will not be completely used. For example, I need 512 tasks, which corresponds to 13 nodes = 520 processors. The first 12 nodes are fully used, but the last one is not: it runs only 32 tasks and leaves 8 processors empty.
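A quick way to check how the tasks were actually placed (assuming a standard environment with srun and coreutils) is to run, inside the job:

srun hostname | sort | uniq -c

Each output line counts the tasks that landed on one node; with the distribution described above, 12 nodes should report 40 tasks and the last one 32.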

Note that this can cause optimisation problems in some codes, because the processes on the last node will need to communicate with the majority of the processes on the other node(s). For me this is not a problem, but I know of another code where it is.

Gundro