I have a heterogeneous cluster containing either 14-core or 16-core CPUs (28 or 32 hardware threads), and I manage job submissions with Slurm. My requirements are:
- It doesn't matter which CPU is used for a calculation.
- I don't want to specify which CPU a job should go to.
- A job should consume all available cores on the CPU (14 or 16).
- I want mpirun to handle threading.
To illustrate the peculiarities of the problem, here is a job script that works on the 16-core CPUs:
#!/bin/bash
#SBATCH -J test
#SBATCH -o job.%j.out
#SBATCH -N 1
#SBATCH -n 32
mpirun -np 16 vasp
An example job script that works on the 14-core CPUs is:
#!/bin/bash
#SBATCH -J test
#SBATCH -o job.%j.out
#SBATCH -N 1
#SBATCH -n 28
mpirun -np 14 vasp
The second job script also runs on the 16-core CPUs, but there the job is about 35% slower than when I request 32 threads as in the first script. That is an unacceptable performance loss for my application.
I haven't figured out a good way around this challenge. To me, a solution would be to request a variable number of resources, such as
#SBATCH -n [28-32]
and to tailor the mpirun -np x vasp line accordingly. I haven't found a way to do this, however. Are there any suggestions on how to achieve this directly in Slurm, or is there a good workaround?
I tried to use the environment variable $SLURM_CPUS_ON_NODE, but this variable is only set after the node has been selected, so it cannot be used in an #SBATCH line.
I also looked at the --constraint flag, but this does not seem to give sufficiently granular control over threading requests.
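For completeness, the kind of request I had in mind there would be along these lines (a sketch only; the feature names cpu14 and cpu16 are placeholders and only exist if the administrator has tagged the nodes with such features in slurm.conf):
#!/bin/bash
#SBATCH -J test
#SBATCH -o job.%j.out
#SBATCH -N 1
#SBATCH -n 28
#SBATCH --constraint="cpu14|cpu16"
# The OR'd constraint only widens which nodes are eligible; the task
# count above and the mpirun -np value below still have to be fixed in advance.
mpirun -np 14 vasp
So even if the constraint lets the job land on either node type, I still cannot adapt -n and -np to whichever type it lands on.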