I want to submit an array job to Slurm with 100 tasks, each using just one CPU. I have access to a cluster with 10 nodes of 24 cores each, with hyperthreading activated. I am limiting the number of concurrent jobs with --array=1-100%24, trying to keep all the jobs on a single node and leave the rest of the cluster free for other users, but the 24 concurrent tasks are executed on an arbitrary number of nodes. I've tried --nodes=1 and --distribution=block:block to override the cyclic distribution, both unsuccessfully: the 24 simultaneous tasks still run on more than one node.
Browsing Stack Overflow I found an older question that solved this by passing a list of nodes to exclude. That works for me, but I think it defeats the purpose of having a job scheduler to optimize cluster usage.
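For reference, the workaround looks roughly like this (a minimal sketch; the node names node[02-10] are made up for illustration and would have to match the cluster's actual hostnames):

#SBATCH --exclude=node[02-10]

With every node but one excluded, all array tasks are forced onto the remaining node, but the list has to be maintained by hand instead of letting the scheduler decide.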
Here's the script I'm currently using:
Thanks a lot, Pablo
#!/bin/sh
#SBATCH --cpus-per-task=1
#SBATCH --ntasks=1
#SBATCH --output=output/test.log_%A_%a.out
#SBATCH --error=output/test.log_%A_%a.err
#SBATCH --array=1-100%24
#SBATCH --distribution=block:block
#SBATCH --nodes=1
# Display all environment variables set by Slurm
env | grep "^SLURM" | sort
# Print the hostname this task executed on.
echo
echo "My hostname is: $(hostname -s)"
echo
# Simulate some work so the allocation stays visible for a while
sleep 30
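For completeness, this is how I submit the job and check where the running tasks land (assuming the script is saved as test.sh; the file name is just an example):

sbatch test.sh
squeue --me -o "%.12i %.8T %N"   # job id, state, and node of each array task

The squeue output is how I can see the 24 running tasks spread across several nodes instead of staying on one.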