I recently configured a Slurm queuing system for a server with one node and 72 CPUs. Here is the conf file:
# slurm.conf file generated by configurator easy.html.
# Put this file on all nodes of your cluster.
# See the slurm.conf man page for more information.
#
ControlMachine=hoffmann
##ControlAddr=
#
#MailProg=/bin/mail
MpiDefault=none
#MpiParams=ports=#-#
ProctrackType=proctrack/pgid
ReturnToService=1
SlurmctldPidFile=/var/run/slurm-llnl/slurmctld.pid
#SlurmctldPort=6817
SlurmdPidFile=/var/run/slurm-llnl/slurmd.pid
#SlurmdPort=6818
SlurmdSpoolDir=/var/spool/slurmd
SlurmUser=slurm
#SlurmdUser=root
StateSaveLocation=/var/spool/slurm-llnl
SwitchType=switch/none
TaskPlugin=task/none
#
#
# TIMERS
#KillWait=30
#MinJobAge=300
#SlurmctldTimeout=120
#SlurmdTimeout=300
#
#
# SCHEDULING
FastSchedule=1
SchedulerType=sched/backfill
SelectType=select/linear
# ---- Tried this to get more than one job running per node, but it seemed to cause data transmission failures ----
#SelectType=select/cons_res
#SelectTypeParameters=CR_CPU_MEMORY
#SelectTypeParameters=
#
#
# LOGGING AND ACCOUNTING
AccountingStorageType=accounting_storage/none
ClusterName=hoffmann
#JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/none
#SlurmctldDebug=3
SlurmctldLogFile=/var/log/slurm-llnl/SlurmctldLogFile
#SlurmdDebug=3
SlurmdLogFile=/var/log/slurm-llnl/SlurmdLogFile
#
#
# COMPUTE NODES
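# (If I read my own node line right, CPUs=72 works out to 2 sockets x 18 cores/socket x 2 threads/core; Sockets is left implicit)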
NodeName=hoffmann CPUs=72 CoresPerSocket=18 ThreadsPerCore=2 State=UNKNOWN
PartitionName=queuing Nodes=hoffmann Default=YES MaxTime=INFINITE State=UP
It is running fine, with the limitation that it allocates all CPUs to each job regardless of what I ask for, the consequence being that only one job can run at a time. Here is the batch script I am running:
#!/bin/bash
#SBATCH --job-name=test
#SBATCH --output=/home/ubuntu/test.out
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --time=500:00
sleep 50
echo 'done'
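In case it helps, this is roughly how I submit the script and check what each job got (the filename test.sbatch is just for illustration; %C in the squeue format is one way to see the CPUs allocated to each job):
sbatch test.sbatch
sbatch test.sbatch
squeue -o "%.10i %.9P %.8j %.2t %.4C %R"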
When I launch two of those and look at sinfo -o "%all", I see the whole node is allocated. I guess I made a mistake in my conf file. Any idea what it could be? Thanks