I am trying to set up a Slurm configuration file on Ubuntu 20.04. I have tried several things and searched for the errors on other websites (link1, link2, link3), on the Slurm website, and in another similar question on SO.
Given the following information about my computer, what is the minimum information that must be provided in the slurm.conf file?
General information about my computer:
RAM: 125.5 GB
CPU: 1-20 (Intel® Xeon(R) CPU E5-2687W v3 @ 3.10GHz × 20 )
Graphics: NVIDIA Corporation GP104 [GeForce GTX 1080] / NVIDIA Corporation
OS: Ubuntu 20.04.2 LTS 64 bit
I want to have 2 nodes with 10 CPUs each and 1 node for the GPU.
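For reference, slurm.conf node definitions express these resources as CPUs= and RealMemory= (in megabytes), so the figures above can be read from the machine directly. This is a minimal sketch, assuming a Linux system with /proc/meminfo and the coreutils `nproc` command; the variable names `cpus` and `mem_mb` are only illustrative.

```shell
# Number of logical CPUs visible to the operating system (coreutils).
cpus="$(nproc)"

# Total RAM in megabytes; /proc/meminfo reports MemTotal in kB.
mem_mb="$(awk '/^MemTotal:/ {print int($2 / 1024)}' /proc/meminfo)"

# Print the values in the key=value form that a NodeName line uses.
echo "CPUs=${cpus} RealMemory=${mem_mb}"
```

Where Slurm is installed, `slurmd -C` prints the hardware that slurmd itself detects, which is a useful cross-check.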
Here is what I have tried. After writing the configuration, I ran:
> sudo systemctl restart slurmctld
which completed with no error. But I got an error with slurmd:
> sudo systemctl restart slurmd
The error is as follows:
Job for slurmd.service failed because the control process exited with error code.
See "systemctl status slurmd.service" and "journalctl -xe" for details.
If I run "systemctl status slurmd.service", I get:
● slurmd.service - Slurm node daemon
Loaded: loaded (/lib/systemd/system/slurmd.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Sun 2021-06-06 21:47:26 CEST; 1min 14s ago
Docs: man:slurmd(8)
Process: 52710 ExecStart=/usr/sbin/slurmd $SLURMD_OPTIONS (code=exited, status=1/FAILURE)
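The status output above only reports the exit code, not the reason for the failure. The following is a minimal sketch of how the underlying message could be collected, assuming slurmd is on the PATH and the systemd unit is named slurmd as shown above; `slurmd_diagnose` is a hypothetical helper name, not part of Slurm.

```shell
# Hypothetical helper: gather hints about why slurmd refuses to start.
slurmd_diagnose() {
    if ! command -v slurmd >/dev/null 2>&1; then
        echo "slurmd not found in PATH"
        return 0
    fi
    # Print the hardware configuration that slurmd detects, then exit.
    slurmd -C || echo "slurmd -C failed"
    # Show the most recent log lines of the slurmd unit.
    journalctl -u slurmd --no-pager -n 50 || true
}

slurmd_diagnose
```

Running `sudo slurmd -D -vvv` by hand starts the daemon in the foreground with verbose logging, which usually prints the configuration problem directly to the terminal.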
Here is my configuration file slurm.conf, generated by configurator_easy.html and saved in /etc/slurm-llnl/slurm.conf:
# slurm.conf file generated by configurator easy.html.
# Put this file on all nodes of your cluster.
# See the slurm.conf man page for more information.
#
SlurmctldHost=myhostname
#
AuthType=auth/munge
Epilog=/usr/local/slurm/epilog
Prolog=/usr/local/slurm/prolog
FirstJobId=0
InactiveLimit=120
JobCompType=jobcomp/filetxt
JobCompLoc=/var/log/slurm/jobcomp
KillWait=30
MinJobAge=300
MaxJobCount=10000
#PluginDir=/usr/local/lib
ReturnToService=0
SlurmdPort=6818
SlurmctldPort=6817
SlurmdSpoolDir=/var/spool/slurmd.spool
StateSaveLocation=/var/spool/slurm-llnl/slurm.state
SwitchType=switch/none
TmpFS=/tmp
WaitTime=30
SlurmctldPidFile=/run/slurmctld.pid
SlurmdPidFile=/run/slurmd.pid
SlurmUser=slurm
SlurmdUser=root
TaskPlugin=task/affinity
#
# TIMERS
SlurmctldTimeout=120
SlurmdTimeout=300
#
# SCHEDULING
SchedulerType=sched/backfill
SelectType=select/cons_res
SelectTypeParameters=CR_Core
#
# LOGGING AND ACCOUNTING
#AccountingStorageType=accounting_storage/none
ClusterName=cluster
#JobAcctGatherFrequency=30
#JobAcctGatherType=jobacct_gather/linux
#SlurmctldDebug=info
SlurmctldLogFile=/var/log/slurm-llnl/SlurmctldLogFile
#SlurmdDebug=info
#SlurmdLogFile=
#
# COMPUTE NODES
NodeName=Linux[1-32] State=UP
NodeName=DEFAULT State=UNKNOWN
PartitionName=Linux[1-32] Default=YES
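Two things worth checking in a file like this are whether each PartitionName line lists its members with Nodes=, and whether some NodeName line refers to the machine's own short hostname, since slurmd identifies its node that way. The sketch below does a rough text check of both; `check_slurm_conf` is a hypothetical helper, not a Slurm tool, and it does not expand range expressions such as Linux[1-32] or look at NodeHostname, so a warning from it is only a hint.

```shell
# check_slurm_conf FILE: hypothetical helper that flags two common
# omissions in a slurm.conf file. It is a plain text check only.
check_slurm_conf() {
    conf="$1"
    host="$(hostname -s)"

    # Partition lines normally name their member nodes with Nodes=...
    grep -E '^PartitionName=' "$conf" | grep -v 'Nodes=' |
        sed 's/^/partition without Nodes=: /'

    # slurmd looks for its own short hostname among the node definitions.
    if ! grep -E '^NodeName=' "$conf" | grep -qF "$host"; then
        echo "no NodeName line mentions this host: $host"
    fi
}
```

It could be run as `check_slurm_conf /etc/slurm-llnl/slurm.conf`; no output means neither of the two patterns was found.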