Slurm Workload Manager (formerly known as the Simple Linux Utility for Resource Management, or SLURM) is a free and open-source job scheduler for Linux and Unix-like kernels, used by many of the world's supercomputers and computer clusters.
Questions tagged [slurm]
45 questions
1 vote, 1 answer
Allow other users to cancel jobs
I have a test cluster running Slurm on which I would like users to be able to cancel other users' jobs.
By default, the users are able to cancel their own jobs.
How can I define several administrators?
My Slurm configuration…

Bub Espinja
1 vote, 1 answer
How do I prevent additional jobs from a given user from starting?
With the Slurm workload manager, how can I prevent more jobs from user bob from starting? Existing jobs should continue to run. The user should still be able to submit more jobs, but those jobs should not start.

Levi Morrison
1 vote, 1 answer
How to upgrade Slurm?
I've been asked to upgrade our Slurm Workload Manager installation. We run Slurm 2.3.4 on a Debian 7.0 (wheezy) cluster (1 master + 8 nodes). I did not install it myself, so I'm a bit confused about how to do this and how to proceed without destroying…

Sasha Grievus
1 vote, 0 answers
program on cluster exceeds RSS memory limit
I have been trying to run a Python script on a computer cluster but keep running into an error saying that the RSS memory limit was exceeded.
I am using this program to analyse a data set consisting of around 40000 cases. I have tried it on my PC for 1000…

MSB
1 vote, 0 answers
Slurm not filtering sacct results by date
We're using Slurm as a resource manager on our Beowulf cluster, so I installed Slurm on my workstation to test out my scripts before I submit them to the cluster.
When I try to list old jobs on my workstation, sacct won't filter them by date.
$…

Don Kirkby
1 vote, 1 answer
ssh directly into a specific node on a cluster, without first ssh into login node?
I usually log on to a cluster, start a Slurm interactive job, and am then able to ssh into specific running nodes.
My question is: is it generally possible to ssh into a specific node from my local machine, without first ssh-ing into a login node? I…

georg
1 vote, 1 answer
Computer cluster admin: how to limit users running program but permit file transferring
I am managing a small computer cluster with Slurm on CentOS 7. I want to discourage users from running programs on the login node. This can be achieved by adding the line "user hard cpu 1" to the file /etc/security/limits.conf. However, I do not want file transferring…

wdg
1 vote, 1 answer
slurm nvidia-docker ignores CUDA_VISIBLE_DEVICES
I have a problem running nvidia-docker containers on a Slurm cluster. Inside the container all GPUs are visible, so it effectively ignores the CUDA_VISIBLE_DEVICES environment variable set by Slurm. Outside the container the visible GPUs are correct.
Is there a…

JohnA.Zoidberg
1 vote, 1 answer
Wrong LDAP user ID is mapped into Slurm account management service
I configured a Slurm head node as follows:
sssd to contact openLDAP
slurmctld/slurmdbd/slurmd/munged to act as the Slurm controller and compute node
...where ray.williams is an LDAP user. Its UID can be mapped on the node. SSH login works…

Nicolas De Jay
1 vote, 0 answers
View/request instruction sets available on SGE host
How can I view or request hosts that can handle a particular instruction set in SGE?
With Slurm, to view available instruction sets on each host I can use
sinfo --Node -o '%n %f',
and to submit a batch job only to, e.g., hosts with the AVX2…

Richard Border
1 vote, 0 answers
Slurmd remains inactive/failed on start
I currently have a cluster of 10 worker nodes managed by Slurm with 1 master node. I previously set up the cluster and, after some teething problems, managed to get it working. I put all my scripts and instructions on my GitHub…

Brett Chapman
1 vote, 0 answers
Slurm Error: “If using PrologFlag=Contain for pam_slurm_adopt, either proctrack/cgroup or proctrack/cray_aries is required.”
I'm using the x11 flag (PrologFlags=x11) in my slurm.conf file and jobs with x11 work perfectly, but I get this error every time I run a Slurm command (e.g. sbatch, srun, sacctmgr):
scontrol: error: If using PrologFlag=Contain for…

Ricardo Barbosa
1 vote, 1 answer
Single-node SLURM server: restrict interactive CPU usage
I have SLURM set up on a single node, which is also the 'login node'. I would like to restrict interactive CPU usage, i.e. usage outside the scheduling system.
I found the following article, which suggests using cgroups for this:…

Compizfox
0 votes, 1 answer
slurm salloc and how it get user login
A theoretical question. I understand that the best way to find out is to read the code, but maybe I can cheat and just ask?
I am curious how, after salloc, a user is able to log in to the node.
How does it work? Does salloc add user to…

Black S.
0 votes, 1 answer
Trouble Installing slurm on Fedora 29
When I run slurmd, I get the error -bash: slurmd: command not found.
I ran sudo yum install slurm to install Slurm. I don't know why it isn't working, or whether I installed all the packages Slurm requires.

user3273814