Questions tagged [slurm]

Slurm (formerly spelled SLURM) is an open-source resource manager designed for Linux HPC clusters of all sizes.

Slurm: A Highly Scalable Resource Manager

Slurm is an open-source resource manager designed for Linux clusters of all sizes. It provides three key functions. First, it allocates exclusive and/or non-exclusive access to resources (compute nodes) to users for some duration of time so they can perform work. Second, it provides a framework for starting, executing, and monitoring work (typically a parallel job) on a set of allocated nodes. Finally, it arbitrates contention for resources by managing a queue of pending work.
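These three functions map onto the commands most users meet first: sbatch submits work, squeue shows the queue of pending work, and scancel releases an allocation. A minimal batch script might look like the following sketch (the job name and resource requests are illustrative):

```shell
#!/bin/bash
# The #SBATCH comment lines are resource requests read by sbatch;
# the rest of the script is the work run on the allocated node.
#SBATCH --job-name=hello      # name shown in squeue output
#SBATCH --nodes=1             # allocate a single node
#SBATCH --ntasks=1            # run one task
#SBATCH --time=00:05:00       # wall-clock time limit

echo "Running on $(hostname)"
```

Submitted with sbatch hello.sh, the job waits in the queue until a node is free, then the script runs on that node.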

Slurm's design is very modular with dozens of optional plugins. In its simplest configuration, it can be installed and configured in a couple of minutes (see Caos NSA and Perceus: All-in-one Cluster Software Stack by Jeffrey B. Layton) and was used by Intel on their 48-core "cluster on a chip". More complex configurations can satisfy the job scheduling needs of world-class computer centers and rely upon a MySQL database for archiving accounting records, managing resource limits by user or bank account, or supporting sophisticated job prioritization algorithms.

While other resource managers do exist, Slurm is unique in several respects:

  • It is designed to operate in heterogeneous clusters with over 100,000 nodes and millions of processors.
  • It can sustain a throughput rate of hundreds of thousands of jobs per hour, with bursts of job submissions at several times that rate.
  • Its source code is freely available under the GNU General Public License.
  • It is portable: written in C and using the GNU autoconf configuration engine. While initially written for Linux, other UNIX-like operating systems should be easy porting targets.
  • It is highly tolerant of system failures, including failure of the node executing its control functions.
  • A plugin mechanism exists to support various interconnects, authentication mechanisms, schedulers, etc. These plugins are documented and simple enough for the motivated end user to understand the source and add functionality.
  • Configurable node power control functions allow putting idle nodes into a power-save/power-down mode. This is especially useful for "elastic burst" clusters which expand dynamically to a cloud virtual machine (VM) provider to accommodate workload bursts.
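As a sketch of the power-saving behaviour mentioned in the last point, slurm.conf accepts parameters along these lines (the thresholds and script paths are illustrative; the suspend/resume scripts are site-provided):

```
# Excerpt from slurm.conf: power down nodes idle for 10 minutes
# and wake them again on demand.
SuspendTime=600                               # seconds idle before suspending a node
SuspendProgram=/usr/local/sbin/node_suspend.sh
ResumeProgram=/usr/local/sbin/node_resume.sh
ResumeTimeout=300                             # seconds to wait for a node to come back
```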


Name Spelling

As of v18.08, the name spelling “SLURM” has been changed to “Slurm” (commit 3d7ada78e).

Other Uses of the Name

Slurm is also a fictional soft drink in the Futurama multiverse, where it is popular and highly addictive.

1738 questions
0 votes, 2 answers

Using sbatch to write to file

I'm new to Slurm, and I'm trying to batch a shell script to write to a text file. My shell script (entitled "troublesome.sh") looks like this: #!/bin/bash #SBATCH -N 1 #SBATCH -n 1 echo "It worked!" When I run sh troublesome.sh > doeswork.txt it…
asked by Kate
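For questions like the one above, the usual catch is that #SBATCH lines are only honored when the script is submitted with sbatch; running it with sh treats them as ordinary comments. A sketch of the common pattern, using the file names from the question:

```shell
#!/bin/bash
#SBATCH -N 1
#SBATCH -n 1
#SBATCH --output=doeswork.txt  # sbatch redirects the job's stdout/stderr here

echo "It worked!"
```

Submitted as sbatch troublesome.sh, the echoed text lands in doeswork.txt; sh troublesome.sh bypasses Slurm entirely.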
0 votes, 1 answer

What is the difference between a normal user and a user created by sacctmgr for some account?

There are some users (listed in /etc/passwd) who can use Slurm to submit jobs in our cluster. But with sacctmgr we can also define users belonging to some account(s). What should be the connection between these two groups of users? Thanks.
asked by potant
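For context on the question above: the two sets of users serve different layers. The system account (in /etc/passwd or LDAP) authenticates the person, while the sacctmgr entry associates that same login name with a bank account for accounting and resource limits; depending on the AccountingStorageEnforce setting, a job may be rejected unless both exist. A hedged sketch with hypothetical account and user names, run as a Slurm administrator:

```
# 'alice' must also exist as a system user with the same name.
sacctmgr add account research Description="research group"
sacctmgr add user alice Account=research
sacctmgr show associations where user=alice
```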
0 votes, 0 answers

How do you import python modules with Slurm?

This is my shell script for sbatch: #!/bin/bash #SBATCH ..... #SBATCH ..... #SBATCH ..... module load python/3.7.1 srun python runcomb.py I'm running a Python script on an HPC which uses Slurm. For some reason, even basic Python modules aren't being…
0 votes, 1 answer

Running an MPI job on multiple nodes with the Slurm scheduler

I'm trying to run an MPI application with a specific task/node configuration. I need to run a total of 8 MPI tasks, 4 on one node and 4 on another. This is the script file I'm using: #!/bin/bash #SBATCH --time=00:30:00 #SBATCH…
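The layout asked for above (8 ranks, 4 per node) is usually expressed directly in the resource directives; a sketch, with the application binary name a placeholder:

```shell
#!/bin/bash
#SBATCH --time=00:30:00
#SBATCH --nodes=2              # two nodes...
#SBATCH --ntasks-per-node=4    # ...with 4 MPI ranks each, 8 tasks total

# srun launches one copy of the program per allocated task.
srun ./my_mpi_app              # placeholder binary name
```

srun inherits the geometry from the directives, so no extra -n/-N flags are needed at launch.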
0 votes, 1 answer

How can I execute a SLURM script from within multiple directories simultaneously?

I want to execute a SLURM script from within multiple directories simultaneously. More specifically, I have ten array folders numbered array_1 through array_10 from which I want to execute the script. Within each of these directories, the script…
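A common pattern for the question above is a small submission loop; sbatch's --chdir option makes each job run with the given directory as its working directory (the job script name here is a placeholder):

```shell
#!/bin/bash
# Submit the same script once per array_* directory; each job starts
# in its own directory, so relative paths resolve independently.
for i in $(seq 1 10); do
    sbatch --chdir="array_${i}" myscript.sh
done
```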
0 votes, 0 answers

MPI Bcast error while using Calcul Canada

I am trying to run a calculation on the Calcul Canada remote site from my Mac. This is the input file run.sh I am using: #!/bin/sh #SBATCH --nodes=4 #SBATCH --ntasks-per-node=32 #SBATCH --time=24:00:00 #SBATCH --mem-per-cpu=2000M #SBATCH…
asked by coralie
0 votes, 1 answer

How to distribute custom code through SLURM manager?

I have access to a computer cluster with the Slurm manager. I want different nodes to execute different parts of my code. If I understood properly, this can be achieved through Slurm with the srun command if your code is properly…
asked by Nevena
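One hedged way to have different tasks run different parts of a program, as the question above asks, is srun's --multi-prog mode, where a configuration file maps task ranks to commands (the file and binary names here are illustrative):

```shell
# parts.conf maps task ranks to executables, one rule per line:
#   0-3  ./compute_part_a
#   4-7  ./compute_part_b
srun --ntasks=8 --multi-prog parts.conf
```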
0 votes, 1 answer

How to run a python code with multiple inputs on the same node with slurm id?

I want to run a python program 10 times and save different output files as output_1, output_2, output_3… and so on. It can be run using 1 processor and 10 threads. I have access to 96 CPUs on a node, so I want to perform all these 10 jobs in…
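The ten-runs-with-distinct-outputs pattern above is what job arrays are for: each array task gets its own SLURM_ARRAY_TASK_ID, which can name the output file. A sketch, with an echo standing in for the real Python program and the ID defaulted so the script also runs outside Slurm:

```shell
#!/bin/bash
#SBATCH --array=1-10          # ten independent array tasks, IDs 1..10
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=10    # 10 threads per task

# Each task writes its own output_<N> file.
id="${SLURM_ARRAY_TASK_ID:-1}"
echo "run ${id}" > "output_${id}"   # replace with: python program.py > "output_${id}"
```

Note that array tasks are scheduled independently and may land on different nodes; pinning all ten runs to one node usually means a single job that launches ten background steps instead.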
0 votes, 1 answer

Compiler does not utilize all CPUs, I need your advice

My PC has two Xeon E5-2678 v3 CPUs, 12 cores/24 threads each (24 cores/48 threads total). I submitted a Slurm batch job that requests multiple cores for my code (a CFD Fortran code built with the Intel Fortran compiler on Linux). The code runs well but it seems that all…
0 votes, 0 answers

Does the file get changed in squeue if I modify it after it has been sent to the queue?

I have a question: I have a neural net file model.py with some parameters set. I have sent it to the Slurm queue. When doing squeue I can see that it is still waiting because there are other jobs running. Now, I want to send another variation of…
asked by alienflow
0 votes, 1 answer

Dereference error when accessing Slurm job resources using C API

I am trying to get memory usage information for each job in the Slurm cluster using C API: #include #include #include #include "slurm/slurm.h" #include "slurm/slurm_errno.h" int main(int argc, char** argv) { …
asked by mac13k
0 votes, 1 answer

Job array step single execution

I have a sbatch script to submit job arrays to Slurm with different steps: #!/bin/bash #SBATCH --ntasks 1 #SBATCH --nodes 1 #SBATCH --time 00-01:00:00 #SBATCH…
asked by Bub Espinja
0 votes, 1 answer

Has anyone successfully used shopt -s extglob (extended globbing) in bash with SBATCH settings on an HPC?

To summarise: I am using the bash shell, version 4.2.46(2)-release. I want to submit a batch job script to the Slurm job scheduler where, in the script, I use extended globbing, which is turned on using shopt -s extglob on a separate line to the extended…
0 votes, 1 answer

Interpretation of output from sacct: meaning of ex+

I would like to know if a job is using one or two CPUs, based on the interpretation of the following sacct output. I have searched documentation about the meaning of the ex+ row but found nothing: how should I interpret that row? JobID State …
asked by andrea m.
0 votes, 1 answer

How to add extensions when running NetLogo headlessly on a cluster?

I am using a common NetLogo extension, "CSV", to read a table. The job fails because it cannot find the extension (although I am sure the extension file is present). How do I specify that I want to use an extension when working with NetLogo…
asked by Andrea