I'm using SLURM sbatch to launch a bunch of parallel tasks on a cluster. The total number of cores I need to run all tasks in parallel exceeds the number of cores my sbatch script asks for, so some job steps won't run until others have finished.
Here's an example script that reflects my use case. Let's say each node in the cluster has 40 cores, and I use sbatch to allocate 10 nodes, so I have 400 cores at my disposal. But I have 12 tasks to run, each asking for 40 cores, so they need a total of 480 cores to run in parallel.
#!/bin/bash
#SBATCH --cpus-per-task=40
#SBATCH --nodes=10
# below are 12 invocations of srun in total
srun --cpus-per-task=40 --nodes=1 --ntasks=1 --job-name=first <executable> &
srun --cpus-per-task=40 --nodes=1 --ntasks=1 --job-name=second <executable> &
...
srun --cpus-per-task=40 --nodes=1 --ntasks=1 --job-name=twelfth <executable> &
wait
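For reference, the twelve srun invocations could equivalently be generated with a loop; the sketch below is just a condensed form of the same script, with the step names kept in a hypothetical names array and <executable> still a placeholder:
#!/bin/bash
#SBATCH --cpus-per-task=40
#SBATCH --nodes=10
# step names of the 12 tasks (the array is only here for brevity)
names=(first second third fourth fifth sixth seventh eighth ninth tenth eleventh twelfth)
for name in "${names[@]}"; do
    srun --cpus-per-task=40 --nodes=1 --ntasks=1 --job-name="$name" <executable> &
done
wait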
My problem is that sacct won't show the status of all twelve job steps until every invocation of srun has obtained the resources it needs. How can I adjust my way of using SLURM so that, immediately after I submit my batch script, I can inspect the state of all twelve job steps?
Here's my current way of operation: call sbatch <the script above>, then call sacct -j <JobID>.
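Concretely, the commands look roughly like this; <JobID> is the ID that sbatch prints, and the explicit --format list is just my way of spelling out the brief columns sacct shows by default:
sbatch <the script above>    # prints the <JobID> of the submitted job
sacct -j <JobID>             # default brief output, shown below
sacct -j <JobID> --format=JobID,JobName,Partition,Account,AllocCPUS,State,ExitCode    # same columns, spelled out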
At first, only ten job steps will show up in the output, all in the RUNNING state:
       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
XXX              script      batch     (null)          0    RUNNING      0:0
XXX.0             first                (null)          0    RUNNING      0:0
XXX.1            second                (null)          0    RUNNING      0:0
XXX.2             third                (null)          0    RUNNING      0:0
XXX.3            fourth                (null)          0    RUNNING      0:0
XXX.4             fifth                (null)          0    RUNNING      0:0
XXX.5             sixth                (null)          0    RUNNING      0:0
XXX.6           seventh                (null)          0    RUNNING      0:0
XXX.7            eighth                (null)          0    RUNNING      0:0
XXX.8             ninth                (null)          0    RUNNING      0:0
XXX.9             tenth                (null)          0    RUNNING      0:0
...and the log file slurm-XXX.out would tell me: srun: Job XXX step creation temporarily disabled, retrying (Requested nodes are busy)
When one job step finally completes, a new line appears in the log file: srun: Step created for job XXX
and the output of sacct -j <JobID> will look like this (note that there are eleven job steps now):
       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
XXX              script      batch     (null)          0    RUNNING      0:0
XXX.0             first                (null)          0    RUNNING      0:0
XXX.1            second                (null)          0    RUNNING      0:0
XXX.2             third                (null)          0    RUNNING      0:0
XXX.3            fourth                (null)          0    RUNNING      0:0
XXX.4             fifth                (null)          0    RUNNING      0:0
XXX.5             sixth                (null)          0    RUNNING      0:0
XXX.6           seventh                (null)          0    RUNNING      0:0
XXX.7            eighth                (null)          0  COMPLETED      0:0
XXX.8             ninth                (null)          0    RUNNING      0:0
XXX.9             tenth                (null)          0    RUNNING      0:0
XXX.10         eleventh                (null)          0    RUNNING      0:0
It's possible I'm missing some options, since the SLURM manual is really unwieldy. I've already read How to know the status of each process of one job in the slurm cluster manager?, but that doesn't solve my problem.
I'd appreciate suggestions on how to solve my problem, or on how to use SLURM in a "more correct" way.