
I am new to Slurm and I am trying to launch several executables in parallel (in the example below it is just the date command). I would like them to start at different times, separated by a short delay.

I have made a few attempts, adding extra lines between the srun calls, such as "srun sleep 5s &", or using the "--begin" option shown below. The "--begin" option in particular fails with the message "--begin is ignored because nodes are already allocated".

The parallel module does not seem to be available on our cluster.

#!/bin/bash
#SBATCH --output=parallel_test_%j.out   # Standard output and error log
#SBATCH --time=06:00:00
#SBATCH --nodes=1   # number of nodes
#SBATCH --ntasks=6   
#SBATCH --mem-per-cpu=1024M   # memory per CPU core

srun="srun -n1 -N1 --exclusive"
# --exclusive     ensures srun uses distinct CPUs for each job step
# -N1 -n1         allocates a single core to each task


$srun date &
$srun --begin=now+3 date &
$srun --begin=now+6 date &
$srun --begin=now+9 date &
$srun --begin=now+12 date &
$srun --begin=now+15 date &
wait

The output I get is the following:

srun: error: --begin is ignored because nodes are already allocated.
srun: error: --begin is ignored because nodes are already allocated.
srun: error: --begin is ignored because nodes are already allocated.
srun: error: --begin is ignored because nodes are already allocated.
srun: error: --begin is ignored because nodes are already allocated.
Sun Jun 23 14:07:05 PDT 2019
Sun Jun 23 14:07:05 PDT 2019
Sun Jun 23 14:07:05 PDT 2019
Sun Jun 23 14:07:05 PDT 2019
Sun Jun 23 14:07:05 PDT 2019
Sun Jun 23 14:07:06 PDT 2019

What I would like to obtain is the following output:

Sun Jun 23 13:22:54 PDT 2019
Sun Jun 23 13:22:57 PDT 2019
Sun Jun 23 13:23:00 PDT 2019
Sun Jun 23 13:23:03 PDT 2019
Sun Jun 23 13:23:06 PDT 2019
Sun Jun 23 13:23:09 PDT 2019

Thank you for your help

Vincent
  • [ShellCheck](https://www.shellcheck.net) correctly points out that your `srun` variable is unused. Did you mean `$srun date &`? – that other guy Jun 23 '19 at 21:01
  • Thank you for your help, I was definitely missing the $ symbol. Still, I am not getting the output desired... – Vincent Jun 23 '19 at 21:11
  • You can set up delays between srun calls (`sleep 5`). But what is the rationale for this? It does not seem to be useful at all. – Poshi Jun 25 '19 at 12:31

1 Answer


In this case, --begin will not help: it defers the start of the job itself, and the job has already started by the time srun runs inside the submission script.

You can get the requested behaviour like this:

$srun date &
sleep 3; $srun date &
sleep 3; $srun date &
sleep 3; $srun date &
sleep 3; $srun date &
sleep 3; $srun date &
wait
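The same staggered launch can be written as a loop, which is handy when the number of steps changes. This is only a sketch: `launch_staggered` is a hypothetical helper name, not something Slurm provides, and it assumes the srun options from the question.

```shell
#!/bin/bash
# launch_staggered DELAY COUNT COMMAND [ARGS...]
# Hypothetical helper (not part of Slurm): starts COUNT copies of COMMAND
# in the background, DELAY seconds apart, then waits for all of them.
launch_staggered() {
    local delay="$1" count="$2"
    shift 2
    local i=0
    while [ "$i" -lt "$count" ]; do
        # Pause in the batch shell before every launch except the first.
        if [ "$i" -gt 0 ]; then
            sleep "$delay"
        fi
        "$@" &
        i=$((i + 1))
    done
    wait   # block until every background step has finished
}

# In the batch script (assumed options, copied from the question):
# launch_staggered 3 6 srun -n1 -N1 --exclusive date
```

Here the delay happens in the batch shell, so a job step is only created once its turn comes.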

or even like this:

$srun date &
$srun bash -c "sleep 3 ; date" &
$srun bash -c "sleep 6 ; date" &
$srun bash -c "sleep 9 ; date" &
$srun bash -c "sleep 12 ; date" &
$srun bash -c "sleep 15 ; date" &
wait
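This second variant can also be generated in a loop. Again a sketch under assumptions: `launch_with_offsets` is a hypothetical helper, and the launcher words passed to it are the srun options from the question.

```shell
#!/bin/bash
# launch_with_offsets STEP COUNT 'COMMAND' [LAUNCHER...]
# Hypothetical helper (not part of Slurm): starts COUNT job steps at once;
# step i sleeps i*STEP seconds inside its own step before running COMMAND.
launch_with_offsets() {
    local step="$1" count="$2" cmd="$3"
    shift 3
    local i=0
    while [ "$i" -lt "$count" ]; do
        # "$@" is the launcher prefix, e.g. srun -n1 -N1 --exclusive
        "$@" bash -c "sleep $((i * step)); $cmd" &
        i=$((i + 1))
    done
    wait   # block until every background step has finished
}

# In the batch script (assumed options, copied from the question):
# launch_with_offsets 3 6 date srun -n1 -N1 --exclusive
```

Unlike the first approach, every step is launched immediately and holds its CPU while it sleeps.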

Regarding

The parallel module seems not to be available in our cluster

that does not mean you cannot install it yourself (see this question). If EasyBuild is installed on your cluster, it is even easier (and if it is not, you can install that yourself too). Then you can use the --delay option:

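# -N0 passes no argument to the command; the six inputs only make it run six times
parallel --delay 3 -N0 $srun date ::: 1 2 3 4 5 6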
damienfrancois