I have a computing cluster with four nodes, A, B, C and D, running Slurm version 17.11.7. I am struggling with Slurm array jobs. I have the following bash script:

#!/bin/bash -l
#SBATCH --job-name testjob
#SBATCH --output output_%A_%a.txt
#SBATCH --error error_%A_%a.txt
#SBATCH --nodes=1
#SBATCH --time=10:00
#SBATCH --mem-per-cpu=50000

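# Create the working directory on whichever node runs this task
FOLDER=/home/user/slurm_array_jobs/
mkdir -p "$FOLDER"
cd "$FOLDER" || exit 1

# Write the task ID into a file named after the task ID
echo "$SLURM_ARRAY_TASK_ID" > "$SLURM_ARRAY_TASK_ID"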

The script generates the following files:

  • output_*txt,
  • error_*txt,
  • files named according to ${SLURM_ARRAY_TASK_ID}

I submit the script from node A of the cluster as follows:

sbatch --array=1-500 example_job.sh

The 500 array tasks are distributed among nodes A-D, and each output file is stored on the node where the corresponding task ran. In this case, for example, approximately 125 "output_" files end up on each of A, B, C and D.

Is there a way to store all output files on the node where I submit the script, in this case node A? That is, I would like all 500 "output_" files to be stored on node A.

ManuelAllh

2 Answers

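Slurm does not transfer input or output files between nodes. It assumes that the job's working directory sits on a network filesystem shared by all nodes. NFS is the simplest choice; GlusterFS, BeeGFS, and Lustre are other popular options on Slurm clusters. With the working directory on such a shared filesystem, every task writes to the same place regardless of which node runs it.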
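As a rough way to check whether a directory already sits on a shared filesystem, the sketch below prints its filesystem type and classifies it. It assumes GNU coreutils (`df --output=fstype`); the function names and the list of type names treated as shared are illustrative assumptions, not anything Slurm defines, so adjust them for your site.

```shell
#!/bin/bash
# Classify a filesystem type name as "shared" or "local".
# ASSUMPTION: this list of network filesystem type names is illustrative only.
classify_fs() {
    case "$1" in
        nfs|nfs4|lustre|beegfs|fuse.glusterfs) echo "shared" ;;
        *) echo "local" ;;
    esac
}

# Print the filesystem type of the mount that holds a directory (GNU df).
fs_type_of() {
    df --output=fstype "$1" | tail -n 1
}

# Check the directory given as first argument, or the current one.
dir="${1:-$PWD}"
fstype=$(fs_type_of "$dir")
echo "$dir: $fstype ($(classify_fs "$fstype"))"
```

Running it on each of the nodes A-D against the job's working directory shows whether they all see a network mount or each has its own local disk.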

damienfrancois

Use an epilog script to copy each job's output files back to the host the job was submitted from, into the directory containing the job script, and then delete the copies on the compute node.

Add to slurm.conf:

Epilog=/etc/slurm-llnl/slurm.epilog

The slurm.epilog script does the copying (make it executable with chmod +x). The copy assumes the job owner can scp to the submit host without a password prompt:

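#!/bin/bash

# Query the job record once and reuse it for every field.
jobInfo=$(scontrol show job "${SLURM_JOB_ID}")

userId=$(echo "$jobInfo" | grep -i UserId | cut -f2 -d '=' | grep -o '^[^(]*')
stdOut=$(echo "$jobInfo" | grep -i StdOut | cut -f2 -d '=')
stdErr=$(echo "$jobInfo" | grep -i StdErr | cut -f2 -d '=')
# AllocNode:Sid=<host>:<sid> names the host the job was submitted from.
host=$(echo "$jobInfo" | grep -i AllocNode | cut -f3 -d '=' | cut -f1 -d ':')
hostDir=$(echo "$jobInfo" | grep -i Command | cut -f2 -d '=' | xargs dirname)
hostPath="$host:$hostDir/"

# If the job ran on the submit host itself, there is nothing to copy.
[ "$host" = "$(hostname -s)" ] && exit 0

# Copy as the job owner; delete the local files only if the copy succeeded.
if runuser -l "$userId" -c "scp '$stdOut' '$stdErr' '$hostPath'"; then
    rm -f "$stdOut" "$stdErr"
fi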
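
The field extraction in the epilog script can be tried without a cluster by feeding the same grep/cut pipelines a sample job record. The record below is an abridged, hypothetical example modeled on the layout of `scontrol show job` output. The job IDs, user name, and paths are made up for illustration, and real output has many more fields, so treat this as a sketch for checking the parsing rather than a reference for the format.

```shell
#!/bin/bash
# ASSUMPTION: abridged, made-up job record imitating `scontrol show job`.
sample='JobId=1234 ArrayJobId=1230 ArrayTaskId=4 JobName=testjob
   UserId=user(1000) GroupId=user(1000) MCS_label=N/A
   Partition=debug AllocNode:Sid=A:4321
   Command=/home/user/example_job.sh
   WorkDir=/home/user
   StdErr=/home/user/error_1230_4.txt
   StdIn=/dev/null
   StdOut=/home/user/output_1230_4.txt'

# Same pipelines as the epilog script, reading the sample instead of scontrol.
userId=$(echo "$sample" | grep -i UserId | cut -f2 -d '=' | grep -o '^[^(]*')
stdOut=$(echo "$sample" | grep -i StdOut | cut -f2 -d '=')
stdErr=$(echo "$sample" | grep -i StdErr | cut -f2 -d '=')
host=$(echo "$sample" | grep -i AllocNode | cut -f3 -d '=' | cut -f1 -d ':')
hostDir=$(echo "$sample" | grep -i Command | cut -f2 -d '=' | xargs dirname)

# Show what the epilog would copy and where it would send it.
echo "as user: $userId"
echo "copy:    $stdOut $stdErr"
echo "to:      $host:$hostDir/"
```

Replacing the sample with the output of `scontrol show job <jobid>` for one of your own jobs shows whether the pipelines still pick the right fields on your Slurm version.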

(Switching from PBS to Slurm without NFS or similar shared directories is a pain.)

Dharman
Chris81