
I am running array jobs on Slurm, and every job needs to copy a file from a local directory to a temporary one. These cp operations should not run simultaneously.

This is the code I came up with:

mydirectory=mydb
LOCKFILE_1=${mydirectory}.lock
set -e
(
    flock -w 3600 200 # Wait for the lockfile for max. 1 hour (3600 s), to not block the queue forever in case of dead lock files.
    cp -r ${mydirectory} $TMPDIR/newdestinationdirectory
) 200>$LOCKFILE_1
set +e
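
For reference, here is the same pattern with the flock result checked explicitly, so a timeout is reported instead of the copy just being skipped (same paths as above, only quoting and the check are added; purely a sketch):

(
    # Try to take the lock on file descriptor 200, waiting at most 1 hour.
    if ! flock -w 3600 200; then
        echo "Could not acquire ${LOCKFILE_1} within 1 hour" >&2
        exit 1
    fi
    # Holding the lock: copy the directory into the job's temporary space.
    cp -r "${mydirectory}" "$TMPDIR/newdestinationdirectory"
) 200>"${LOCKFILE_1}"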

Is this code doing the right thing? Or do I need

rm -f $LOCKFILE_1

to remove the lockfile again afterwards?

Saraha
  • `This cp should not occur simultaneously`. Why? You can read concurrently; you just can't read and write at the same time. There is no write shown here, so as long as the destination is unique for each job, simply copy in parallel. – KamilCuk Sep 09 '20 at 11:12
  • The sysadmin might get mad (he did so in the past...) – Saraha Sep 09 '20 at 13:13

1 Answer


If I understand correctly, you want to limit the load on the file system and network. Slurm's sbcast command is intended for such cases. It can only copy single files, so you should tar the directory before broadcasting it to all nodes:

tar cf ${mydirectory}.tar $mydirectory
sbcast ${mydirectory}.tar $TMPDIR/${mydirectory}.tar
srun -n ${SLURM_JOB_NUM_NODES} --ntasks-per-node=1 tar xf $TMPDIR/${mydirectory}.tar -C $TMPDIR/

This can only be done inside a job allocation (e.g. inside a jobscript)!
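
A minimal jobscript around this might look like the sketch below (the resource values are placeholders, the rest of the job is elided, and it assumes $TMPDIR is set on the compute nodes as in your question):

#!/bin/bash
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=1
#SBATCH --time=01:00:00

mydirectory=mydb
# Pack the directory once on the batch host.
tar cf ${mydirectory}.tar $mydirectory
# Broadcast the archive to the local $TMPDIR of every allocated node.
sbcast ${mydirectory}.tar $TMPDIR/${mydirectory}.tar
# Unpack it once per node.
srun -n ${SLURM_JOB_NUM_NODES} --ntasks-per-node=1 tar xf $TMPDIR/${mydirectory}.tar -C $TMPDIR/
# ... the actual work then reads from $TMPDIR/${mydirectory} on each node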

Marcus Boden