I have started to read Nextflow's documentation and found that one can specify a scratch directory for the execution. Once the task is complete, one can use the stageOutMode directive to copy the output files from scratch to storeDir.

The output files to be copied are specified by the output directive. My question is the following: is it possible to specify entire directories as output so that they would be copied recursively from scratch to storeDir? If so, how?

Botond

1 Answer

By default, the path output qualifier captures process outputs (files, directories, etc.) recursively. All you need to do is specify the top-level directory in your output declaration, as in the example below:

nextflow.enable.dsl=2

process test {

    scratch '/tmp/my/path'

    stageOutMode 'copy'

    storeDir '/store/results'

    input:
    val myint

    output:
    path "outdir-${myint}"

    script:
    def outdir = "outdir-${myint}/foo/bar/baz"

    """
    mkdir -p "${outdir}" 
    touch "${outdir}/${myint}.txt"
    """
}

workflow {

    ch = Channel.of( 1..3 )

    test(ch)
}

Setting the stageOutMode directive only changes how the output files are staged out from the scratch directory to the work directory. That is, this directive does not change how process results are staged into the storeDir directory.

The storeDir directive changes what finally happens to the files listed in the output declaration: instead of staying in the work directory, they are moved into the specified storeDir directory.

Steve
  • Thank you. Please allow me one more question: if I define `scratch '$tmppath'` and execute the script on a SLURM cluster, will `$tmppath` be substituted at runtime? My scheduler allocates a temporary directory with the JOBID in its name, and it only exists once the job starts running. – Botond Sep 23 '21 at 15:59
  • @Botond Yes, as long as the variable is single quoted like you've got above. Our PBS Pro scheduler does the same thing and sets a variable called `$TMPDIR` which will point to something like `/scratch/pbs.36569355.hpcpbs01` when the job starts. This directory is only created when the job starts and will be cleaned up automatically when it finishes. Nextflow will try to write all output in a sub-directory using a unique id, for example: `/scratch/pbs.36569355.hpcpbs01/nxf.6S4nL9mY3p/outdir-1/foo/bar/baz/1.txt`. – Steve Sep 23 '21 at 23:51
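
To make the single-quoting point concrete, here is a minimal sketch of the process from the answer with the scratch path taken from a scheduler-provided variable. It assumes the scheduler exports `$TMPDIR` on the compute node, as in the PBS Pro setup described above; on another cluster the variable name will differ (for instance the `$tmppath` from the question), and `/store/results` is simply the placeholder path from the answer.

```groovy
process test {

    // Single quotes keep Groovy from interpolating the name, so the
    // variable is resolved in the job's shell environment at run time.
    // Assumption: the scheduler sets TMPDIR when the job starts.
    scratch '$TMPDIR'

    stageOutMode 'copy'

    storeDir '/store/results'

    input:
    val myint

    output:
    // Declaring the top-level directory captures everything below it.
    path "outdir-${myint}"

    script:
    """
    mkdir -p "outdir-${myint}/foo/bar/baz"
    touch "outdir-${myint}/foo/bar/baz/${myint}.txt"
    """
}
```

With double quotes, `"$TMPDIR"` would instead be treated as a Groovy variable when the script is parsed, which is not what is wanted here.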