I am working on a nextflow pipeline where a program used in a process generates all the results inside a subfolder.

I save the output files I need in an output tuple for later processes, and I am trying to save them in different directories depending on their names, renaming them as well.

I am having trouble because the files don't get saved and I can't understand why. The issue must be in the way I use saveAs, but the nextflow documentation on it is very scarce (or I haven't found it).

I have the feeling that the issue comes from the fact that the $filename contains path information and isn't just a filename. Could anyone tell me what I'm doing wrong?

Here below, I wrote a mock process that reproduces the error:

#!/usr/bin/env nextflow 

Channel
.from("foo", "bar", "faz")
.set{ Input }

process mock {

    executor "local"
    maxForks 48
    cpus 1

    publishDir "${params.output_dir}",
    mode: "copy",
    pattern: "*.tab",
    saveAs: {
        filename -> 
             if (filename.contains("A.tab")) {"A/$filename"}
        else if (filename.contains("B.tab")) {"B/$filename"}
        else if (filename.contains("C.tab")) {"C/$filename"}
        else {"unassigned/$filename"}
    }

    input:
    val name from Input

    output:
    file "test/${name}.A.tab"
    file "test/${name}.B.tab"
    file "test/${name}.C.tab"
    file "test/${name}.D.tab"
    
    script:
    """
    mkdir test &&
    unset X &&
    declare -a X=(A.tab B.tab C.tab D.tab) &&
    for FILENAME in \${X[@]}
    do
        if [ ! -f test/${name}.\${FILENAME} ]
        then
            touch test/${name}.\${FILENAME}
        fi
    done
    """
}
schmat_90

1 Answer

I think the problem is that you're writing files to a directory called test, but with pattern: "*.tab" the publishDir directive only selects files to publish from the top-level directory, so nothing under test/ matches. You could try changing this to pattern: "test/*.tab" or pattern: "**.tab".

pattern

Specifies a glob file pattern that selects which files to publish from the overall set of output files.

An example using DSL2:

params.output_dir = './results'


process mock {

    publishDir (
        path: "${params.output_dir}/mock",
        mode: "copy",
        pattern: "test/*.tab",
        saveAs: { fn ->
            if (fn.endsWith("A.tab")) { "A/${fn}" }
            else if (fn.endsWith("B.tab")) { "B/${fn}" }
            else if (fn.endsWith("C.tab")) { "C/${fn}" }
            else { "unassigned/${fn}" }
        }
    )

    input:
    val name

    output:
    path "test/${name}.A.tab", emit: A
    path "test/${name}.B.tab", emit: B
    path "test/${name}.C.tab", emit: C
    path "test/${name}.D.tab", emit: D

    script:
    """
    mkdir test
    touch test/${name}.{A,B,C,D}.tab
    """
}

workflow {

    input_ch = Channel.of("foo", "bar", "faz")

    mock( input_ch )
}

Results:

$ find results/
results/
results/mock
results/mock/B
results/mock/B/test
results/mock/B/test/faz.B.tab
results/mock/B/test/foo.B.tab
results/mock/B/test/bar.B.tab
results/mock/A
results/mock/A/test
results/mock/A/test/bar.A.tab
results/mock/A/test/foo.A.tab
results/mock/A/test/faz.A.tab
results/mock/C
results/mock/C/test
results/mock/C/test/foo.C.tab
results/mock/C/test/faz.C.tab
results/mock/C/test/bar.C.tab
results/mock/unassigned
results/mock/unassigned/test
results/mock/unassigned/test/foo.D.tab
results/mock/unassigned/test/bar.D.tab
results/mock/unassigned/test/faz.D.tab
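
The listing above still carries the test subdirectory, because the string handed to saveAs includes the path of the output file relative to the task directory (the question suspected as much). If the goal is to drop that prefix, one possible sketch is to keep only the last path component inside the closure. The `base` variable and the `tokenize('/')` call are my own illustration rather than part of the example above, and I have not run this against the original pipeline:

```groovy
params.output_dir = './results'

process mock {

    publishDir (
        path: "${params.output_dir}/mock",
        mode: "copy",
        pattern: "test/*.tab",
        saveAs: { fn ->
            // fn is expected to look like "test/foo.A.tab";
            // keep only the final component, e.g. "foo.A.tab"
            def base = fn.tokenize('/').last()

            if (base.endsWith("A.tab")) { "A/${base}" }
            else if (base.endsWith("B.tab")) { "B/${base}" }
            else if (base.endsWith("C.tab")) { "C/${base}" }
            else { "unassigned/${base}" }
        }
    )

    input:
    val name

    output:
    path "test/${name}.A.tab", emit: A
    path "test/${name}.B.tab", emit: B
    path "test/${name}.C.tab", emit: C
    path "test/${name}.D.tab", emit: D

    script:
    """
    mkdir test
    touch test/${name}.{A,B,C,D}.tab
    """
}

workflow {

    input_ch = Channel.of("foo", "bar", "faz")

    mock( input_ch )
}
```

With this change the files should be published as, for example, results/mock/A/foo.A.tab instead of results/mock/A/test/foo.A.tab. The same closure is also the place to rename a file, since whatever string it returns is used as the published path relative to the publishDir path.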

Note that the old DSL1 is no longer supported.

Steve