I have the following Nextflow script, which runs the tool PERF on all the split FASTQ files in the directory given below. When I run the script, I get the following error:

Error executing process > 'perf (29)'
 Caused by:
 Process `perf (29)` terminated with an error exit status (1)
 Command executed:
 /proj/perf/bin/PERF -i Condition_R1paired.part_016.fastq -o Condition_R1paired.part_016.tsv --format fastq -m 1 -M 6 -u repeat_units.txt
 Command exit status:
  1
 Command output:
 (empty)
 Command error:
 Units file specified is not found. Please provide a valid file
 Work dir:
 /proj/perf/work/57/c9208b00b8c5c82c3f1fdf6c7d0f07
Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`

This is the script:

params.fastq_dir = "/proj/split_fastq/*.fastq"

params.outdir = '/proj/work/output'

fastq_file_index=Channel.fromPath(params.fastq_dir, checkIfExists: true ).map { it -> [it.baseName, it] }

process perf {

publishDir("${params.outdir}", mode: 'copy')

input:
tuple val(file_name), path(fastq_file)

output:
path "${file_name}.tsv"


script:
"""
/proj/perf/bin/PERF -i ${fastq_file} -o ${file_name}.tsv --format fastq -m 1 -M 6 -u repeat_units.txt

"""

 }


workflow {

    perf(fastq_file_index).view()

 }

Any tips on the suspected error and a potential fix are welcome. I am guessing that repeat_units.txt cannot be shared concurrently by all the Nextflow processes? In that case, how do I share a file with multiple processes? Any pointers on improving the code are also welcome, as I am new to Nextflow. Thanks.

AishwaryaKulkarni

1 Answer

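This is not a Nextflow error per se. PERF simply cannot find the units file: each task runs in its own work directory, and `repeat_units.txt` was never staged there because it is not declared as a process input. Hence the message: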

Units file specified is not found. Please provide a valid file

One way would be to use an extra parameter to specify the file. The file could then be passed to each of the processes using a value channel. Note that a value channel is implicitly created by a process when it is invoked with a simple value. You could try the following:

params.fastq_files = "/proj/split_fastq/*.fastq"
params.repeat_units = '/path/to/repeat_units.txt'

params.outdir = './results'

process perf {

    publishDir "${params.outdir}/perf", mode: 'copy'

    input:
    tuple val(sample_name), path(fastq_file)
    path repeat_units

    output:
    tuple val(sample_name), path("${sample_name}.tsv")

    """
    PERF \\
        -i "${fastq_file}" \\
        -o "${sample_name}.tsv" \\
        --format fastq \\
        -m 1 \\
        -M 6 \\
        -u "${repeat_units}"
    """

}

workflow {

    Channel
        .fromPath( params.fastq_files, checkIfExists: true )
        .map { tuple( it.baseName, it ) }
        .set { fastq_files }

    repeat_units = file( params.repeat_units )

    perf( fastq_files, repeat_units )
}
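
A related pitfall is worth a short sketch. It assumes the same `perf` process and `params` as above and is an alternative to the workflow block, not an addition to it. If the units file is supplied through a queue channel such as `Channel.fromPath(...)` rather than `file(...)`, the channel emits the file only once, so only one task would be expected to run. Converting it to a value channel with `first()` lets every task reuse it:

```nextflow
workflow {

    fastq_files = Channel
        .fromPath( params.fastq_files, checkIfExists: true )
        .map { tuple( it.baseName, it ) }

    // A queue channel emits the file once, so pairing it directly with
    // fastq_files would launch only a single perf task:
    //   perf( fastq_files, Channel.fromPath( params.repeat_units ) )

    // first() returns a value channel, which can be read by every task
    repeat_units = Channel
        .fromPath( params.repeat_units, checkIfExists: true )
        .first()

    perf( fastq_files, repeat_units )
}
```

Using `file( params.repeat_units )`, as in the answer, is the shorter route; the channel form is mainly useful when the file is itself the output of another process.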
Steve
  • Thank you, this worked perfectly. What does `.set { fastq_files }` do in the workflow? – AishwaryaKulkarni Mar 06 '23 at 23:24
  • @AishwaryaKulkarni Great! The [`set`](https://www.nextflow.io/docs/latest/operator.html#set) operator just assigns a channel name. You could also use `fastq_files = Channel.fromPath(...).map(...)`, but using `set` usually makes reading a chain of commands a bit easier. – Steve Mar 06 '23 at 23:46
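
To illustrate the comment above, here is a minimal sketch of the two forms side by side, reusing the `params.fastq_files` parameter from the answer. The name `fastq_files_assigned` is hypothetical and is used only so that both forms can sit in the same block:

```nextflow
workflow {

    // Form 1: name the channel at the end of the chain with set
    Channel
        .fromPath( params.fastq_files, checkIfExists: true )
        .map { tuple( it.baseName, it ) }
        .set { fastq_files }

    // Form 2: plain assignment of the same chain to a variable
    fastq_files_assigned = Channel
        .fromPath( params.fastq_files, checkIfExists: true )
        .map { tuple( it.baseName, it ) }

    // Both channels should emit the same [ baseName, path ] tuples
    fastq_files.view()
    fastq_files_assigned.view()
}
```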