
I am working with Nextflow to build a pipeline, and I am having trouble with one of the processes.

I have a process that takes as input two regular files (output.kraken and $sequences) and a string ("Aspergillus", for example).

I have another file (fungal_species.txt) that contains multiple lines, and I want to iterate over this file and run the process on each line.

I tried this:

process fungal_reads_extraction {

     publishDir("${params.extraction_output}" , mode: 'copy') 
     
     input:
     path namesspecies
     
     output:
     path "*" , emit: reads_extracted_out
     
     script:
     """
   while read -r species_name; do

//Extract lines from the Kraken file where the third word matches the species name
     awk -F'\t' -v "$species_name" 'BEGIN {OFS="\t"} \$3 ~ "$species_name" {print}' output.kraken > "${species_name}_lines.txt"

//Extract accessions from species lines
     awk -F'\t' '{print \$2}' "${species_name}_lines.txt" > "${species_name}_accessions.txt"

//Add "@" symbol to the beginning of each line in the accession file
     awk '{print "@" \$0}' "${species_name}_accessions.txt" > "${species_name}_full_accessions.txt"

//Extract reads assigned to the species
     cat $sequences | awk 'NR==FNR {accessions[\$1]=1; next} \$1 in accessions {print; getline; print; getline; print; getline; print}' "${species_name}_full_accessions.txt" - > "${species_name}_reads.fastq"

//Cleanup intermediate files
     rm "${species_name}_lines.txt" "${species_name}_accessions.txt" "${species_name}_full_accessions.txt"

   done < fungal_species.txt


     """

}

It seemed logical to me to use a while loop and refer to each line as species_name. But when I run the pipeline, that process fails with an error saying that species_name is unknown. This seems very strange to me. Can anyone help me, please? Maybe I am overlooking something important.

ERROR ~ Error executing process > 'fungal_reads_extraction (1)'

Caused by:
  No such variable: species_name -- Check script 'pipeline.nf' at line: 193

Thank you in advance! Have a good day!

1 Answer

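The $ in $species_name refers to a shell variable, not a Nextflow variable. Inside a Nextflow script block it must be escaped as \$species_name (and ${species_name} as \${species_name}) so that Nextflow leaves it for the shell to expand instead of looking for a Nextflow variable with that name: awk -F'\t' -v "\$species_name" 'BEGIN {..

Furthermore, the best approach would be to split your fungal_species file and parallelize per species. Something like: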
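As a rough sketch of how the loop from the question might look with the escaping applied: every Bash variable is written as \$species_name or \${species_name}, while ${kraken_out}, ${sequences} and ${species_list} are left unescaped so that Nextflow substitutes the staged file names. The two extra path inputs and all three input names are assumptions, since the question does not show how output.kraken and the reads file reach the process. The sketch also uses # for comments inside the script (Bash does not understand //), passes the species name to awk in the name=value form that -v expects, and narrows the output glob to the FASTQ files.

```nextflow
process fungal_reads_extraction {

    publishDir "${params.extraction_output}", mode: 'copy'

    input:
    path species_list   // assumed name; e.g. fungal_species.txt
    path kraken_out     // assumed name; e.g. output.kraken
    path sequences      // assumed name; the FASTQ file with the reads

    output:
    path "*_reads.fastq", emit: reads_extracted_out

    script:
    """
    while read -r species_name; do
        # Kraken lines whose third column matches the species name
        awk -F'\t' -v sp="\$species_name" '\$3 ~ sp' ${kraken_out} > "\${species_name}_lines.txt"

        # Read IDs from column 2, prefixed with "@" to match FASTQ headers
        awk -F'\t' '{print "@" \$2}' "\${species_name}_lines.txt" > "\${species_name}_full_accessions.txt"

        # Copy each matching header and the three lines that follow it
        awk 'NR==FNR {accessions[\$1]=1; next} \$1 in accessions {print; getline; print; getline; print; getline; print}' "\${species_name}_full_accessions.txt" ${sequences} > "\${species_name}_reads.fastq"

        # Remove intermediate files
        rm "\${species_name}_lines.txt" "\${species_name}_full_accessions.txt"
    done < ${species_list}
    """
}
```

The awk field references (\$1, \$2, \$3) need the same escaping as the Bash variables, which the original script already did correctly.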

species_ch = Channel.fromPath(params.path_to_fungal_species).splitText().map{it.trim()}


(...)
process fungal_reads_extraction {
     input:
     val(one_name)
     (...)
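To make the per-species idea concrete, here is one possible way to fill in the elided parts. It is only a sketch under assumptions: params.kraken_output and params.sequences are hypothetical parameter names, the input names kraken_out and sequences are invented for illustration, and the awk commands are carried over from the question. Because one_name is now a Nextflow variable, ${one_name} is deliberately not escaped, while the awk field references still are.

```nextflow
process fungal_reads_extraction {

    tag "${one_name}"
    publishDir "${params.extraction_output}", mode: 'copy'

    input:
    val(one_name)
    path kraken_out     // assumed input; e.g. output.kraken
    path sequences      // assumed input; the FASTQ file with the reads

    output:
    path "*_reads.fastq", emit: reads_extracted_out

    script:
    """
    # Read IDs (column 2) of Kraken lines whose third column matches the species
    awk -F'\t' -v sp="${one_name}" '\$3 ~ sp {print "@" \$2}' ${kraken_out} > accessions.txt

    # Copy each matching FASTQ header and the three lines that follow it
    awk 'NR==FNR {accessions[\$1]=1; next} \$1 in accessions {print; getline; print; getline; print; getline; print}' accessions.txt ${sequences} > "${one_name}_reads.fastq"
    """
}

workflow {
    // One channel item per non-empty line of the species file
    species_ch = Channel.fromPath(params.path_to_fungal_species)
        .splitText()
        .map { it.trim() }
        .filter { it }

    // Hypothetical parameters pointing at the two shared input files
    kraken_file    = file(params.kraken_output)
    sequences_file = file(params.sequences)

    fungal_reads_extraction(species_ch, kraken_file, sequences_file)
}
```

With this layout Nextflow launches one task per species, so the tasks can run in parallel, and the two files are reused for every task because they are passed as single values rather than as queue channels.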
     
Pierre