I am building a pipeline with Nextflow and have run into a problem in one of its processes.
The process takes two regular files as input (output.kraken and $sequences) and a string (for example, "Aspergillus").
I also have a file, 'fungal_species.txt', that contains multiple lines, and I want to iterate over this file and run the process once for each line.
This is what I tried:
process fungal_reads_extraction {
    publishDir("${params.extraction_output}", mode: 'copy')
    input:
    path namesspecies
    output:
    path "*" , emit: reads_extracted_out
    script:
    """
    while read -r species_name; do
        //Extract lines from the Kraken file where the third word matches the species name
        awk -F'\t' -v "$species_name" 'BEGIN {OFS="\t"} \$3 ~ "$species_name" {print}' output.kraken > "${species_name}_lines.txt"
        //Extract accessions from species lines
        awk -F'\t' '{print \$2}' "${species_name}_lines.txt" > "${species_name}_accessions.txt"
        //Add "@" symbol to the beginning of each line in the accession file
        awk '{print "@" \$0}' "${species_name}_accessions.txt" > "${species_name}_full_accessions.txt"
        //Extract reads assigned to the species
        cat $sequences | awk 'NR==FNR {accessions[\$1]=1; next} \$1 in accessions {print; getline; print; getline; print; getline; print}' "${species_name}_full_accessions.txt" - > "${species_name}_reads.fastq"
        //Cleanup intermediate files
        rm "${species_name}_lines.txt" "${species_name}_accessions.txt" "${species_name}_full_accessions.txt"
    done < fungal_species.txt
    """
}
It seemed logical to me to use a while loop and read each line into species_name. But when I run the pipeline, this process fails with an error saying that species_name is unknown. This seems very strange to me. Can anyone help me, please? Maybe I am overlooking something important.
ERROR ~ Error executing process > 'fungal_reads_extraction (1)'
Caused by:
No such variable: species_name -- Check script 'pipeline.nf' at line: 193
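A note on the likely cause, as far as I understand Nextflow: inside a double-quoted `"""` script block, Nextflow interpolates every `$name` as a pipeline variable before bash ever sees the script, which would explain "No such variable: species_name". Bash variables there have to be escaped as `\$species_name`. Two smaller points in the script above: bash comments start with `#`, not `//`, and `awk -v` expects a `name=value` pair. One way to avoid the escaping altogether is to move the loop into a standalone bash script. The sketch below is a hypothetical helper (the file name `extract_species_reads.sh`, the function name, and the argument order are my assumptions, not part of the pipeline). It assumes 4-line FASTQ records and that the third tab-separated column of the Kraken file holds the species name:

```shell
#!/usr/bin/env bash
# Hypothetical helper, e.g. saved as bin/extract_species_reads.sh.
# Name and argument order are assumptions made for illustration.
# Usage: extract_species_reads.sh <kraken_file> <fastq_file> <species_list>

extract_species_reads() {
    local kraken_file=$1 fastq_file=$2 species_list=$3
    local species_name

    # Read one species name per line (also handles a missing final newline).
    while IFS= read -r species_name || [ -n "$species_name" ]; do
        [ -z "$species_name" ] && continue   # skip blank lines

        # First file (Kraken): remember "@<read id>" (column 2) for every
        # line whose column 3 matches the species name.
        # Second file (FASTQ, assumed 4 lines per record): on each header
        # line decide whether to keep the record, then print all 4 lines.
        awk -F'\t' -v sp="$species_name" '
            NR == FNR { if ($3 ~ sp) wanted["@" $2] = 1; next }
            FNR % 4 == 1 { split($0, header, " "); keep = (header[1] in wanted) }
            keep
        ' "$kraken_file" "$fastq_file" > "${species_name}_reads.fastq"
    done < "$species_list"
}

# Run only when the three expected arguments are supplied.
if [ "$#" -eq 3 ]; then
    extract_species_reads "$@"
fi
```

If this file is made executable and placed in the pipeline's `bin/` directory, which Nextflow adds to the PATH of each task, the process script could shrink to a single line such as `extract_species_reads.sh output.kraken $sequences $namesspecies`. That line assumes output.kraken and `sequences` are also declared as inputs of the process. Note that this still runs one task that loops over all species, not one task per species.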
Thank you in advance! Have a good day!