Unexperienced, self-tought "coder" here, so please be understanding :]
I am trying to learn and use Snakemake to construct pipeline for my analysis. Unfortunatly, I am unable to run multiple instances of a single job/rule at the same time. My workstation is not a computing cluster, so I cannot use this option. I looked for an answer for hours, but either there is non, or I am not knowledgable enough to understand it. So: is there a way to run multiple instances of a single job/rule simultaneously?
If You would like a concrete example:
Lets say I want to analyze a set of 4 .fastq files using fastqc tool. So I input a command:
time snakemake -j 32
and thus run my code, which is:
SAMPLES, = glob_wildcards("{x}.fastq.gz")
rule Raw_Fastqc:
input:
expand("{x}.fastq.gz", x=SAMPLES)
output:
expand("./{x}_fastqc.zip", x=SAMPLES),
expand("./{x}_fastqc.html", x=SAMPLES)
shell:
"fastqc {input}"
I would expect snakemake to run as many instances of fastqc as possible on 32 threads (so easily all of my 4 input files at once). In reality. this command takes about 12 minutes to finish. Meanwhile, utilizing GNU parallel from inside snakemake
shell:
"parallel fastqc ::: {input}"
I get results in 3 minutes. Clearly there is some untapped potential here.
Thanks!