My usual way of piping, partly based on this Biostars post, is the following:
rule map:
    input: "{sample}.fq.gz",
    output: "sort/{sample}.bam"
    threads: 24
    shell:
        """
        bwa mem reference.fa {input} \
            -t {threads} | \
        samtools sort - \
            -@ {threads} \
            -o {output}
        """
I was keen to try out Snakemake's pipes, as I hoped that they might make workflows with multiple pipes more readable.
rule map:
    input: "{sample}.fq.gz",
    output: pipe("{sample}.bam")
    threads: 24
    shell:
        """
        bwa mem reference.fa {input} \
            -t {threads} \
            > {output}
        """

rule sort:
    input: "{sample}.bam"
    output: "sort/{sample}.bam"
    threads: 24
    shell:
        """
        samtools sort {input} -@ {threads} -o {output}
        """
However, this results in the following WorkflowError:
Job needs threads=48 but only threads=24 are available. This is likely because two jobs are connected via a pipe and have to run simultaneously. Consider providing more resources (e.g. via --cores).
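The arithmetic behind this message can be reproduced with a simplified model: since every job connected by a pipe must run at once, their thread requests add up. The sketch below is an assumption-based illustration of that rule (the `Job`, `required_threads` and `check_pipe_group` names are hypothetical); it is not Snakemake's actual scheduler code.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Job:
    """A job in a hypothetical pipe group (name and requested threads)."""
    name: str
    threads: int


def required_threads(pipe_group: list[Job]) -> int:
    # All jobs connected by pipes must run at the same time,
    # so their thread requests are summed rather than taken one at a time.
    return sum(job.threads for job in pipe_group)


def check_pipe_group(pipe_group: list[Job], cores: int) -> None:
    """Raise if the whole pipe group cannot run at once on `cores` cores."""
    needed = required_threads(pipe_group)
    if needed > cores:
        raise RuntimeError(
            f"Job needs threads={needed} but only threads={cores} are available."
        )


# map (24 threads) piped into sort (24 threads), run with --cores 24
group = [Job("map", 24), Job("sort", 24)]
print(required_threads(group))  # 48
```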
So I would have to divide the 24 threads between bwa and samtools, but every thread given to samtools is one taken away from bwa, which I would prefer to avoid. The problem would become more pronounced in workflows with several piping steps.
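If the threads do have to be divided, the split can at least be made explicit. The helper below is a hypothetical sketch (not part of Snakemake) that shares a core budget among piped jobs in proportion to weights you choose; the 3:1 weighting in favour of bwa is an illustrative assumption, not a measured ratio. The resulting numbers would go into each rule's `threads:` directive so that their sum matches `--cores`.

```python
from __future__ import annotations

import math


def split_threads(cores: int, weights: dict[str, float]) -> dict[str, int]:
    """Divide `cores` among piped jobs in proportion to `weights`.

    Every job gets at least one thread, and the shares always sum to `cores`,
    so the whole pipe group fits on the available cores.
    """
    if cores < len(weights):
        raise ValueError("need at least one core per piped job")
    spare = cores - len(weights)  # cores left after giving each job one thread
    total = sum(weights.values())
    exact = {name: spare * w / total for name, w in weights.items()}
    shares = {name: 1 + math.floor(x) for name, x in exact.items()}
    # Largest-remainder rounding: hand out the cores lost to flooring
    # to the jobs with the largest fractional parts.
    leftover = cores - sum(shares.values())
    by_remainder = sorted(
        exact, key=lambda n: exact[n] - math.floor(exact[n]), reverse=True
    )
    for name in by_remainder[:leftover]:
        shares[name] += 1
    return shares


print(split_threads(24, {"bwa": 3, "samtools": 1}))  # {'bwa': 18, 'samtools': 6}
```

This does not remove the trade-off described above; it only makes the division of cores deliberate rather than ad hoc.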
I have not seen Snakemake pipes used much, but does anyone know a workaround? I am also considering raising this as an issue on Snakemake's GitHub page.
Also, a general question about pipes: is there a valid reason for Snakemake to allocate separate threads to each process in a pipe? Should I worry about both bwa and samtools using 24 threads in my regular way of piping?