My regular way of piping, partially based upon this Biostars post, is the following:

rule map:
    input: "{sample}.fq.gz"
    output: "sort/{sample}.bam"
    threads: 24
    shell:
        """
        bwa mem reference.fa {input} \
        -t {threads} | \
        samtools sort - \
        -@ {threads} \
        -o {output}
        """

I was keen to try out Snakemake's pipes, as I hoped that they might make workflows with multiple pipes more readable.

rule map:
    input: "{sample}.fq.gz"
    output: pipe("{sample}.bam")
    threads: 24
    shell:
        """
        bwa mem reference.fa {input} \
        -t {threads} \
        > {output}
        """

rule sort:
    input: "{sample}.bam"
    output: "sort/{sample}.bam"
    threads: 24
    shell:
        """
        samtools sort {input} -@ {threads} -o {output}
        """

However, this results in the following WorkflowError:

Job needs threads=48 but only threads=24 are available. This is likely because two jobs are connected via a pipe and have to run simultaneously. Consider providing more resources (e.g. via --cores).

So I have to divide the threads between bwa and samtools, but every thread I allocate to samtools is one taken away from bwa, which I would prefer to avoid. The problem would become more pronounced in workflows with several piped steps.
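
For illustration, a minimal sketch of what such a split could look like, assuming the workflow is run with --cores 24. The 20/4 split is an arbitrary example, not a tuned recommendation, and the intermediate file is named .sam here (an assumption of this sketch) because bwa mem writes SAM to standard output.

```snakemake
# Hypothetical split: 20 + 4 = 24, so the two piped jobs
# can run simultaneously under --cores 24.
rule map:
    input: "{sample}.fq.gz"
    output: pipe("{sample}.sam")
    threads: 20
    shell:
        "bwa mem -t {threads} reference.fa {input} > {output}"

rule sort:
    input: "{sample}.sam"
    output: "sort/{sample}.bam"
    threads: 4
    shell:
        "samtools sort -@ {threads} -o {output} {input}"
```

The thread counts of jobs connected by a pipe are added together, as the error message indicates, so the sum of the two threads values has to fit within the available cores.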

I have not seen Snakemake pipes used much, but I'm wondering if anyone knows a workaround. I am also considering raising this as an issue on Snakemake's GitHub page.

Also, a general question about pipes: is there a valid reason for Snakemake to allocate separate threads to the processes in a pipe? Should I worry that both bwa and samtools are told to use 24 threads in my regular way of piping?
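
To make the concern concrete: in the single-rule version, bwa and samtools are each told they may use {threads} threads, while Snakemake only budgets 24 cores for the rule as a whole. A hedged sketch of one way to keep the combined request within that budget inside a single rule follows; the sort_threads parameter and its value of 4 are hypothetical names and numbers chosen for illustration.

```snakemake
rule map:
    input: "{sample}.fq.gz"
    output: "sort/{sample}.bam"
    threads: 24
    params:
        # Hypothetical: threads handed to samtools sort; bwa gets the rest.
        sort_threads=4
    shell:
        """
        bwa mem -t $(( {threads} - {params.sort_threads} )) reference.fa {input} | \
        samtools sort -@ {params.sort_threads} -o {output} -
        """
```

This assumes the rule always receives more threads than sort_threads; if Snakemake scales threads down because fewer cores are available, the subtraction would need a guard.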

DriesB
  • "Is there a valid reason for Snakemake to allocate separate threads to processes in a pipe?": I'm not a UNIX expert, but I think this is necessary: the processes on both sides of a pipe need their own threads. – bli May 30 '20 at 09:27
