Best way to run same rule twice with different params

Question

I'm using bcftools consensus to extract haplotypes from a phased VCF file. Given the input files:

A.sorted.bam
B.sorted.bam

The following output files are created:

A.hap1.fna
A.hap2.fna
B.hap1.fna
B.hap2.fna

I currently have two rules to do this. They differ only by the numbers 1 and 2 in the output files and shell command. Code:

rule consensus1:
    input:
        vcf="variants/phased.vcf.gz",
        tbi="variants/phased.vcf.gz.tbi",
        bam="alignments/{sample}.sorted.bam"
    output:
        "haplotypes/{sample}.hap1.fna"
    params:
        sample="{sample}"
    shell:
        "bcftools consensus -i -s {params.sample} -H 1 -f {reference_file} {input.vcf} > {output}"

rule consensus2:
    input:
        vcf="variants/phased.vcf.gz",
        tbi="variants/phased.vcf.gz.tbi",
        bam="alignments/{sample}.sorted.bam"
    output:
        "haplotypes/{sample}.hap2.fna"
    params:
        sample="{sample}"
    shell:
        "bcftools consensus -i -s {params.sample} -H 2 -f {reference_file} {input.vcf} > {output}"

While this code works, it seems there should be a better, more Pythonic way to do this with a single rule. Is it possible to collapse these into one rule, or is my current method the best way?

Manavalan Gajapathy · Accepted Answer · 2018-06-05T23:08:53.543

2

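Make the haplotype number a wildcard in the output file name, then request both haplotypes for every sample in rule all. See the Snakemake documentation on target rules to learn more about adding targets via rule all.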

reference_file = "ref.txt"

rule all:
    input:
        expand("haplotypes/{sample}.hap{hap_no}.fna",
               sample=["A", "B"], hap_no=["1", "2"])

# One rule covers both haplotypes: hap_no is taken from the requested file name.
rule consensus:
    input:
        vcf="variants/phased.vcf.gz",
        tbi="variants/phased.vcf.gz.tbi",
        bam="alignments/{sample}.sorted.bam"
    output:
        "haplotypes/{sample}.hap{hap_no}.fna"
    params:
        sample="{sample}",
        hap_no="{hap_no}"
    shell:
        "bcftools consensus -i -s {params.sample} -H {params.hap_no} "
        "-f {reference_file} {input.vcf} > {output}"
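
To make it clearer which files rule all ends up requesting, here is a minimal plain-Python sketch of what the expand() call evaluates to. The helper expand_targets is a hypothetical stand-in written for illustration, not Snakemake's own implementation. It assumes expand's default behaviour of combining every value of each wildcard with every value of the others.

```python
from itertools import product


def expand_targets(pattern, **wildcards):
    """Hypothetical stand-in for expand(): fill the pattern with every
    combination of the supplied wildcard values."""
    names = list(wildcards)
    return [
        pattern.format(**dict(zip(names, combo)))
        for combo in product(*(wildcards[name] for name in names))
    ]


targets = expand_targets(
    "haplotypes/{sample}.hap{hap_no}.fna",
    sample=["A", "B"],
    hap_no=["1", "2"],
)

# Print one requested target file per line.
for target in targets:
    print(target)
```

Each of the four resulting file names matches the single output pattern, so the one rule runs once per sample and haplotype.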

edited Jun 05 '18 at 23:08

answered Apr 02 '18 at 03:19

Manavalan Gajapathy


Excellent, this is exactly what I needed. Using wildcards with the target rule is very powerful. Thank you for the help! – Kelly Sovacool Apr 02 '18 at 17:26
You don't even need to reassign `hap_no` to a `param`. Wildcards are directly accessible from the rule body, in this instance, as `wildcards.hap_no`. – Unknown artist May 17 '18 at 17:39
@KirillG True. But it's a habit of mine to keep all parameters I use in one place for easy readability. – Manavalan Gajapathy May 17 '18 at 18:04
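
As a rough illustration of where the wildcard values discussed in these comments come from, the following plain-Python sketch mimics how a requested file name is matched against the rule's output pattern. The helpers pattern_to_regex and infer_wildcards are hypothetical names made up for this example, and the sketch assumes each wildcard matches the default regular expression `.+`. Snakemake's real matching logic is more involved.

```python
import re

# Output pattern taken from the rule in the answer.
OUTPUT_PATTERN = "haplotypes/{sample}.hap{hap_no}.fna"


def pattern_to_regex(pattern):
    """Hypothetical helper: turn each '{name}' placeholder into a named
    regex group that matches '.+', escaping the literal text around it."""
    parts = re.split(r"\{(\w+)\}", pattern)
    # Even indices hold literal text, odd indices hold wildcard names.
    regex = "".join(
        re.escape(part) if i % 2 == 0 else f"(?P<{part}>.+)"
        for i, part in enumerate(parts)
    )
    return re.compile(regex)


def infer_wildcards(target, pattern=OUTPUT_PATTERN):
    """Return the wildcard values implied by a target file name, or None
    if the target does not match the pattern."""
    match = pattern_to_regex(pattern).fullmatch(target)
    return match.groupdict() if match else None


print(infer_wildcards("haplotypes/A.hap1.fna"))
print(infer_wildcards("haplotypes/B.hap2.fna"))
```

In the Snakefile, the same values are what `{wildcards.hap_no}` or `{params.hap_no}` resolve to inside the shell command.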
