I need to run two rules (gatk_Mutect2 and gatk_IndelRealigner) in the same snakefile.

If I put these rules in different snakefiles, I can run them without error.

I use two input functions (get_files_somatic and get_files). Both use the case name as the dictionary key. (Each case have a normal). When I put these rules in the same snakefile, snakemake tries to look up the id of the normal in the input function of gatk_IndelRealigner.

My question is: how can I manage the ambiguity between these two rules? That is, I want snakemake not to try to connect them.

def get_files_somatic(wildcards):
    case = wildcards.case
    control = aCondition[case][0]
    return ["{}.sorted.dup.reca.cleaned.bam".format(case),"{}.sorted.dup.reca.cleaned.bam".format(control)]

rule all:
    input: expand("{sample}.sorted.dup.reca.cleaned.bam",sample=create_tumor()),
           expand("Results/vcf/{case}.vcf",case=create_tumor()),

include_prefix="rules"

include:
    include_prefix + "/gatk2.rules"
include:
    include_prefix + "/mutec2.rules"


rule gatk_Mutect2:
    input: get_files_somatic,
    output: "Results/vcf/{case}.vcf",
    params:
    log: "logs/{case}.mutect2.log"
    threads: 8
    shell:

rule gatk_IndelRealigner:
    input:
        get_files,
    output:
       "{case}.sorted.dup.reca.cleaned.bam",
       "{case}.sorted.dup.reca.cleaned.bai",
    params:
    log:
        "mapped_reads/merged_samples/logs/{case}_indel_realign_2.log"
    threads: 8
    shell:

def get_files(wildcards):
    case = wildcards.case
    control = aCondition[case][0]
    wildcards.control = control
    return ["mapped_reads/merged_samples/{}.sorted.dup.reca.bam".format(case), "mapped_reads/merged_samples/{}.sorted.dup.reca.bam".format(control),"mapped_reads/merged_samples/operation/{}_{}.realign.intervals".format(case,control)]
Timur Shtatland
mau_who
  • Please try to explain your issue with more details and more clearly. – bli Nov 14 '17 at 14:01
  • @bli I have problems with my snakemake pipeline because in this case I need to divide the pipeline into two different snakemake files: one to perform gatk_IndelRealigner, the other for Mutect2. How can I stop snakemake from trying to connect these two particular rules? – mau_who Nov 16 '17 at 15:43

1 Answer

I'm not sure I really understood your problem. For instance, I don't get what you mean by "Each case have a normal".

But I can see that the output of gatk_IndelRealigner ("{case}.sorted.dup.reca.cleaned.bam") happens to be the same file name as one of the results of get_files_somatic ("{}.sorted.dup.reca.cleaned.bam".format(case), where case is wildcards.case).

That is the reason why gatk_Mutect2 gets "connected" to gatk_IndelRealigner.

It is the essence of snakemake to connect rules based on matching file names between their input and output.

If you do not want to have these two rules linked, you need to have different file names.
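
To illustrate the matching, here is a minimal Python sketch. It is not Snakemake's actual implementation; it only mimics the idea that each {wildcard} in an output pattern behaves like a regex group that accepts any non-empty string. The helper names (pattern_to_regex, match_output), the sample ids T1 and N1, and the "realigned/" prefix are made up for the example.

```python
import re

def pattern_to_regex(pattern):
    """Turn an output pattern such as '{case}.bam' into a compiled regex.

    Assumption: every {name} placeholder accepts any non-empty string.
    """
    # re.split with a capturing group alternates: literal, name, literal, ...
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = ""
    for i, part in enumerate(parts):
        if i % 2 == 0:
            regex += re.escape(part)              # literal text
        else:
            regex += "(?P<{}>.+)".format(part)    # wildcard placeholder
    return re.compile(regex + "$")

def match_output(pattern, filename):
    """Return the wildcard values if filename fits pattern, else None."""
    m = pattern_to_regex(pattern).match(filename)
    return m.groupdict() if m else None

realigner_output = "{case}.sorted.dup.reca.cleaned.bam"

# Both files returned by get_files_somatic fit the realigner's output pattern,
# so the normal (hypothetical id N1) is also treated as a "case".
tumor_match = match_output(realigner_output, "T1.sorted.dup.reca.cleaned.bam")
normal_match = match_output(realigner_output, "N1.sorted.dup.reca.cleaned.bam")

# If the rule wrote to a different name (hypothetical "realigned/" prefix),
# the same request would no longer fit its output pattern.
renamed_match = match_output("realigned/{case}.sorted.dup.reca.cleaned.bam",
                             "N1.sorted.dup.reca.cleaned.bam")
```

In this sketch the normal's file name fits the pattern with case set to the normal's id, which is consistent with snakemake calling get_files with the id of the normal. The rename only helps if nothing else requests a file that fits the new pattern.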

bli
  • Thanks for your help. I don't understand how to avoid connecting these two files. Even if I try to give the file another name (adding a step to change the name), snakemake still tries to connect them. – mau_who Nov 22 '17 at 09:54
  • @snake3354898 I think it is impossible not to connect the rules if the file names match. That is how snakemake works. – bli Nov 22 '17 at 11:10
  • Thanks so much. So the only way is to divide the snakemake files into different steps. – mau_who Nov 22 '17 at 11:14