0

below is my snakemake codes, if i don't comment out line28,29 code,which is rule all->input->the 1st,2nd command lines,then I can't get the last rule varscan_somatic, that is to say,the dry run output likes this:

Job counts:
        count   jobs
        1       all
        25      mpileup_analysis
        25      normal_mpileup
        25      tumor_mpileup
        76
This was a dry-run (flag -n). The order of jobs does not reflect the order of execution.

but if i do comment out line28,29 code,which is rule all->input->the 1st,2nd command lines,then I can get the last rule varscan_somatic, that is to say,the dry run output likes this:

Job counts:
        count   jobs
        1       all
        25      mpileup_analysis
        25      normal_mpileup
        25      tumor_mpileup
        25      varscan_somatic
        101
This was a dry-run (flag -n). The order of jobs does not reflect the order of execution.

I don't know why it does happen? Can anybody give me some advise?thanks a lot for any help.

import re
import os

mydict = dict()
with open("config.txt") as HD:
    for line in HD:
        line = line.rstrip()
        if line.startswith("#"):
            continue
        value,field = re.split("\s*=\s*",line)
        mydict[value] = field

VarScan    = mydict['VarScan']
SAMtools   = mydict['SAMtools']
REFERENCE  = mydict['REFERENCE']
PL_MPPANA  = mydict['MPPANA']

chrlist = [i for i in os.listdir("call_region") if i.endswith(".region.bed")]
def replace(str1):
    str2 = str1.replace(".region.bed","")
    return str2
chrlist = map(replace,chrlist)

configfile:"paired_test.yaml"

rule all:
    input:
#        expand("varscan_somatic/{sample}/{chrid}.normal.mpileup.analysis",sample=config['samples'],chrid=chrlist),
#        expand("varscan_somatic/{sample}/{chrid}.tumor.mpileup.analysis",sample=config['samples'],chrid=chrlist),
        expand("varscan_somatic/{sample}/{chrid}.snp.vcf",sample=config['samples'],chrid=chrlist),
        expand("varscan_somatic/{sample}/{chrid}.indel.vcf",sample=config['samples'],chrid=chrlist)

rule normal_mpileup:
    input:
        bam=lambda wc:config['samples'][wc.sample][3],
        bed="call_region/{chrid}.region.bed"
    output:
        "varscan_somatic/{sample}/{chrid}.normal.mpileup"
    log:
        "log/varscan_somatic/{sample}/{chrid}.normal.mpileup.log"
    shell:
        "{SAMtools} mpileup -f {REFERENCE} -l {input.bed} "
        "{input.bam} -d1000 -Q10 -q10 -o {output} "
        "1>{log} 2>&1"

rule tumor_mpileup:
    input:
        bam=lambda wc:config['samples'][wc.sample][1],
        bed="call_region/{chrid}.region.bed"
    output:
        "varscan_somatic/{sample}/{chrid}.tumor.mpileup"
    log:
        "log/varscan_somatic/{sample}/{chrid}.tumor.mpileup.log"
    shell:
        "{SAMtools} mpileup -f {REFERENCE} -l {input.bed} "
        "{input.bam} -d1000 -Q10 -q10 -o {output} "
        "1>{log} 2>&1"

rule mpileup_analysis:
    input:
        tumor="varscan_somatic/{sample}/{chrid}.tumor.mpileup",
        norml="varscan_somatic/{sample}/{chrid}.normal.mpileup"
    output:
        tumor="varscan_somatic/{sample}/{chrid}.tumor.mpileup.analysis",
        norml="varscan_somatic/{sample}/{chrid}.normal.mpileup.analysis"
    log:
        "log/varscan_somatic/{sample}/{chrid}.mpileup_analysis.log"
    shell:
        "{PL_MPPANA} {input.tumor} {output.tumor} 1>{log} 2>&1 "
        "&& {PL_MPPANA} {input.norml} {output.norml} 1>>{log} 2>&1"

rule varscan_somatic:
    input:
        tumor="varscan_somatic/{sample}/{chrid}.tumor.mpileup",
        norml="varscan_somatic/{sample}/{chrid}.normal.mpileup",
        temp1="varscan_somatic/{sample}/{chrid}.tumor.mpileup.analysis",
        temp2="varscan_somatic/{sample}/{chrid}.normal.mpileup.analysis"
    output:
        "varscan_somatic/{sample}/{chrid}.snp",
        "varscan_somatic/{sample}/{chrid}.indel"
    log:
        "log/varscan_somatic/{sample}/{chrid}.varscan_somatic.log"
    params:
        "varscan_somatic/{sample}/{chrid}",
        "--validation 1 --output-vcf 1"
    shell:
        "{VarScan} somatic {input.tumor} {input.norml} {params} 1>{log} 2>&1"
>>>config.txt
VarScan = /path/my/varscan
REFERENCE = /path/my/hg19.fa
SAMtools = /path/my/samtools
MPPANA = /path/my/mppana.pl
>>> paired.yaml
samples:
    S01:['S01','S01.bqsr.bam','S02','S02.bqsr.bam']

below is: snakemake summary without comments

snakemake -s bin/varscan_somatic_paired.py -np --forceall --summary
Building DAG of jobs...
output_file     date    rule    version log-file(s)     status  plan
varscan_somatic/S001/chr21.tumor.mpileup.analysis       Thu Sep 26 02:34:56 2019        mpileup_analysis        -       log/varscan_somatic/S001/chr21.mpileup_analysis.log ok      update pending
varscan_somatic/S001/chr21.normal.mpileup.analysis      Thu Sep 26 02:34:56 2019        mpileup_analysis        -       log/varscan_somatic/S001/chr21.mpileup_analysis.log ok      update pending
varscan_somatic/S001/chr10.tumor.mpileup.analysis       Wed Sep 25 22:56:59 2019        mpileup_analysis        -       log/varscan_somatic/S001/chr10.mpileup_analysis.log ok      update pending

below is: snakemake summary with comments

Building DAG of jobs...
output_file     date    rule    version log-file(s)     status  plan
varscan_somatic/S001/chr19.snp.vcf      Wed Sep 25 22:14:13 2019        varscan_somatic -       log/varscan_somatic/S001/chr19.varscan_somatic.log ok       update pending
varscan_somatic/S001/chr19.indel.vcf    Wed Sep 25 22:14:13 2019        varscan_somatic -       log/varscan_somatic/S001/chr19.varscan_somatic.log ok       update pending
varscan_somatic/S001/chr14.snp.vcf      Thu Sep 26 01:22:17 2019        varscan_somatic -       log/varscan_somatic/S001/chr14.varscan_somatic.log ok       update pending

enter image description here

enter image description here

Danielle
  • 19
  • 5
  • I cannot see any reason why the execution of varscan_somatic should depend on those lines commented out. Try to add the `--summary` option to your snakemake command with and without commented lines and post the output if not too big. – dariober Sep 23 '19 at 12:57
  • Did you try `--forceall`? – Dmitry Kuzminov Sep 23 '19 at 12:57
  • @DmitryKuzminov I tried --forceall,but nothing to change.`$ snakemake -s bin/varscan_somatic_paired.py -np --forceall Building DAG of jobs... Job counts: count jobs 1 all 25 mpileup_analysis 25 normal_mpileup 25 tumor_mpileup 76` – Danielle Sep 26 '19 at 08:32
  • @dariober I tried --summary and the result is very large.so I will give head 3 lines in the question. I don't know whether i can give the files to you in Stack Overflow? – Danielle Sep 26 '19 at 08:41
  • @Danielle You can provide a picture of dependences here. Try snakemake --dag. – Dmitry Kuzminov Sep 26 '19 at 13:06
  • @DmitryKuzminov I have loaded the pictures,but I see them failed loading. I think maybe due to the firewall,I cannot load pictures to stackoverflow. – Danielle Sep 29 '19 at 03:31
  • I see the DAG picture is accordant with the Dry-run result or summay. I think this is a bug of snakemake. – Danielle Sep 29 '19 at 07:39
  • @Danielle I'm not sure that the code you provided is the actual code you run. At least the identation of the comments is invalid. Try to simplify the example. First leave the only sample/chrid pair in config. Then substitute the `shell`section with just simple `touch {output}`. Provide a minimal example that doesn't work. I'll try this on my machine and confirm if that doesn't work. PS I don't believe that this is a bug in Snakemake. – Dmitry Kuzminov Sep 29 '19 at 10:09

1 Answers1

0

This code is wrong: before fix:

def replace(str1):
    str2 = str1.replace(".region.bed","")
    return str2
chrlist = map(replace,chrlist)

after fix:

def replace(str1):
    str2 = str1.replace(".region.bed","")
    return str2
chrlist = list(map(replace,chrlist))

then everything is ok.

Danielle
  • 19
  • 5