Escaping the double quotes is the problem, but to add a little more on formatting and pipes.
I prefer the syntax of wrapping each line in "
so that the lines can be spaced better:
rule bedtools:
input:
invcf="/path/to/my.vcf.gz",
bedgz="/path/to/my.bed.gz"
output:
outvcf="my.filtered.vcf.gz"
shell:
"/Tools/bedtools2/bin/bedtools "
"intersect "
"-a {input.invcf} "
"-b {input.bedgz} "
"-header -wa "
"| /Tools/bcftools/bcftools "
"annotate "
"-c CHROM,FROM,TO,GENE "
"-h <(echo '##INFO=<ID=GENE,Number=1,Type=String,Description=\"Gene name\">') "
"> {output.outvcf}"
I find that clearer to see each argument and easier to change by moving lines around. But note that the trailing space of each line is required and you have to use an explicit newline, \n
, if you want a separate command. When the prompt is printed the output is nicely formated. With the """
syntax you have to escape each newline with \
at the end and spaces at the start of the line are retained in printing.
If you have lot's of pipe work to do, check out the pipe flag. You write your first step as a rule and snakemake produces a named pipe between the rules, submitting them as a group:
rule bedtools_intersect:
input:
invcf="/path/to/my.vcf.gz",
bedgz="/path/to/my.bed.gz"
output:
outvcf=pipe("my.intersected.vcf.gz")
shell:
"/Tools/bedtools2/bin/bedtools "
"intersect "
"-a {input.invcf} "
"-b {input.bedgz} "
"-header -wa "
"> {output.outvcf}"
rule bcftools_annotate:
input:
invcf="my.intersected.vcf.gz"
output:
outvcf="my.filtered.vcf.gz"
shell:
"/Tools/bcftools/bcftools "
"annotate "
"-c CHROM,FROM,TO,GENE "
"-h <(echo '##INFO=<ID=GENE,Number=1,Type=String,Description=\"Gene name\">') "
"{input.invcf} "
"> {output.outvcf}"
The advantage is you can reuse each rule throughout your pipeline to intersect or annotate while avoiding temporary files.