
Here is a short example from the advanced section of the Snakemake tutorial:

rule bwa_map:
    input:
        "data/genome.fa",
        lambda wildcards: config["samples"][wildcards.sample]
    output:
        "mapped_reads/{sample}.bam"
    threads: 8
    shell:
        "bwa mem -t {threads} {input} | samtools view -Sb - > {output}"

Now let's say that I wrote this rule months ago and no longer remember the output file name. My understanding is that I cannot run Snakemake by invoking the rule name, because this leads to an error:

$ snakemake bwa_map
InputFunctionException in line 9 of Snakefile:
AttributeError: 'Wildcards' object has no attribute 'sample'
Wildcards:

$

First, I don't understand why Snakemake cannot use the lambda function to deduce the input files from the configuration file, since it is quite clear that I am referring to the "samples" section.

Second, is there a workaround? With a good old Makefile, it is very easy to run the same bwa_map rule by typing something like

$ make bwa_map INPUT=data/samples/A.fastq

Thanks in advance for your help. Benoist

2 Answers

If you specify a rule name as a target and that rule contains wildcards, Snakemake can't know which values to use for them. In this case, they can only be determined from a concrete output file. That file can be requested by a downstream rule, e.g. a real `all` target at the top of the Snakefile, or passed directly on the command line.
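To make this concrete, here is a simplified Python sketch (not Snakemake's actual implementation) of why a concrete file is needed: the wildcard value is recovered by matching a requested path against the rule's output pattern `mapped_reads/{sample}.bam`, and only then can the input function be evaluated. The helpers `pattern_to_regex`, `sample_input` and `resolve_inputs` and the sample config are hypothetical; every wildcard is assumed to match `.+`.

```python
import re
from types import SimpleNamespace

# Hypothetical config, modelled on the tutorial's config.yaml.
config = {"samples": {"A": "data/samples/A.fastq", "B": "data/samples/B.fastq"}}

# Output pattern of the bwa_map rule.
OUTPUT_PATTERN = "mapped_reads/{sample}.bam"


def sample_input(wildcards):
    """Same logic as the rule's lambda input function."""
    return config["samples"][wildcards.sample]


def pattern_to_regex(pattern):
    """Compile a pattern like 'mapped_reads/{sample}.bam' into a regex.

    Simplification (assumption): each {name} becomes a named group matching
    '.+'; the literal text around it is escaped.
    """
    # With a capturing group, re.split alternates literal text and wildcard names.
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = "".join(
        f"(?P<{part}>.+)" if i % 2 else re.escape(part)
        for i, part in enumerate(parts)
    )
    return re.compile(regex + r"\Z")


def resolve_inputs(target):
    """Infer wildcard values from a concrete target, then evaluate the inputs."""
    match = pattern_to_regex(OUTPUT_PATTERN).match(target)
    if match is None:
        raise ValueError(f"{target!r} does not match {OUTPUT_PATTERN!r}")
    wildcards = SimpleNamespace(**match.groupdict())
    return ["data/genome.fa", sample_input(wildcards)]


print(resolve_inputs("mapped_reads/A.bam"))
# ['data/genome.fa', 'data/samples/A.fastq']

# Given only the rule name, there is no path to match, so 'sample' is never set:
try:
    sample_input(SimpleNamespace())
except AttributeError as err:
    print("AttributeError:", err)
```

The last call fails with an AttributeError much like the one in the question, because nothing ever supplied a value for `sample`.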

However, if you have a proper target rule at the top of the Snakefile, you can use the --until flag, which executes your workflow only up to a given rule.

Regarding your make example, I am not aware of this functionality. Can you point me to the docs about this? I might add something similar to Snakemake as well.

Please also note that I just improved the error message for this case in the development version of Snakemake. It is now more informative and explains the issue.

Johannes Köster
  • I accept that Snakemake cannot know which values to use for the wildcards, since you say so, but I don't understand why it has to be like that. The way I see it, since the configuration file defines samples in the format `A: /path/to/foo.txt, B: /path/to/bar.txt`, that should be enough for a workflow program to say "OK, these are my input files, I should name the output files ... let's go for it". The program then has its inputs, outputs and instructions, so it can produce the output files without the user having to know the output file names in advance. –  Nov 16 '16 at 15:04

Thanks for your answer.

I cannot provide a link to an official documentation page. However, I'm talking about a very popular make feature, so I think you know it even if it doesn't ring a bell right now.

Consider this Makefile named /path/to/workflows/variant_calling.make:

FASTQ = foo
GENOME = genome.fa
OUTPUT = some_complicated_output_file_name_$(FASTQ).txt

# help and mapping are command names, not files
.PHONY: help mapping

# Recipe lines must be indented with a tab
help:
	@echo 'This is a dummy example'
	@echo ''
	@echo 'Usage: make <command>'
	@echo ''
	@echo 'Available commands:'
	@echo '    help - display this help and exit'
	@echo '    mapping - map a fastq file to a reference genome'

mapping: $(OUTPUT)

$(OUTPUT):
	bwa mem $(GENOME) $(FASTQ) > $@

Obviously, one week after writing this Makefile, there is no way you will still remember the output file name. But that does not matter, because you can create the output file simply by typing

$ make -f /path/to/workflows/variant_calling.make mapping FASTQ=bar.fastq

I could have numerous other rules in this Makefile and still be able to run only the mapping step with the above command.
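The Make behaviour described here (defaults set at the top of the file, overridden by `VAR=value` on the command line) can be mimicked in plain Python, which is what top-level Snakefile code is. This is a hypothetical sketch: `DEFAULTS`, `apply_overrides` and `mapping_command` are invented to mirror the Makefile; in Snakemake itself, values passed with `--config key=value` end up in the `config` dictionary.

```python
# Hypothetical defaults, mirroring FASTQ and GENOME in the Makefile.
DEFAULTS = {"FASTQ": "foo", "GENOME": "genome.fa"}


def apply_overrides(defaults, args):
    """Return a copy of defaults updated with KEY=VALUE strings (like `make FASTQ=bar.fastq`)."""
    settings = dict(defaults)
    for arg in args:
        key, sep, value = arg.partition("=")
        if not sep:
            raise ValueError(f"expected KEY=VALUE, got {arg!r}")
        settings[key] = value
    return settings


def mapping_command(settings):
    """Derive the output name and the command the 'mapping' target would run."""
    output = f"some_complicated_output_file_name_{settings['FASTQ']}.txt"
    return output, f"bwa mem {settings['GENOME']} {settings['FASTQ']} > {output}"


settings = apply_overrides(DEFAULTS, ["FASTQ=bar.fastq"])
output, command = mapping_command(settings)
print(command)
# bwa mem genome.fa bar.fastq > some_complicated_output_file_name_bar.fastq.txt
```

As with Make, the user only names the step and the input; the output name is derived, never typed.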


I would like to be able to do exactly the same with snakemake, which would result in a command line that might look like this:

$ snakemake -s path/to/myworkflow.snakefile bwa_map

Do I make myself clear?

Can you confirm that this is not possible right now? If so, is there any chance of this feature coming to Snakemake soon?

Thanks.

Benoist

  • Well, you can do exactly the same in Snakemake. Just define the same variables in plain Python (at the top of the Snakefile) or in a config file, and override them on the command line with --config or via environment variables. This is not the canonical way to work in Snakemake (nor is it in Make). I'd like to point you to the official [Snakemake tutorial](http://snakemake.bitbucket.org/snakemake-tutorial.html) for that. – Johannes Köster Dec 04 '16 at 10:09
  • Sorry, I was not clear. Actually, the example I gave does not reflect my initial question. How can I tell Snakemake "apply this rule to whatever file name is given in this particular section of the configuration file" without naming the output files? –  Dec 05 '16 at 13:58
  • In general, Snakemake works top down: you say what you want to get, and it finds a collection of rules to apply. What you want would still be possible using the built-in expand function and some Python logic. However, it is not what Snakemake and Make were designed for. – Johannes Köster Dec 06 '16 at 14:35
  • Please also have a look at the tutorial. Your pattern is perfectly fine if, instead of file names, you put e.g. sample or dataset names in that config file and use them to determine the targets. – Johannes Köster Dec 06 '16 at 14:37
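The top-down pattern suggested in these comments can be sketched in plain Python: keep sample names in the config and derive the target paths from them. The helper `expand_single` is hypothetical and only imitates Snakemake's `expand` for a single wildcard; in a real Snakefile one would write `expand("mapped_reads/{sample}.bam", sample=config["samples"])`, as the tutorial does for its sorted BAM files.

```python
# Hypothetical config: sample names map to FASTQ files, as in the tutorial's config.yaml.
config = {"samples": {"A": "data/samples/A.fastq", "B": "data/samples/B.fastq"}}


def expand_single(pattern, **wildcard_values):
    """Fill one wildcard with each listed value (a one-wildcard stand-in for expand)."""
    [(name, values)] = wildcard_values.items()
    # Iterating over a dict yields its keys, i.e. the sample names.
    return [pattern.format(**{name: value}) for value in values]


# The files a top-level `all` rule would request; Snakemake then works backwards
# and runs bwa_map once per file, inferring {sample} from each path.
targets = expand_single("mapped_reads/{sample}.bam", sample=config["samples"])
print(targets)
# ['mapped_reads/A.bam', 'mapped_reads/B.bam']
```

If this list is the input of a `rule all` placed first in the Snakefile, running `snakemake` with no target would build every mapped BAM without the user having to remember the output names.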