Snakemake: access a list within a dict by using a wildcard

Question

To break it down, I have a dict that looks like this:

dict = {'A': ["sample1","sample2","sample3"], 
        'B': ["sample1","sample2"], 
        'C': ["sample1","sample2","sample3"]}

And I have a rule:

rule example:
      input:
          #some input
      params:
          # some params
      output:
          expand('{{x}}{sample}', sample=dict[wildcards.x])
          # the alternative I tried was
          # expand('{{x}}{sample}', sample=lambda wildcards: dict[wildcards.x])
      log:
          log = '{x}.log'
      shell:
        """
        foo
        """

My problem: how can I access the dictionary with wildcards.x as the key, so that I get the list of items stored under that key? The first version just gives me

name 'wildcards' is not defined

while the alternative just gives me

Missing input files for rule all

since Snakemake never even runs the example rule.

I need to use expand, since I want the rule to run only once for each x wildcard while creating multiple samples in this one run.
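To make the requirement concrete, here is a plain-Python sketch (not Snakemake code) of the file names that one run of the rule would have to create for each value of x, using the dict above. `SAMPLES` is a hypothetical rename of `dict` (to avoid shadowing the built-in type) and `outputs_for` is a hypothetical helper.

```python
# Same mapping as in the question, renamed from `dict` (hypothetical name)
# so that the built-in dict type is not shadowed.
SAMPLES = {
    "A": ["sample1", "sample2", "sample3"],
    "B": ["sample1", "sample2"],
    "C": ["sample1", "sample2", "sample3"],
}

def outputs_for(x):
    """File names one run of the rule would create for wildcard value x,
    mirroring expand('{{x}}{sample}', sample=SAMPLES[x]) with x filled in."""
    return [f"{x}{sample}" for sample in SAMPLES[x]]

for x in SAMPLES:
    print(x, outputs_for(x))
```

The list has three entries for A and C but only two for B; that varying length is the obstacle the answer below addresses.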

Dmitry Kuzminov · Answer 1 · 2020-12-23T11:07:22.750


You can use a lambda (a function of the wildcards) in the input section (params accept such functions too), but not in the output. In fact, the output doesn't have any wildcards to pass in; it is what defines them.
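For illustration, a plain-Python sketch of the kind of input function Snakemake accepts: it receives the job's wildcards object and returns a list of files, so a downstream rule could request the per-x files this way (in a Snakefile, `input: sample_inputs`). Snakemake supplies the real wildcards object; here `types.SimpleNamespace` stands in for it, and `SAMPLES` and `sample_inputs` are hypothetical names.

```python
from types import SimpleNamespace

# Hypothetical name for the question's dict.
SAMPLES = {
    "A": ["sample1", "sample2", "sample3"],
    "B": ["sample1", "sample2"],
    "C": ["sample1", "sample2", "sample3"],
}

def sample_inputs(wildcards):
    # Called with one job's wildcard values; the dict lookup works here
    # because wildcards.x has already been fixed by matching the output.
    return [f"{wildcards.x}{sample}" for sample in SAMPLES[wildcards.x]]

# SimpleNamespace is only a stand-in for Snakemake's wildcards object.
print(sample_inputs(SimpleNamespace(x="B")))  # ['Bsample1', 'Bsample2']
```

The lookup succeeds in input because, by the time input is evaluated, the job's wildcard values are already known.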

Let's look at your task from another angle. How do you decide how many samples the output produces? You are defining the dict, but where does this information come from? You have not shown the actual script, but how does it know how many outputs to produce?

Logically you might have three separate rules (or at least two): one that knows how to produce two samples and another that knows how to produce three.
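With separate rules, each rule has to know which values of x it is responsible for. A minimal Python sketch, assuming the question's dict under the hypothetical name `SAMPLES`, that groups the keys by list length; each group could then become, for example, a regular-expression wildcard constraint on x in the corresponding rule. `keys_by_count` is a hypothetical helper.

```python
import re
from collections import defaultdict

# Hypothetical name for the question's dict.
SAMPLES = {
    "A": ["sample1", "sample2", "sample3"],
    "B": ["sample1", "sample2"],
    "C": ["sample1", "sample2", "sample3"],
}

def keys_by_count(samples):
    """Map each list length to the dict keys whose list has that length."""
    groups = defaultdict(list)
    for key, names in samples.items():
        groups[len(names)].append(key)
    return dict(groups)

groups = keys_by_count(SAMPLES)
print(groups)  # {3: ['A', 'C'], 2: ['B']}
# One alternation regex per rule; keys are escaped in case they contain
# regex metacharacters.
print({n: "|".join(map(re.escape, keys)) for n, keys in groups.items()})
```

The drawback is visible here: every new list length in the dict would need another rule.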

As far as I can see, this is an XY problem: you have asked the same question twice without stating your actual problem (X), while forcing an incorrect implementation that defines all outputs through a dictionary (Y).

Update: another possible solution to your artificial example would be to use dynamic output:

rule example:
      input:
          #some input
      output:
          dynamic('{x}sample{n}')

That would work in your case because the files match the common pattern "sample{n}".

edited Dec 23 '20 at 11:07

answered Dec 23 '20 at 04:28

Dmitry Kuzminov


"Logically you might have three separate rules (or at least two), one knows how to produce two samples, the other how to produce three ones." I'm pretty sure this can be done in one rule, since the rule should simply run once per wildcard value and, in that run, create all the samples for that wildcard using the expand function. I've seen it many times now, but the main problem for me has been the dictionary. The important part: the dict is created right before the rules are executed. Sadly I can't show the actual script, but I don't think it matters in this context. – Noob Dec 23 '20 at 10:46
The problem is not in "running the rule" but in the phase of constructing the DAG. Snakemake takes the pattern that you provide in the rule's output definition and uses it to satisfy the inputs of other rules in the DAG. The pattern has to be generic enough to work with any possible wildcard substitution. One consequence is that the number of files in the output cannot depend on the wildcard values. – Dmitry Kuzminov Dec 23 '20 at 10:53
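A deliberately simplified model of this in plain Python. It is not Snakemake's implementation, and `match_wildcards` and `instantiate` are hypothetical helpers. Matching a requested file against an output pattern yields the wildcard values; filling the rule's output patterns with those values always yields exactly as many files as there are patterns, whatever the value of x.

```python
import re

def match_wildcards(pattern, target):
    """Return the wildcard values if target matches pattern, else None.

    Simplified model: every {name} matches one or more characters, and
    each wildcard name may appear only once in the pattern.
    """
    regex = ""
    # re.split with a capturing group alternates literal text and names.
    for i, part in enumerate(re.split(r"\{(\w+)\}", pattern)):
        regex += re.escape(part) if i % 2 == 0 else f"(?P<{part}>.+)"
    m = re.fullmatch(regex, target)
    return m.groupdict() if m else None

def instantiate(output_patterns, wildcards):
    """Fill every output pattern of a rule with one job's wildcard values."""
    return [p.format(**wildcards) for p in output_patterns]

outputs = ["{x}.done", "{x}.log"]
for target in ("A.done", "B.done"):
    wc = match_wildcards(outputs[0], target)
    # The job always gets len(outputs) files, whatever the value of x.
    print(wc, instantiate(outputs, wc))
```

In this model there is nowhere to put "three files for A, two for B": the count is fixed by the pattern list before any wildcard value is known.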
@Noob, one possible solution is to use the `dynamic` specifier in the output. But the files need to match a pattern, so the solution is not as generic as you want with the dictionary. Anyway, I'll add the update to my answer. – Dmitry Kuzminov Dec 23 '20 at 10:57
Yeah, but having the dict created before running and then setting it correctly in rule all should take care of that. I've printed out what my rule all asks for and checked many threads; I'm pretty certain the problem doesn't lie within another rule. Let's just take https://stackoverflow.com/questions/47872708/snakemake-one-wildcard-and-also-expand as an example. My problem should be pretty much the same, with one problematic aspect: accessing the dictionary with my wildcard to get the list stored under it. The accessing part should be pretty much like A_LIST in the example. – Noob Dec 23 '20 at 11:00
The difference is that my equivalent of A_LIST varies in size, but that shouldn't be a problem, since it's not a wildcard and is defined in the expand. – Noob Dec 23 '20 at 11:01
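What rule all can legitimately request is the full, flattened list of files built from the dict, because no wildcard is involved at that point. A plain-Python sketch of that list, assuming the question's dict under the hypothetical name `SAMPLES`; in a Snakefile the same comprehension could sit in rule all's input. Building this list is not the difficulty discussed here; the producing rule's output is.

```python
# Hypothetical name for the question's dict.
SAMPLES = {
    "A": ["sample1", "sample2", "sample3"],
    "B": ["sample1", "sample2"],
    "C": ["sample1", "sample2", "sample3"],
}

# Flattened target list: one entry per (key, sample) pair.
ALL_TARGETS = [f"{x}{sample}" for x, samples in SAMPLES.items() for sample in samples]
print(len(ALL_TARGETS))  # 8
print(ALL_TARGETS[:3])   # ['Asample1', 'Asample2', 'Asample3']
```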
I don't think it's a good idea to use a deprecated feature like dynamic, but I guess I will go with the checkpoints feature and test whether it works. Thank you for the idea, this might work. – Noob Dec 23 '20 at 11:05
@Noob, in the link you've provided the list doesn't depend on the wildcard (unlike what you want). The list is the same for every rule instantiation: you could manually replace the `expand` with the list of files. You cannot make this list vary in size depending on the wildcard. – Dmitry Kuzminov Dec 23 '20 at 11:06
Ah, I guess that's where my error lies: I thought the list size didn't matter and could change depending on the wildcard as long as it had been set in rule all beforehand. https://stackoverflow.com/questions/58848521/how-to-use-dict-value-and-key-in-snakemake-rules made me think it would be possible with an expand instead of running the rule every time, as in the example found there. – Noob Dec 23 '20 at 11:10
Sadly, dynamic doesn't work, and therefore neither does checkpoint. It just keeps executing as if I had left out expand, running the rule multiple times. – Noob Dec 23 '20 at 11:16
