6

I'm trying to first generate 4 files, for the LETTERS x NUMS combinations, then summarize over the NUMS to obtain one file per element in LETTERS:

LETTERS = ["A", "B"]
NUMS = ["1", "2"]


rule all:
    input:
        expand("combined_{letter}.txt", letter=LETTERS)

rule generate_text:
    output:
        "text_{letter}_{num}.txt"
    shell:
        """
        echo "test" > {output}
        """

rule combine_text:
    input:
        expand("text_{letter}_{num}.txt", num=NUMS)
    output:
        "combined_{letter}.txt"
    shell:
        """
        cat {input} > {output}
        """

Executing this snakefile results in the following error:

WildcardError in line 19 of /tmp/Snakefile:
No values given for wildcard 'letter'.
  File "/tmp/Snakefile", line 19, in <module>

It seems that a partial expand is not possible. Is this a limitation of expand? If so, how can I work around it?

bli

3 Answers

8

Update (25/11/2020): As per this answer, partial expands are now possible without doubling the braces, thanks to the allow_missing argument of expand.


It seems that this is not a limitation of expand, but of my familiarity with the way string formatting works in Python. I need to use double braces for the wildcard that should not be expanded:

LETTERS = ["A", "B"]
NUMS = ["1", "2"]


rule all:
    input:
        expand("combined_{letter}.txt", letter=LETTERS)

rule generate_text:
    output:
        "text_{letter}_{num}.txt"
    shell:
        """
        echo "test" > {output}
        """

rule combine_text:
    input:
        expand("text_{{letter}}_{num}.txt", num=NUMS)
    output:
        "combined_{letter}.txt"
    shell:
        """
        cat {input} > {output}
        """

Executing this Snakefile now generates the following expected files:

text_A_2.txt
text_A_1.txt
text_B_2.txt
text_B_1.txt
combined_A.txt
combined_B.txt
bli
7

Partial expand is possible using allow_missing=True.

For example:

expand("text_{letter}_{num}.txt", num=[1, 2], allow_missing=True)

> ["text_{letter}_1.txt", "text_{letter}_2.txt"]
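To see what a partial expand does conceptually, here is a minimal pure-Python sketch. It is not Snakemake's implementation: `partial_expand` and `_KeepMissing` are hypothetical names made up for this illustration, and the sketch assumes plain `{name}` fields without format specifiers.

```python
from itertools import product


class _KeepMissing(dict):
    """Dict that renders unknown fields back as '{name}' placeholders."""

    def __missing__(self, key):
        return "{" + key + "}"


def partial_expand(pattern, **wildcards):
    """Hypothetical stand-in for expand(..., allow_missing=True)."""
    names = list(wildcards)
    results = []
    # Cartesian product over the provided value lists only.
    for combo in product(*(wildcards[name] for name in names)):
        values = _KeepMissing(zip(names, combo))
        # Fields without a value fall through to __missing__ and stay as-is.
        results.append(pattern.format_map(values))
    return results


print(partial_expand("text_{letter}_{num}.txt", num=[1, 2]))
```

The `{letter}` placeholder is left in each resulting string, so it can still be resolved later as a rule wildcard.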
Scholar
  • Thanks, I didn't know that. This may have been introduced in the last few years. – bli Nov 25 '20 at 10:27
  • Very helpful @Scholar! Is there a version of this that would work with different values of num for different letters? E.g., text_A_2.txt, text_A_1.txt, text_B_2.txt, text_B_3.txt. Where the combinations of `num` and `letter` are captured using `glob_wildcards`. – Ethan White May 26 '22 at 15:21
  • @EthanWhite Not sure if I understand correctly, but you want to render the expand-template with a set of pairs, e.g. {(A, 1), (A, 2), (B, 2), ..}? AFAIK this is not possible with expand. The function is very primitive - it simply creates the Cartesian product of all provided vectors and renders the template with each element of the resulting set. – Scholar Jun 07 '22 at 11:36
  • @EthanWhite You can open a question on that with a full example and I can take a look - a few lines of Python code to generate the pairs should do the trick. – Scholar Jun 07 '22 at 11:38
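The "few lines of Python" mentioned in the comments could look roughly like the sketch below. The `letters` and `nums` lists are made-up sample data standing in for whatever `glob_wildcards` would return, and `nums_for` is a hypothetical helper name.

```python
# Hypothetical (letter, num) pairs, e.g. as collected with glob_wildcards.
letters = ["A", "A", "B", "B"]
nums = ["2", "1", "2", "3"]

pattern = "text_{letter}_{num}.txt"

# zip pairs the values up position by position instead of taking the
# Cartesian product, so only the listed combinations are rendered.
paired_files = [
    pattern.format(letter=letter, num=num)
    for letter, num in zip(letters, nums)
]
print(paired_files)


def nums_for(letter):
    """Return the nums that were observed together with the given letter."""
    return [n for l, n in zip(letters, nums) if l == letter]


# Inputs for one combined file, restricted to the nums seen for "B".
inputs_for_b = [pattern.format(letter="B", num=n) for n in nums_for("B")]
print(inputs_for_b)
```

In a Snakefile, a helper like `nums_for` could be called from an input function that receives the `wildcards` object. Snakemake's expand also accepts a combinator such as `zip` as its second positional argument, which may cover the simple paired case directly.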
3

Indeed, braces need to be escaped when you want expand to ignore them. expand relies on str.format, and hence any rules that apply to format apply to expand as well.
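The escaping rule can be checked with plain `str.format`, outside of Snakemake. This is only a small illustration of the Python behaviour; the variable names are arbitrary, and Snakemake itself reports the unescaped case as a WildcardError rather than the KeyError shown here.

```python
# Doubled braces are literal braces to str.format, so {letter} survives.
escaped = "text_{{letter}}_{num}.txt".format(num="1")
print(escaped)

# The surviving placeholder can be filled in a later formatting pass.
filled = escaped.format(letter="A")
print(filled)

# With single braces, str.format needs a value for every field.
try:
    "text_{letter}_{num}.txt".format(num="1")
    error_name = None
except KeyError as err:
    # The exception carries the name of the field that had no value.
    error_name = err.args[0]
print(error_name)
```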

Johannes Köster