How to do a partial expand in Snakemake?

Question

I'm trying to first generate 4 files, for the LETTERS x NUMS combinations, then summarize over the NUMS to obtain one file per element in LETTERS:

LETTERS = ["A", "B"]
NUMS = ["1", "2"]


rule all:
    input:
        expand("combined_{letter}.txt", letter=LETTERS)

rule generate_text:
    output:
        "text_{letter}_{num}.txt"
    shell:
        """
        echo "test" > {output}
        """

rule combine_text:
    input:
        expand("text_{letter}_{num}.txt", num=NUMS)
    output:
        "combined_{letter}.txt"
    shell:
        """
        cat {input} > {output}
        """

Executing this snakefile results in the following error:

WildcardError in line 19 of /tmp/Snakefile:
No values given for wildcard 'letter'.
  File "/tmp/Snakefile", line 19, in <module>

It seems that a partial expand is not possible. Is this a limitation of expand? If so, how can I work around it?

bli · Answer 1 · 2020-11-25T10:31:31.947

Update (25/11/2020): As per this answer, partial expands are now possible without multi-bracketing, thanks to the allow_missing argument of expand.

It seems that this is not a limitation of expand, but of my familiarity with how string formatting works in Python. I need to use double braces for the wildcard that should not be expanded:

LETTERS = ["A", "B"]
NUMS = ["1", "2"]


rule all:
    input:
        expand("combined_{letter}.txt", letter=LETTERS)

rule generate_text:
    output:
        "text_{letter}_{num}.txt"
    shell:
        """
        echo "test" > {output}
        """

rule combine_text:
    input:
        expand("text_{{letter}}_{num}.txt", num=NUMS)
    output:
        "combined_{letter}.txt"
    shell:
        """
        cat {input} > {output}
        """

Executing this Snakefile now generates the following expected files:

text_A_2.txt
text_A_1.txt
text_B_2.txt
text_B_1.txt
combined_A.txt
combined_B.txt

Scholar · Accepted Answer · 2021-01-22T14:06:18.353

Partial expand is possible using allow_missing=True.

For example:

expand("text_{letter}_{num}.txt", num=[1, 2], allow_missing=True)

> ["text_{letter}_1.txt", "text_{letter}_2.txt"]
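
To make the behaviour above concrete, here is a minimal pure-Python sketch of a partial expand. Note that partial_expand is a hypothetical helper written only for illustration; it is not Snakemake's implementation and only imitates the result shown above: wildcards that receive values are filled in over the Cartesian product, and any other {wildcard} is left as-is.

```python
import itertools
import re


def partial_expand(template, **wildcards):
    """Hypothetical stand-in for a partial expand (not Snakemake code).

    Fills in the wildcards that were given values and leaves every
    other {name} placeholder untouched.
    """
    names = list(wildcards)
    results = []
    # Cartesian product over all provided value lists.
    for combo in itertools.product(*(wildcards[name] for name in names)):
        values = dict(zip(names, combo))

        def fill(match):
            key = match.group(1)
            # Replace known wildcards, keep unknown ones verbatim.
            return str(values[key]) if key in values else match.group(0)

        results.append(re.sub(r"\{(\w+)\}", fill, template))
    return results


print(partial_expand("text_{letter}_{num}.txt", num=[1, 2]))
# ['text_{letter}_1.txt', 'text_{letter}_2.txt']
```

The remaining {letter} placeholder is then resolved by Snakemake itself when it matches the rule's output against a requested file.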

edited Jan 22 '21 at 14:06

answered Nov 25 '20 at 10:10

Thanks, I didn't know that. This may have been introduced in the last few years. – bli Nov 25 '20 at 10:27
Very helpful @Scholar! Is there a version of this that would work with different values of num for different letters? E.g., text_A_2.txt, text_A_1.txt, text_B_2.txt, text_B_3.txt. Where the combinations of `num` and `letter` are captured using `glob_wildcards`. – Ethan White May 26 '22 at 15:21
@EthanWhite Not sure if I understand correctly, but you want to render the expand-template with a set of pairs, e.g. {(A, 1), (A, 2), (B, 2), ..}? AFAIK this is not possible with expand. The function is very primitive - it simply creates the Cartesian product of all provided vectors and renders the template with each element of the resulting set. – Scholar Jun 07 '22 at 11:36
@EthanWhite You can open a question on that with a full example and I can take a look - a few lines of Python code to generate the pairs should do the trick. – Scholar Jun 07 '22 at 11:38
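
As a rough sketch of the "few lines of Python" mentioned in the comment above: one option is to group the (letter, num) pairs by letter and let an input function build the file list for each letter. The names PAIRS, NUMS_BY_LETTER and combine_inputs are made up for this example, and the pairs are hard-coded from the file names in the comment. In a real workflow they could presumably be built by zipping the letter and num lists returned by glob_wildcards.

```python
from collections import defaultdict
from types import SimpleNamespace

# Assumed (letter, num) pairs, hard-coded from the file names in the comment.
PAIRS = [("A", "2"), ("A", "1"), ("B", "2"), ("B", "3")]

# Group the nums by letter so each letter keeps only its own values.
NUMS_BY_LETTER = defaultdict(list)
for letter, num in PAIRS:
    NUMS_BY_LETTER[letter].append(num)


def combine_inputs(wildcards):
    """Hypothetical input function: list the text files for one letter."""
    return [
        f"text_{wildcards.letter}_{num}.txt"
        for num in NUMS_BY_LETTER[wildcards.letter]
    ]


# Outside Snakemake, a SimpleNamespace can stand in for the wildcards object.
print(combine_inputs(SimpleNamespace(letter="B")))
# ['text_B_2.txt', 'text_B_3.txt']
```

In the Snakefile, the combining rule would then use `input: combine_inputs` instead of an expand call, so no Cartesian product is involved.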

score 3 · Answer 3 · answered Nov 07 '16 at 09:51

Indeed, braces need to be escaped when you want to ignore them in expand. It relies on str.format, and hence any rules from format apply to expand as well.
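
The escaping rule can be checked with plain str.format, without Snakemake. The short sketch below uses the template from the question (the variable names are only illustrative): doubled braces come out as literal single braces, while an unescaped field with no value raises a KeyError.

```python
# Doubled braces are emitted as literal braces by str.format,
# so {letter} survives while {num} is filled in.
template = "text_{{letter}}_{num}.txt"
expanded = [template.format(num=num) for num in ["1", "2"]]
print(expanded)  # ['text_{letter}_1.txt', 'text_{letter}_2.txt']

# Without escaping, str.format insists on a value for every field.
unescaped = "text_{letter}_{num}.txt"
missing = None
try:
    unescaped.format(num="1")
except KeyError as err:
    missing = err.args[0]
print(missing)  # letter
```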

Johannes Köster

