
My workflow is approximately as follows:

import pandas as pd
from snakemake.utils import Paramspace

TEMP_DIR = "temp"

chr = 22

# Set up paramspace for QC values:
paramspace = Paramspace(pd.read_csv("config/qc_values.tsv",
                                    sep = "\t"))

### Target rule ###
rule all:
    input:
      expand(f"{TEMP_DIR}/snp-stats/post-qc/{{parameters}}/by_chr/data.chr{{chr}}.snp-stats",
            parameters = paramspace.instance_patterns, chr = "22"),

### Modules ###
include: "rules/rules.smk"

rules.smk:

[...]
rule filter_snp_stats:
    # Filter SNPs so as to include only variants with a high info score
    # and in HWE at biallelic loci. The output is chr by chr
    input:
        f"{TEMP_DIR}/snp-stats/pre-qc/by_chr/data.chr{{chr}}.snp-stats"
    output:
        f"{TEMP_DIR}/snp-stats/post-qc/{{paramspace.wildcard_pattern}}/by_chr/data.chr{{chr}}.snp-stats",
    params:
        thresholds = paramspace.instance
    script:
        "../scripts/filter_snpstats.R"

config/qc_values.tsv:

hwe_p   info
1e-06   0.8
1e-06   0.9
1e-06   0.95
1e-06   0.99

But if I run it, I get the following error:

$ snakemake -np

Building DAG of jobs...
MissingInputException in rule all in file /mnt/storage/project/workflow/Snakefile, line 53:
Missing input files for rule all:
    affected files:
        /mnt/storage/project/project_folder/snp-stats/post-qc/hwe_p~1e-06/info~0.99/by_chr/data.chr22.snp-stats
        /mnt/storage/project/project_folder/snp-stats/post-qc/hwe_p~1e-06/info~0.95/by_chr/data.chr22.snp-stats
        /mnt/storage/project/project_folder/snp-stats/post-qc/hwe_p~1e-06/info~0.8/by_chr/data.chr22.snp-stats
        /mnt/storage/project/project_folder/snp-stats/post-qc/hwe_p~1e-06/info~0.9/by_chr/data.chr22.snp-stats

...which is accurate in that the files don't exist yet: they should be produced by filter_snp_stats. It isn't clear why Snakemake can't match them to that rule through the wildcard. What am I doing wrong?

Giulio Centorame

1 Answer


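Paramspace's pattern has to be interpolated by the f-string rather than written as a Snakemake wildcard, so {paramspace.wildcard_pattern} should not be escaped with {{}}. With the doubled braces, the f-string leaves the literal text {paramspace.wildcard_pattern} in the output path, and the paths requested by rule all never match it. The "Parameter space exploration" section of the Snakemake documentation shows the unescaped form. The correct rule would be: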

rule filter_snp_stats:
    input:
        f"{TEMP_DIR}/snp-stats/pre-qc/by_chr/data.chr{{chr}}.snp-stats"
    output:
        f"{TEMP_DIR}/snp-stats/post-qc/{paramspace.wildcard_pattern}/by_chr/data.chr{{chr}}.snp-stats",
    params:
        thresholds = paramspace.instance
    script:
        "../scripts/filter_snpstats.R"
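
To see the difference without running Snakemake, here is a minimal sketch in plain Python. StubParamspace is a hypothetical stand-in, not the real class; its wildcard_pattern value is assumed from the hwe_p~.../info~... paths in the error message above.

```python
TEMP_DIR = "temp"


class StubParamspace:
    # Hypothetical stand-in for snakemake.utils.Paramspace. The pattern is
    # assumed from the paths in the error message, not read from Snakemake.
    wildcard_pattern = "hwe_p~{hwe_p}/info~{info}"


paramspace = StubParamspace()

# Doubled braces: the f-string emits a literal "{paramspace.wildcard_pattern}",
# so nothing from the paramspace ends up in the output path.
escaped = f"{TEMP_DIR}/snp-stats/post-qc/{{paramspace.wildcard_pattern}}/by_chr/data.chr{{chr}}.snp-stats"

# Single braces: the f-string substitutes the pattern, leaving {hwe_p}, {info}
# and {chr} as the wildcards of the rule.
expanded = f"{TEMP_DIR}/snp-stats/post-qc/{paramspace.wildcard_pattern}/by_chr/data.chr{{chr}}.snp-stats"

print(escaped)
print(expanded)

# Filling the wildcards by hand reproduces the shape of a path requested by rule all.
target = expanded.format(hwe_p="1e-06", info="0.8", chr="22")
print(target)
```

Only the second string contains the hwe_p and info wildcards that the targets in rule all need to match.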
Giulio Centorame
    Happy to add more technical info/rephrase it if it isn't clear, just thought I'd post about it since the exception message is extremely unhelpful – Giulio Centorame Apr 25 '23 at 11:39