My workflow is approximately as follows:
import pandas as pd
from snakemake.utils import Paramspace

TEMP_DIR = "temp"
chr = 22

# Set up paramspace for QC values:
paramspace = Paramspace(pd.read_csv("config/qc_values.tsv",
                                    sep = "\t"))

### Target rule ###
rule all:
    input:
        expand(f"{TEMP_DIR}/snp-stats/post-qc/{{parameters}}/by_chr/data.chr{{chr}}.snp-stats",
               parameters = paramspace.instance_patterns, chr = "22"),

### Modules ###
include: rules/rules.smk
rules.smk:
[...]
rule filter_snp_stats:
    # Filter SNPs so as to keep only variants with a high info score
    # that are in HWE at biallelic loci. The output is chr by chr.
    input:
        f"{TEMP_DIR}/snp-stats/pre-qc/by_chr/data.chr{{chr}}.snp-stats"
    output:
        f"{TEMP_DIR}/snp-stats/post-qc/{{paramspace.wildcard_pattern}}/by_chr/data.chr{{chr}}.snp-stats",
    params:
        thresholds = paramspace.instance
    script:
        "../scripts/filter_snpstats.R"
config/qc_values.tsv:
hwe_p info
1e-06 0.8
1e-06 0.9
1e-06 0.95
1e-06 0.99
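For reference, here is a rough sketch of how I understand these rows to map onto the directory names that rule all requests. It is a hand-rolled approximation in plain Python, not Snakemake's own code: the helper name instance_patterns_sketch is hypothetical, and the table is inlined on the assumption that the columns are tab-separated, as the sep = "\t" argument suggests.

```python
import csv
import io

# Contents of config/qc_values.tsv, inlined so the sketch is self-contained
# (assumption: columns are separated by tabs).
QC_VALUES_TSV = "hwe_p\tinfo\n1e-06\t0.8\n1e-06\t0.9\n1e-06\t0.95\n1e-06\t0.99\n"


def instance_patterns_sketch(tsv_text):
    """Build one 'name~value/name~value' string per row.

    Hypothetical helper that only imitates the naming scheme seen in the
    requested paths; it does not call Snakemake.
    """
    rows = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return ["/".join(f"{name}~{value}" for name, value in row.items()) for row in rows]


patterns = instance_patterns_sketch(QC_VALUES_TSV)
for pattern in patterns:
    print(pattern)
```

So each row of the table should become one sub-directory under post-qc, which is the layout I expect filter_snp_stats to produce.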
But when I run it, I get the following error:
$ snakemake -np
Building DAG of jobs...
MissingInputException in rule all in file /mnt/storage/project/workflow/Snakefile, line 53:
Missing input files for rule all:
affected files:
/mnt/storage/project/project_folder/snp-stats/post-qc/hwe_p~1e-06/info~0.99/by_chr/data.chr22.snp-stats
/mnt/storage/project/project_folder/snp-stats/post-qc/hwe_p~1e-06/info~0.95/by_chr/data.chr22.snp-stats
/mnt/storage/project/project_folder/snp-stats/post-qc/hwe_p~1e-06/info~0.8/by_chr/data.chr22.snp-stats
/mnt/storage/project/project_folder/snp-stats/post-qc/hwe_p~1e-06/info~0.9/by_chr/data.chr22.snp-stats
The listed files are the ones I expect, since they should be produced by filter_snp_stats, but it isn't clear why Snakemake can't match them to that rule's output through the wildcards. What am I doing wrong?
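One thing that may be worth checking is what the output string of filter_snp_stats actually evaluates to, since it is an f-string. The sketch below is plain Python and does not import Snakemake: wildcard_pattern is a hand-written stand-in for paramspace.wildcard_pattern, and its value is an assumption based on the two column names in the table.

```python
# Plain-Python check of how the f-string in filter_snp_stats is expanded.
# `wildcard_pattern` is a stand-in for paramspace.wildcard_pattern
# (assumed value for the columns hwe_p and info).
TEMP_DIR = "temp"
wildcard_pattern = "hwe_p~{hwe_p}/info~{info}"

# Doubled braces are escapes in an f-string: nothing is interpolated, and a
# literal "{paramspace.wildcard_pattern}" is left in the path.
as_written = f"{TEMP_DIR}/snp-stats/post-qc/{{paramspace.wildcard_pattern}}/by_chr/data.chr{{chr}}.snp-stats"

# Single braces interpolate the pattern, while {{chr}} is still left as a
# wildcard for Snakemake to fill in.
interpolated = f"{TEMP_DIR}/snp-stats/post-qc/{wildcard_pattern}/by_chr/data.chr{{chr}}.snp-stats"

print(as_written)
print(interpolated)
```

If the first printed path is what the rule really declares as output, that might explain why the requested post-qc paths are not matched, but I have not confirmed this inside Snakemake itself.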