I have an R script in my workflow that requires the number of entries in several CSV files (a.csv, b.csv, c.csv; formatted with headers) as a value. Since they all have the string something in every line, I thought I could write the rule as follows:
configfile: config.yaml
WILDCARD = config['wildcard']
TEMP_DIR = "~/temp"

rule all:
    input:
        f"{TEMP_DIR}/folder/{{WILDCARD}}/output.txt"

rule combine_geno_pheno_data_sibs:
    input:
        f"{TEMP_DIR}/file.txt",
        f"{TEMP_DIR}/folder/{{WILDCARD}}/another_file.txt",
        f"config['file']",
    output:
        f"{TEMP_DIR}/folder/{{WILDCARD}}/output.txt"
    params:
        n_lines = shell(
            "grep -c something ../resources/{{WILDCARD}}.csv | xargs"
        )
    script:
        "scripts/use_lines.R"
config.yaml contains

wildcard:
  - a
  - b
  - c

and n_lines is called in R as snakemake@params$n_lines.
The way the expansion is interpreted in shell(), though, is as grep -c something ../resources/a b c.csv. How do I get it to interpret the wildcards one value at a time, e.g. as grep -c something ../resources/a.csv, and return the value to n_lines correctly?
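To make the intended result concrete, here is a minimal sketch in plain Python of the per-file count described above. The helper names count_matching_lines and counts_per_file are hypothetical (they are not part of Snakemake), and the sketch assumes that a literal substring match is enough, i.e. that it should behave like grep -c -F rather than a regular-expression match.

```python
from pathlib import Path


def count_matching_lines(path, needle="something"):
    """Count the lines of `path` that contain `needle` as a literal substring.

    Hypothetical helper: intended to mirror `grep -c -F needle path`.
    """
    with Path(path).expanduser().open(encoding="utf-8") as handle:
        return sum(1 for line in handle if needle in line)


def counts_per_file(names, resources_dir="../resources"):
    """Map each name (e.g. "a") to the count for `<resources_dir>/<name>.csv`.

    Each file is counted on its own, instead of all names being joined
    into one path such as `a b c.csv`.
    """
    return {
        name: count_matching_lines(Path(resources_dir) / f"{name}.csv")
        for name in names
    }
```

With the list from config.yaml, counts_per_file(["a", "b", "c"]) would give one count per file. Inside a Snakefile, a helper like this could presumably be passed through a params function that receives the wildcards, for example n_lines = lambda wildcards: count_matching_lines(f"../resources/{wildcards.WILDCARD}.csv"); that wiring is an untested assumption here.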
Thanks in advance