I am trying to clean up a data pipeline using `snakemake`. It looks like wildcards are what I need, but I can't get them to work in `params`.
My function needs a parameter that depends on the wildcard value. For instance, let's say it depends on `sample`, which can be either `A` or `B`.
I tried the following (my real example is more complicated, but this is basically what I am trying to do):
```
sample = ["A","B"]
import pandas as pd

def dummy_example(sample):
    return pd.DataFrame({"values": [0,1], "sample": sample})

rule all:
    input:
        "mybucket/sample_{sample}.csv"

rule testing_wildcards:
    output:
        newfile="mybucket/sample_{sample}.csv"
    params:
        additional="{sample}"
    run:
        df = dummy_example(params.additional)
        df.to_csv(output.newfile, index = False)
```
This gives me the following error:
```
Wildcards in input files cannot be determined from output files: 'sample'
```
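Some background on that message: Snakemake fills in a wildcard by matching a concrete file name against a rule's output pattern. `rule all` above declares no output, so the `{sample}` in its input has nothing to be matched against. The sketch below is a simplified pure-Python illustration of that matching step, not Snakemake's actual code; `pattern_to_regex` and `infer_wildcards` are hypothetical helpers, and it assumes each wildcard matches `.+` (Snakemake's default) and appears only once in the pattern.

```python
import re


def pattern_to_regex(pattern):
    """Turn 'mybucket/sample_{sample}.csv' into a regex with one named group per wildcard.

    Hypothetical helper: assumes every wildcard matches '.+' and each name appears once.
    """
    # Splitting on a capturing group alternates literal text and wildcard names.
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = ""
    for i, part in enumerate(parts):
        if i % 2 == 0:
            regex += re.escape(part)  # literal text, escaped
        else:
            regex += f"(?P<{part}>.+)"  # wildcard becomes a named group
    return re.compile(regex)


def infer_wildcards(pattern, path):
    """Return the wildcard values if path fits the pattern, else None."""
    match = pattern_to_regex(pattern).fullmatch(path)
    return match.groupdict() if match else None


print(infer_wildcards("mybucket/sample_{sample}.csv", "mybucket/sample_A.csv"))
# {'sample': 'A'}
```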
Following the docs, I put `expand` in the `output` section.
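To see what `expand` produces, here is a simplified pure-Python stand-in. `expand_like` is a hypothetical name, not Snakemake's implementation, and it only mimics the default behaviour of filling in every combination of the given values. The point it illustrates is that the result is a list of concrete paths with no wildcard left in them.

```python
from itertools import product


def expand_like(pattern, **wildcards):
    """Fill every combination of wildcard values into pattern.

    Hypothetical stand-in for expand's default (product) behaviour;
    each keyword value is assumed to be a list of strings.
    """
    names = list(wildcards)
    return [
        pattern.format(**dict(zip(names, combo)))
        for combo in product(*(wildcards[name] for name in names))
    ]


paths = expand_like("sample_{sample}.csv", sample=["A", "B"])
print(paths)  # ['sample_A.csv', 'sample_B.csv']
```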
For `params`, it looked like this section and this thread gave me everything I needed:
```
sample_list = ["A","B"]
import pandas as pd
import re

def dummy_example(sample):
    return pd.DataFrame({"values": [0,1], "sample": sample})

def get_wildcard_from_output(output):
    return re.search(r'sample_(.*?).csv', output).group(1)

rule all:
    input:
        expand("sample_{sample}.csv", sample = sample_list)

rule testing_wildcards:
    output:
        newfile=expand("sample_{sample}.csv", sample = sample_list)
    params:
        additional=lambda wildcards, output: get_wildcard_from_output(output)
    run:
        print(params.additional)
        df = dummy_example(params.additional)
        df.to_csv(output.newfile, index = False)
```
This fails with:
```
InputFunctionException in line 16 of /home/jovyan/work/Snakefile: Error: TypeError: expected string or bytes-like object Wildcards:
```
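The `TypeError` is raised by `re.search` itself, which only accepts a single string (or bytes). With `expand` in `output`, the `output` handed to the lambda holds both file names rather than one path, so `get_wildcard_from_output` receives a list-like object. The reproduction below uses an ordinary Python list as a stand-in for Snakemake's output object; that substitution is an assumption made so the snippet runs without Snakemake.

```python
import re


def get_wildcard_from_output(output):
    # Same helper as in the Snakefile above.
    return re.search(r'sample_(.*?).csv', output).group(1)


# Stand-in for the expanded output: two paths instead of one.
output = ["sample_A.csv", "sample_B.csv"]

try:
    get_wildcard_from_output(output)
except TypeError as err:
    # Exact message text varies between Python versions.
    print("TypeError:", err)

# Applied to one path at a time, the same regex does extract the value.
print([get_wildcard_from_output(path) for path in output])  # ['A', 'B']
```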
Is there some way to capture the wildcard value in `params` so that I can use it in `run`?