Here's a params function that lets you expand values from several different Snakemake sources in a config string:
def paramFunc(wildcards, input, output, threads, resources, config,
              global_cfg, this_cfg, S):
    # Expand {name} and {name[key]} fields in the string S using any of the sources below.
    return S.format(wildcards=wildcards, input=input, output=output,
                    threads=threads, resources=resources, config=config,
                    global_cfg=global_cfg, this_cfg=this_cfg)
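Because `str.format()` simply ignores keyword arguments that the template never mentions, paramFunc() can pass every source on every call; only a field that the template actually names has to exist. The sketch below calls paramFunc() outside Snakemake, using plain Python stand-ins (hypothetical: a dict for wildcards, lists for input and output) in place of the objects Snakemake would pass.

```python
def paramFunc(wildcards, input, output, threads, resources, config,
              global_cfg, this_cfg, S):
    # Expand {name} and {name[key]} fields in S from any of the named sources.
    return S.format(wildcards=wildcards, input=input, output=output,
                    threads=threads, resources=resources, config=config,
                    global_cfg=global_cfg, this_cfg=this_cfg)

# Hypothetical stand-ins for the objects Snakemake would supply.
args = dict(wildcards={"sample": "A1"}, input=["A1.fq"], output=["A1.bam"],
            threads=4, resources={"mem_mb": 2000}, config={"ABC": "Hello"},
            global_cfg={}, this_cfg={})

# Only the sources named in the template are used; the others are ignored.
print(paramFunc(S="{wildcards[sample]} on {threads} threads", **args))
# prints: A1 on 4 threads

# A key that the named source does not define raises KeyError.
try:
    paramFunc(S="{config[MISSING]}", **args)
except KeyError as err:
    print("undefined key:", err)
# prints: undefined key: 'MISSING'
```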
Here's an example of how to call paramFunc() from within a Snakemake params: section. It expands the value of the config parameter config["XYZ"], assigns the result to the parameter named "text", and then uses that "text" parameter in a shell command:
params:
    text=lambda wildcards, input, output, threads, resources:
        paramFunc(wildcards, input, output, threads, resources, config,
                  global_cfg, my_local_cfg, config["XYZ"])
shell: "echo 'text is {params.text}'"
Notice that the last argument to paramFunc() is the parameter value you want to expand, config["XYZ"] in this case. The other arguments are the sources (mostly dictionary-like objects) whose values that parameter string may reference.
You might have defined config["XYZ"] like this, for example, in a .yaml file:
ABC: "Hello world"
XYZ: "ABC is {config[ABC]}"
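With those two YAML entries loaded into config, the "text" parameter expands as sketched below. The dict literal stands in for what Snakemake would build from the .yaml file, and the last line only imitates how {params.text} ends up in the shell command; it is not Snakemake's own formatting code.

```python
def paramFunc(wildcards, input, output, threads, resources, config,
              global_cfg, this_cfg, S):
    # Expand {name} and {name[key]} fields in S from any of the named sources.
    return S.format(wildcards=wildcards, input=input, output=output,
                    threads=threads, resources=resources, config=config,
                    global_cfg=global_cfg, this_cfg=this_cfg)

# Stand-in for the config dictionary Snakemake builds from the YAML file.
config = {"ABC": "Hello world", "XYZ": "ABC is {config[ABC]}"}

# Empty stand-ins for the sources this template does not use.
text = paramFunc({}, [], [], 1, {}, config, {}, {}, config["XYZ"])
print(text)                              # prints: ABC is Hello world
print("echo 'text is {}'".format(text))  # prints: echo 'text is ABC is Hello world'
```

Note that the expansion is a single pass: if config["ABC"] itself contained a {...} field, it would be inserted literally rather than expanded again.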
In the YAML example, XYZ expands ABC, a value defined in the same file, but the string is not limited to that: other "{}" constructs give access to values defined elsewhere:
Defined in                                  Use this construct in param
----------                                  ---------------------------
"config" dictionary                         "{config[<name>]}"
wildcards used in the output filename       "{wildcards[<name>]}"
input filename(s)                           "{input}" or "{input[<name>]}" or "{input[#]}"
output filename(s)                          "{output}" or "{output[<name>]}" or "{output[#]}"
threads                                     "{threads}"
resources                                   "{resources[<name>]}"
"global_cfg" global config dictionary       "{global_cfg[<name>]}"
"my_local_cfg" module config dictionary     "{this_cfg[<name>]}"
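In a format field, Python treats an all-digit index such as [0] as an integer position and any other index such as [ref] as a string key, which is why the table offers both {input[#]} and {input[<name>]}. The sketch below uses a small hypothetical NamedList class (a stand-in, not Snakemake's own class) to show the three input forms side by side. It assumes, as with Snakemake's input and output objects, that formatting the whole object joins its file names with spaces.

```python
class NamedList(list):
    """Hypothetical stand-in: a list whose items can also be looked up by name."""

    def __init__(self, *items, **named):
        super().__init__(items + tuple(named.values()))
        self._names = named

    def __getitem__(self, key):
        # String keys look up names; integers and slices behave like a list.
        if isinstance(key, str):
            return self._names[key]
        return super().__getitem__(key)

    def __str__(self):
        # Formatting the whole object gives space-separated file names.
        return " ".join(map(str, self))

files = NamedList(reads="A1.fq", ref="genome.fa")
print("{input}".format(input=files))       # prints: A1.fq genome.fa
print("{input[ref]}".format(input=files))  # prints: genome.fa
print("{input[0]}".format(input=files))    # prints: A1.fq
```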
The values "global_cfg" and "my_local_cfg" are two special dictionaries, not part of Snakemake itself, that you can add to help modularize the Snakefile.
For "global_cfg", the idea is that you might want a dictionary of definitions shared across the whole Snakefile. In your main Snakefile, do this:
include: "global_cfg.py"
Then, in the file global_cfg.py, place the global definitions:
global_cfg = {
    "DATA_DIR": "ProjData",
    "PROJ_DESC": "Mint Sequencing",
}
You can then reference these values in parameter strings with, e.g.:
"{global_cfg[DATA_DIR]}"
(such strings must be expanded by calling paramFunc() in a params: section)
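A minimal sketch of that expansion, with the global_cfg dictionary copied from global_cfg.py above and empty stand-ins for the sources the template does not use. The template string and the "sample" wildcard are hypothetical, chosen only to show a global value combined with a wildcard.

```python
def paramFunc(wildcards, input, output, threads, resources, config,
              global_cfg, this_cfg, S):
    # Expand {name} and {name[key]} fields in S from any of the named sources.
    return S.format(wildcards=wildcards, input=input, output=output,
                    threads=threads, resources=resources, config=config,
                    global_cfg=global_cfg, this_cfg=this_cfg)

# Copied from global_cfg.py above.
global_cfg = {
    "DATA_DIR": "ProjData",
    "PROJ_DESC": "Mint Sequencing",
}

# Hypothetical template mixing a global value and a wildcard.
S = "{global_cfg[DATA_DIR]}/{wildcards[sample]}.txt"
print(paramFunc({"sample": "A1"}, [], [], 1, {}, {}, global_cfg, {}, S))
# prints: ProjData/A1.txt
```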
For "my_local_cfg", the idea is that you might want to place each Snakefile rule in a separate file, with that rule's parameters defined in yet another file, so that each rule has a rule file and a parameter file. In the main Snakefile:
(include the paramFunc() definition above, and also include: "global_cfg.py", because the rule below passes global_cfg to paramFunc())
include: "myrule.snake"

rule all:
    input: "myrule.txt"
In myrule.snake, first include the rule's parameter file:
include: "myrule.py"
In myrule.py, place the config settings for the myrule module:
myrule_cfg = {
    "SPD": 125,
    "DIST": 98,
    "MSG": "Param settings: Speed={this_cfg[SPD]} Dist={this_cfg[DIST]}",
}
The complete myrule.snake then reads:
include: "myrule.py"

rule myrule:
    params:
        SPD=myrule_cfg["SPD"],
        DIST=myrule_cfg["DIST"],
        # For MSG, call paramFunc() to expand the {name} constructs.
        MSG=lambda wildcards, input, output, threads, resources:
            paramFunc(wildcards, input, output, threads, resources, config,
                      global_cfg, myrule_cfg, myrule_cfg["MSG"])
    message: "{params.MSG}"
    output: "myrule.txt"
    shell: "echo '-speed {params.SPD} -dist {params.DIST}' >{output}"
Note that paramFunc() maps the name "myrule_cfg" (which varies from one rule to the next) to the fixed name "this_cfg" (the same for every rule).
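Because each rule passes its own dictionary in the this_cfg slot, the same template text works unchanged across rule modules. The sketch below reuses myrule_cfg from above and adds a second, hypothetical otherrule_cfg to show that {this_cfg[SPD]} resolves to whichever dictionary the calling rule supplies.

```python
def paramFunc(wildcards, input, output, threads, resources, config,
              global_cfg, this_cfg, S):
    # Expand {name} and {name[key]} fields in S from any of the named sources.
    return S.format(wildcards=wildcards, input=input, output=output,
                    threads=threads, resources=resources, config=config,
                    global_cfg=global_cfg, this_cfg=this_cfg)

myrule_cfg = {
    "SPD": 125,
    "DIST": 98,
    "MSG": "Param settings: Speed={this_cfg[SPD]} Dist={this_cfg[DIST]}",
}
# Hypothetical second rule module: same MSG template, different values.
otherrule_cfg = dict(myrule_cfg, SPD=60, DIST=12)

for rule_cfg in (myrule_cfg, otherrule_cfg):
    # Each rule passes its own dictionary as this_cfg.
    print(paramFunc({}, [], [], 1, {}, {}, {}, rule_cfg, rule_cfg["MSG"]))
# prints:
# Param settings: Speed=125 Dist=98
# Param settings: Speed=60 Dist=12
```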
Note that I include .py files that define the global_cfg and myrule_cfg dictionaries. These could instead be defined in .yaml files, but then they would all be merged into a single dictionary, "config". It would be nice if the configfile directive allowed the target dictionary to be named, e.g.:
configfile: global_cfg="global_cfg.yaml"
Perhaps that feature will be added to Snakemake someday.
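In the meantime, one workaround is to load each config file into its own dictionary with ordinary Python at the top of the Snakefile, instead of using configfile. The helper load_named_cfg() below is hypothetical; it assumes PyYAML is importable for .yaml files (Snakemake itself depends on it) and uses the standard json module for anything else. The usage example writes a JSON file so that it runs without PyYAML.

```python
import json
import tempfile
from pathlib import Path

def load_named_cfg(path):
    """Hypothetical helper: read one config file into its own dictionary."""
    path = Path(path)
    text = path.read_text()
    if path.suffix in (".yaml", ".yml"):
        import yaml  # PyYAML; assumed importable (a Snakemake dependency)
        return yaml.safe_load(text)
    return json.loads(text)

# In a Snakefile this would be, e.g.:
#   global_cfg = load_named_cfg("global_cfg.yaml")
with tempfile.TemporaryDirectory() as tmp:
    cfg_path = Path(tmp) / "global_cfg.json"
    cfg_path.write_text(json.dumps({"DATA_DIR": "ProjData"}))
    global_cfg = load_named_cfg(cfg_path)

print(global_cfg)  # prints: {'DATA_DIR': 'ProjData'}
```

Each file then lands in a dictionary with the name you chose, so global_cfg and per-rule dictionaries stay separate from config.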