Say I'm following the best-practice workflow structure suggested for snakemake. Now I'd like to know how (i.e. with which software versions) a given file, say plots/myplot.pdf, was generated. I found this surprisingly hard, if not impossible, with only the result folder at hand.
In more detail, say I generated the results using: snakemake --use-conda --conda-prefix ~/.conda/myenvs
which will resolve and download the conda-environments specified in the rule below (copied from the documentation):
rule NAME:
    input:
        "table.txt"
    output:
        "plots/myplot.pdf"
    conda:
        "envs/ggplot.yaml"
    script:
        "scripts/plot-stuff.R"
Say the content of envs/ggplot.yaml is the following:
channels:
  - conda-forge
dependencies:
  - r-ggplot2
After completion, the ggplot environment will have been saved under, say, the following path (note that the env name d2d1d57b is assigned automatically by snakemake): ~/.conda/myenvs/d2d1d57b
The problem is that if I ship the workflow subfolder, e.g. as the result to someone else (or as a supplement to a paper), I don't know which ggplot version was used for that run. All I know is the content of the yaml file (which is also included in the report generated with --report).
Also, since ggplot depends on other software, such as R, I wouldn't know which R version was used for a given rule using this environment, since the yaml file doesn't list indirect dependencies.
Ideally, I'd like to have the complete list of software versions in each environment shipped with the workflow results.
As a workaround one could run conda env export -n name_of_env and copy the output into the result folder, but strangely conda list -n ~/.conda/myenvs/d2d1d57b does not work (it fails with the error Characters not allowed: ('/', ' ', ':', '#')).
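The error above most likely comes from -n expecting an environment name, whereas the environments created by snakemake live at a path; conda's --prefix (-p) option addresses an environment by path instead. A minimal sketch of how the export could be automated under that assumption follows. The function names and the output folder are hypothetical, and detecting an environment by its conda-meta sub-directory is an assumption about conda's on-disk layout, not something snakemake provides.

```python
import subprocess
from pathlib import Path


def find_env_dirs(conda_prefix):
    """Return sub-directories of conda_prefix that look like conda environments.

    Assumption: an environment directory contains a 'conda-meta' folder.
    """
    root = Path(conda_prefix).expanduser()
    return sorted(p for p in root.iterdir() if (p / "conda-meta").is_dir())


def export_command(env_dir):
    """Build the conda call that exports one environment addressed by path."""
    return ["conda", "env", "export", "--prefix", str(env_dir)]


def export_all(conda_prefix, out_dir):
    """Write one <env-name>.yaml per environment into out_dir.

    Requires conda on PATH; out_dir is a hypothetical location such as a
    folder inside the shipped results.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for env_dir in find_env_dirs(conda_prefix):
        result = subprocess.run(
            export_command(env_dir), check=True, capture_output=True, text=True
        )
        (out / f"{env_dir.name}.yaml").write_text(result.stdout)
```

Calling export_all("~/.conda/myenvs", "results/envs") after a run would then leave one fully pinned yaml file per environment next to the results, which would still have to be triggered by hand or from a final rule.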
Creating an environment manually and inspecting it indeed gives me (among other info):
r-base 4.0.2 he766273_1 conda-forge
r-ggplot2 3.3.2 r40h6115d3f_0 conda-forge
That's exactly what I'm after, but doing this manually for every environment would of course be too tedious.
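A listing like the one above could possibly be produced without calling conda at all, by reading the per-package JSON records that conda keeps inside each environment. The sketch below assumes those records sit in conda-meta/*.json and carry the keys name, version, build and channel; this is an internal detail of conda that may change, and the channel field may hold a full URL rather than a short channel name. Both function names are hypothetical.

```python
import json
from pathlib import Path


def installed_packages(env_dir):
    """Return (name, version, build, channel) tuples for one environment.

    Assumption: every installed package has a JSON record in conda-meta/.
    """
    rows = []
    for meta in sorted(Path(env_dir).expanduser().glob("conda-meta/*.json")):
        record = json.loads(meta.read_text())
        rows.append((
            record["name"],
            record["version"],
            record.get("build", ""),
            record.get("channel", ""),
        ))
    return rows


def format_table(rows):
    """Render the tuples as whitespace-separated lines, one package per line."""
    return "\n".join(" ".join(row) for row in rows)
```

Writing format_table(installed_packages("~/.conda/myenvs/d2d1d57b")) to a text file in the result folder would record both direct and indirect dependencies, including the R version.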
This is also true when using wrappers as far as I can tell.
In summary: given a workflow, or even a single file within the workflow, how can I trace back which exact software versions were used to generate it? Ideally, this information would be shipped automatically with the results of a workflow by default.
Maybe I'm even missing something very obvious, so hopefully someone can shed some light on this.
Update: an issue was submitted.