What would be best environment.yml practices for specifying packages in Snakemake wrappers using conda? I understand that the channels should be:

channels:    
  - conda-forge
  - bioconda
  - base

However, what is a good choice for specifying packages? Do I specify no version? Full versions?

Pinning full versions has previously led to extremely long, seemingly never-ending conda environment resolution. However, not pinning versions risks implicitly upgrading to an incompatible version of a package.

Do I specify only direct dependencies or should I put the output of conda env export there so everything is frozen?

Manuel
  • My recommendation is to specify specific versions in the YAML and switch to the mamba frontend with `--conda-frontend mamba` (see https://snakemake.readthedocs.io/en/stable/executing/cli.html). – Maarten-vd-Sande Oct 29 '20 at 15:42

1 Answer

For package version numbers, I would usually opt for pinning the major and minor version. This way, users will get the newest security patches and bug fixes whenever they create an environment, while nothing should change in a backward incompatible way (wherever developers properly follow semantic versioning).

Also, I would only specify direct dependencies and let the environment solver handle any implicit dependencies. This provides a certain level of freedom to meet different needs for different packages, while usually the packages' recipes should specify any restrictions to particular versions.
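As a minimal sketch of these two points, an environment definition for a single rule might look like the following. The package names and version numbers are only illustrative assumptions, not recommendations:

```yaml
# Hypothetical environment for one read-mapping rule.
# Only direct dependencies are listed; the solver picks everything else.
channels:
  - conda-forge
  - bioconda
dependencies:
  # "=1.15" is a fuzzy match, so any 1.15.x patch release satisfies it
  - samtools =1.15
  - bwa =0.7
```

With this kind of pin, a newly created environment picks up later patch releases within the pinned minor version, while a jump to the next minor or major version requires an explicit edit of the file.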

Another way to avoid (future) conflicts and keep environment creation quick is to keep environments as small and granular as possible (see Johannes' comment below). If different rules share only some dependencies but not others, I would rather create separate minimal environments for each rule than reuse a bigger environment. Snakemake wrappers do this anyway, as each wrapper has its own environment definition.

As Johannes pointed out, the same applies to channels: only specify channels that you are actually using. It is no longer necessary to specify the base channel. And when using mamba, you can specify bioconda as the first channel.
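For example, an environment whose packages all come from conda-forge does not need bioconda listed at all. A sketch, again with purely illustrative packages and versions:

```yaml
# Hypothetical environment for a plotting/summary rule.
# No bioconda packages are used, so conda-forge is the only channel.
channels:
  - conda-forge
dependencies:
  - python =3.10
  - pandas =1.5
```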

Talking of mamba: If speed matters, I would currently use mamba to do the environment solving. It is usually much faster than conda and is better at ensuring that you get the most up-to-date version of packages. In Snakemake, you can use it via `--conda-frontend mamba`, as also pointed out in Maarten's comment on the question.

But of course, it always depends. If you have known incompatibilities between versions that are not handled by the packages' recipes, specifying and pinning implicit dependencies can be necessary. If you have software whose output can change with a patch version, then you of course have to pin the patch version.
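A sketch of what such an exception could look like. The incompatibility assumed here is hypothetical, and the versions are only there for illustration:

```yaml
# Hypothetical environment with stricter pins than usual.
channels:
  - conda-forge
  - bioconda
dependencies:
  # Patch version pinned: assumes output differs between patch releases
  - samtools =1.15.1
  # Implicit dependency pinned explicitly: assumes a known incompatibility
  # that the package recipe does not encode
  - htslib =1.15.1
```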

dlaehnemann
  • To add to that, only specify the channels that are needed. base does not need to be specified nowadays. bioconda only if you have packages from there. And if that's the case, when using mamba you can nowadays put bioconda at the top. And finally, try to keep environments as fine-grained (by rule/step) as possible, in order to increase transparency, speed, and maintainability. – Johannes Köster Oct 30 '20 at 09:04
  • `base` and `defaults` are probably synonyms, right? So `defaults` also doesn't need to be specified any more? – dlaehnemann Oct 30 '20 at 09:46
  • Thank you for your answer. There are three ways to export a conda env that we can later use with Snakemake: `conda env export --from-history > myenv.yml`, `conda env export --no-builds > myenv.yml`, and `conda env export > myenv.yml`, which exports both package versions and builds. Which one do you recommend? Based on your answer, I think `conda env export --from-history > myenv.yml`, but this could lead to errors, e.g.: Encountered problems while solving. Problem: nothing provides libzlib >=1.2.11,<1.3.0a0 needed by samtools-1.15.1-h1170115_0 – Medhat Jul 27 '22 at 22:49
  • I actually always manually write down these environment definitions. Otherwise you pin to particular patch versions (of the actual software) and the software might get patches under the same minor version. And you pin to particular (bio-)conda build versions, while the build might be fixed for some underlying issue in a later build version (but with the same software version). Also, this usually pins ALL dependencies in an environment, while you usually only want to pin direct dependencies. Otherwise environments break when package versions disappear (this happens). – dlaehnemann Jul 29 '22 at 08:13