I'm trying to create a conda environment using the command:

conda env create -f devenv.yaml

My .yaml file is

name: myname
channels:
  - conda-forge
  - bioconda
dependencies:
  # Package creation and environment management
  - conda-build
  # Automation control (command line interface, workflow and multi-process management)
  - python-dotenv
  - click
  - snakemake-minimal
  - joblib
  - numba
  # Workspace
  - notebook
  # Visualization
  - plotly
  - plotly-orca
  - matplotlib
  - seaborn
  - shap
  - openpyxl
  - ipywidgets
  - tensorboard
  # Data manipulation
  - numpy
  - pandas
  - pyarrow
  # Functional style tools
  - more-itertools
  - toolz
  # Machine learning
  - scikit-learn
  - imbalanced-learn
  - scikit-image
  - statsmodels
  - catboost
  - hyperopt
  - tsfresh
  # Deep learning
  - pytorch
  # code checking and formatting
  - pylint
  - black
  - flake8
  - mypy
  # Python base
  - python
  - pip
  - pip:

I've tried updating conda, but it doesn't help. It just gets stuck on solving the environment.

conda version: 4.11.0, OS: Ubuntu 18.04.5 LTS

The exact same environment works fine on my Mac, but not on that server. What could be the issue? I'd appreciate any suggestions. Thx.

Ildar
  • When conda gets stuck in very long solving issues, I highly recommend giving [mamba](https://github.com/mamba-org/mamba) a try. It's a drop-in replacement for conda and is very, very fast. – cel Dec 22 '21 at 15:47
  • FYI: https://stackoverflow.com/help/someone-answers – merv Dec 31 '21 at 14:27

1 Answer

This solves fine (so-devenv), but is indeed a complex solve mainly due to:

  • underspecification
  • lack of modularization

Underspecification

This particular environment specification ends up installing well over 300 packages, and not a single one of them is constrained by the specification. That is a huge SAT problem to solve, and Conda will struggle with it. Mamba will solve it faster, but providing additional constraints can vastly reduce the solution space.

At minimum, specify a Python version (major.minor), such as python=3.9. This is the single most effective constraint.

Beyond that, putting minimum requirements on central packages (those that are dependencies of others) can help, such as minimum NumPy.
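As a rough sketch of what that could look like, here is the top of the same file with a pinned Python and a lower bound on NumPy. The `python=3.9` pin is the one suggested above; the NumPy bound of 1.20 is only an illustrative assumption, so pick whatever minimum your code actually needs.

```yaml
name: myname
channels:
  - conda-forge
  - bioconda
dependencies:
  # Pin major.minor so the solver only considers one Python series
  - python=3.9
  # Lower bound on a central package (the version here is an assumed example)
  - numpy>=1.20
  - pandas
  - scikit-learn
  # remaining packages from the original file go here unchanged
```

Even these two constraints cut out a large share of candidate builds, because most of the other packages are built per Python version and against specific NumPy ranges.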

Lack of Modularization

I assume the name "devenv" means this is a development environment, so I get that one wants all these tools immediately at hand. However, Conda environment activation is simple, and most IDE tooling these days (Spyder, VSCode, Jupyter) encourages separating infrastructure from the execution kernel. Being more thoughtful about how environments (emphasis on the plural) are organized and work together can go a long way toward a sustainable and painless data science workflow.

The environment at hand has multiple red flags in my book:

  • conda-build should be in base and only in base
  • snakemake should be in a dedicated environment
  • notebook (i.e., Jupyter) should be in a dedicated environment, co-installed with nb_conda_kernels; all that kernel environments need is ipykernel

I'd probably also have the linting/formatting packages separated, but that's less of an issue. The real killer though is snakemake - it's just a massive piece of infrastructure and I'd strongly encourage keeping that separated.
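To make the split concrete, here is a minimal sketch of two separate files. The file names, environment names, and the choice of which packages go in the kernel environment are all hypothetical; only the notebook/nb_conda_kernels/ipykernel arrangement comes from the points above.

```yaml
# jupyter.yaml (hypothetical name): the notebook server and nothing else
name: jupyter
channels:
  - conda-forge
dependencies:
  - notebook
  - nb_conda_kernels
```

```yaml
# ml.yaml (hypothetical name): a kernel environment holding the actual work packages
name: ml
channels:
  - conda-forge
dependencies:
  - python=3.9
  # ipykernel is what lets nb_conda_kernels expose this environment as a kernel
  - ipykernel
  - numpy
  - pandas
  - scikit-learn
```

Each file is a much smaller solve on its own, and you launch Jupyter from the first environment while selecting the second as the kernel.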

merv
  • What if I want to run the snakemake pipeline? This would require all the packages installed in the environment where I run snakemake. – Ildar Feb 06 '22 at 18:56
  • @Ildar a more idiomatic Snakemake pipeline is to use `conda:` rules and separate, minimal YAML files to define the small subset of software actually needed to execute a rule or set of rules. – merv Feb 06 '22 at 19:13
  • Maybe you encountered this? How, for instance, can I specify which environment to use when I run particular parts of the pipeline? Actually, I'm using Kedro for now, and I haven't found any functionality to handle this. However, the problem with having Kedro and other common packages in one single environment still exists. – Ildar Feb 06 '22 at 19:33