I have configured an mlflow project file. The first hard knock was that the file needs no extension. My current problem is that I exported an existing conda environment using:

conda env export --name ENVNAME > envname.yml

substituting the real name for ENVNAME. The resulting envname.yml contains the actual path where the env is located. Next, I placed envname.yml in the project and defined the entry points correctly.

name: pytorch
channels:
  - defaults
prefix: /data/krishnan/software/anaconda3/envs/pytorch

When I run the project using `mlflow run .`, mlflow tries to create yet another temporary environment from this conda file, and that environment uses Python 2. It ignores the fact that the specified env already exists and has all the correct packages.

Is there anything incorrect in what I am doing?

  • I can't speak for mlflow, but that's not how Conda YAML definitions work. The `prefix` key is only used if there is no `name` key (see [details on YAML keys](https://stackoverflow.com/a/65243743/570918)). Moreover, the YAML file is used to specify *new* environments (or update existing ones), not to point to an existing one for reuse. Not sure if it's comparable, but the Snakemake pipeline tool discourages reusing local Conda environments and prefers recreating them from YAML, since that gives better reproducibility. I'd expect other pipeline tools to have a similar policy. – merv Dec 03 '21 at 20:16
  • Also, maybe clarify why you care about reusing a specific environment. If it's about disk usage, [Conda uses hardlinks](https://stackoverflow.com/q/55566419/570918), so "duplicate" environments typically do not physically duplicate all files. If it's about being able to change one environment and use it in all workflows, that strays into reproducibility problems, i.e., changing the environment to accommodate a later workflow could have cryptic side effects on earlier ones. – merv Dec 03 '21 at 20:29
  • I had the conda environment all set up and was using it for all my work. I am now trying out mlflow. Since the mlflow project docs mentioned that it is possible to use a conda env, I thought the easiest way to get started would be to just "point" to an existing environment with all its packages. Looks like my assumptions were wrong on multiple counts, about both conda and mlflow. However, why would it download Python 2? – skrishnan_v Dec 04 '21 at 05:53
  • I bypassed the conda environment handling by using --no-conda and, after activating the environment myself, I was able to get mlflow to run this project. – skrishnan_v Dec 04 '21 at 06:15
  • If that is the entire YAML file, it looks empty and so the Python 2 you are seeing is likely just the system one. Perhaps the environment was not correctly exported. – merv Dec 04 '21 at 21:46
  • That is right. The file looked quite empty to me as well. I assumed that since the conda environment name is mentioned, it might pick something up from there. But as I said, I am not very knowledgeable about conda. – skrishnan_v Dec 13 '21 at 13:52
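
For anyone who wants to check this before running `mlflow run .`, here is a minimal sketch of the "the file looks empty" diagnosis from the comments. It assumes the YAML posted in the question is the whole export. The helper names `top_level_keys` and `has_dependencies` are made up for illustration, and the line scan is deliberately naive rather than a real YAML parser.

```python
def top_level_keys(yaml_text):
    """Return the top-level mapping keys of a simple YAML document.

    Naive line scan, not a real YAML parser: it only looks at
    unindented `key:` lines, which is enough for a conda export.
    """
    keys = []
    for line in yaml_text.splitlines():
        # Skip blanks, comments, indented lines, and list items.
        if not line.strip() or line[0] in " \t#-":
            continue
        key, sep, _ = line.partition(":")
        if sep:
            keys.append(key.strip())
    return keys


def has_dependencies(yaml_text):
    """True if the file has a top-level `dependencies` key."""
    return "dependencies" in top_level_keys(yaml_text)


# The file exactly as posted in the question.
posted = """\
name: pytorch
channels:
  - defaults
prefix: /data/krishnan/software/anaconda3/envs/pytorch
"""

print(top_level_keys(posted))    # ['name', 'channels', 'prefix']
print(has_dependencies(posted))  # False
```

With no `dependencies` key, the file gives a tool nothing to install, which fits the observation that whatever Python turned up was not the one from the existing env. A complete `conda env export` would normally also carry a `dependencies:` list.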
