I'm trying to utilize Hydra with MLFlow, so I wrote the bare minimum script to see if they worked together (importing etc.). Both work fine on their own, but when put together I get a weird outcome.

I have the script below:

import hydra
from omegaconf import DictConfig
from mlflow import log_metric, log_param, log_artifact,start_run

@hydra.main(config_path="config.yaml")
def my_app(cfg : DictConfig):
    # print(cfg.pretty())
    # print(cfg['coordinates']['x0'])
    log_param("a",2)
    log_metric("b",3)

if __name__ == "__main__":
    my_app()

However, when I run it, I get the error below:

ilknull@nurmachine:~/Files/Code/Python/MLFlow_test$ python3 hydra_temp.py 
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/home/ilknull/.local/lib/python3.7/site-packages/mlflow/tracking/fluent.py", line 164, in end_run
    MlflowClient().set_terminated(_active_run_stack[-1].info.run_id, status)
  File "/home/ilknull/.local/lib/python3.7/site-packages/mlflow/tracking/client.py", line 311, in set_terminated
    self._tracking_client.set_terminated(run_id, status, end_time)
  File "/home/ilknull/.local/lib/python3.7/site-packages/mlflow/tracking/_tracking_service/client.py", line 312, in set_terminated
    end_time=end_time)
  File "/home/ilknull/.local/lib/python3.7/site-packages/mlflow/store/tracking/file_store.py", line 377, in update_run_info
    run_info = self._get_run_info(run_id)
  File "/home/ilknull/.local/lib/python3.7/site-packages/mlflow/store/tracking/file_store.py", line 442, in _get_run_info
    databricks_pb2.RESOURCE_DOES_NOT_EXIST)
mlflow.exceptions.MlflowException: Run '9066793c02604a6783d081ed965d5eff' not found

Again, they work perfectly fine when used separately, but together they cause this error. Any ideas?

Ilknur Mustafa

1 Answer

Thanks for reporting this. I was not aware of this issue.

This is because Hydra is changing your current working directory for each run.
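To illustrate why a changed working directory matters, here is a minimal sketch that uses only the standard library. It assumes the MLflow file store is addressed by a relative directory such as `mlruns`; the helper `resolve_relative_store` is a hypothetical name, and the temporary directory is only a stand-in for Hydra's per-run output directory.

```python
import os
import tempfile
from pathlib import Path


def resolve_relative_store(relative_dir: str = "mlruns") -> str:
    """Return the absolute path that a relative store directory resolves to right now."""
    return str(Path(relative_dir).resolve())


original_cwd = os.getcwd()
before = resolve_relative_store()

with tempfile.TemporaryDirectory() as run_dir:
    os.chdir(run_dir)  # stand-in for Hydra switching to its per-run directory
    try:
        during = resolve_relative_store()
    finally:
        os.chdir(original_cwd)  # restore before the temporary directory is removed

after = resolve_relative_store()

print(before != during)  # the same relative name points somewhere else after chdir
print(before == after)   # and back to the original location once cwd is restored
```

A run created while the working directory is one location cannot be found by code that later resolves the same relative path from another location, which is consistent with the "Run ... not found" error above.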

I did some digging. Here is what you can do:

  1. Set the MLFLOW_TRACKING_URI environment variable:
MLFLOW_TRACKING_URI=file:///$(pwd)/.mlflow  python3 hydra_temp.py
  2. Call set_tracking_uri() before hydra.main() starts:
import hydra
from omegaconf import DictConfig
from mlflow import log_metric, log_param, set_tracking_uri
import os

set_tracking_uri(f"file:///{os.getcwd()}/.mlflow")

@hydra.main(config_name="config")
def my_app(cfg: DictConfig):
    log_param("a", 2)
    log_metric("b", 3)


if __name__ == "__main__":
    my_app()
  3. Wait for my new issue to get resolved; then there will be a proper plugin to integrate with MLFlow. (This will probably take a while.)
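For the set_tracking_uri() approach, the URI can also be built with `pathlib` instead of assembling the slashes by hand. This is only a sketch: `tracking_uri_for` is a hypothetical helper name, and the `.mlflow` directory name is taken from the snippets above.

```python
from pathlib import Path


def tracking_uri_for(base_dir: Path, store_name: str = ".mlflow") -> str:
    """Build an absolute file:// URI for a store directory under base_dir."""
    # resolve() makes the path absolute, which as_uri() requires
    return (base_dir.resolve() / store_name).as_uri()


# Computed at import time, before anything has changed the working directory
uri = tracking_uri_for(Path.cwd())
print(uri)
```

The resulting string could then be passed to `set_tracking_uri(uri)` in place of the f-string shown above.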

By the way, Hydra 1.0 adds support for setting environment variables, and the following ALMOST works:

hydra:
  job:
    env_set:
      MLFLOW_TRACKING_DIR: file://${hydra:runtime.cwd}/.mlflow
      MLFLOW_TRACKING_URI: file://${hydra:runtime.cwd}/.mlflow

Unfortunately, Hydra cleans up the environment variables when your function exits, while MLFlow makes its final save when the process exits, so by then the environment variable is no longer set. MLFlow also keeps re-initializing the FileStore object used to store the experiment data. If it initialized that object just once and reused it, the config above would have worked.
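The timing problem can be sketched with the standard library alone. This is not Hydra's or MLFlow's actual code: `temporary_env` is a hypothetical context manager standing in for the env_set cleanup, `DEMO_TRACKING_URI` is a made-up variable name, and the second `record_uri` call stands in for a handler that runs at process exit.

```python
import os
from contextlib import contextmanager


@contextmanager
def temporary_env(name: str, value: str):
    """Set an environment variable for the duration of the block, then restore it."""
    previous = os.environ.get(name)
    os.environ[name] = value
    try:
        yield
    finally:
        if previous is None:
            os.environ.pop(name, None)
        else:
            os.environ[name] = previous


seen = []


def record_uri(label: str) -> None:
    """Record what the variable looks like at this point in the program."""
    seen.append((label, os.environ.get("DEMO_TRACKING_URI")))


os.environ.pop("DEMO_TRACKING_URI", None)  # start from a clean state

with temporary_env("DEMO_TRACKING_URI", "file:///tmp/demo/.mlflow"):
    record_uri("inside task function")  # the variable is visible here

record_uri("at process exit")  # the variable has already been removed

print(seen)
```

Any code that reads the variable again after the block has ended, as an exit handler would, no longer sees the value that was set for the task function.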

Omry Yadan