2

I am trying to manage the results of machine learning experiments with MLflow and Hydra, so I tried running them with Hydra's multirun feature. I used the following code as a test.

import mlflow
import hydra
from hydra import utils
from pathlib import Path
import time


@hydra.main('config.yaml')
def main(cfg):
    print(cfg)


    mlflow.set_tracking_uri('file://' + utils.get_original_cwd() + '/mlruns')
    mlflow.set_experiment(cfg.experiment_name)


    mlflow.log_param('param1',5)
    # mlflow.log_param('param1',5)
    # mlflow.log_param('param1',5)

    with mlflow.start_run() :
        mlflow.log_artifact(Path.cwd() / '.hydra/config.yaml')


if __name__ == '__main__':
    main()

This code does not work. I got the following error:

Exception: Run with UUID [RUNID] is already active. To start a new run, first end the current run with mlflow.end_run(). To start a nested run, call start_run with nested=True

So I modified the code as follows:

import mlflow
import hydra
from hydra import utils
from pathlib import Path
import time


@hydra.main('config.yaml')
def main(cfg):
    print(cfg)


    mlflow.set_tracking_uri('file://' + utils.get_original_cwd() + '/mlruns')
    mlflow.set_experiment(cfg.experiment_name)


    mlflow.log_param('param1',5)
    # mlflow.log_param('param1',5)
    # mlflow.log_param('param1',5)

    with mlflow.start_run(nested=True) :
        mlflow.log_artifact(Path.cwd() / '.hydra/config.yaml')


if __name__ == '__main__':
    main()

This code works, but the artifact is not saved. The following corrections were made to save the artifacts.

import mlflow
import hydra
from hydra import utils
from pathlib import Path
import time


@hydra.main('config.yaml')
def main(cfg):
    print(cfg)


    mlflow.set_tracking_uri('file://' + utils.get_original_cwd() + '/mlruns')
    mlflow.set_experiment(cfg.experiment_name)


    mlflow.log_param('param1',5)
    # mlflow.log_param('param1',5)
    # mlflow.log_param('param1',5)

    
    mlflow.log_artifact(Path.cwd() / '.hydra/config.yaml')


if __name__ == '__main__':
    main()

As a result, the artifacts are now saved. However, when I run the following command

python test.py model=A,B hidden=12,212,31 -m

Only the artifact of the last execution condition was saved.

How can I modify mlflow to manage the parameters of the experiment by taking advantage of the multirun feature of hydra?

Omry Yadan
musako

3 Answers

1

MLFlow is not officially supported by Hydra. At some point there will be a plugin that will make this smoother.

Looking at the errors you are reporting (and without running your code): one thing you can try is to use the Joblib launcher plugin to get job isolation through processes (this requires Hydra 1.0.0rc1 or newer).

Omry Yadan
1

What you are observing is due to the interaction between MLFlow and Hydra. As far as MLflow can tell, all of your Hydra multiruns are the same MLflow run!

Since both frameworks use the term "run", I will need to be verbose in the following text. Please bear with me.

If you do not explicitly start an MLflow run, MLflow starts one for you the first time you call mlflow.log_params or mlflow.log_artifacts. Within a Hydra multirun, it appears that the MLflow run opened during the first Hydra run stays active and is inherited by the later Hydra runs, instead of a new MLflow run being created for each one. This is why MLflow thinks you are trying to update parameter values and raises: mlflow.exceptions.MlflowException: Changing param values is not allowed.

You can fix this by wrapping your MLFlow logging code within a with mlflow.start_run() context manager:

import mlflow
import hydra
from hydra import utils
from pathlib import Path

@hydra.main(config_path="", config_name='config.yaml')
def main(cfg):
    print(cfg)
    mlflow.set_tracking_uri('file://' + utils.get_original_cwd() + '/mlruns')
    mlflow.set_experiment(cfg.experiment_name)

    with mlflow.start_run() as run:
        mlflow.log_params(cfg)
        mlflow.log_artifact(Path.cwd() / '.hydra/config.yaml')
        print(run.info.run_id) # just to show each run is different

if __name__ == '__main__':
    main()

The context manager will start and end MLflow runs properly, preventing the issue from occurring.

Alternatively, you can also start and end an MLFlow run manually:

activerun = mlflow.start_run()
mlflow.log_params(cfg)
mlflow.log_artifact(Path.cwd() / '.hydra/config.yaml')
print(activerun.info.run_id) # just to show each run is different
mlflow.end_run()
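Both variants pass cfg straight to mlflow.log_params, which is fine for a flat config. If your config is nested (for example because model is a config group), one option is to flatten it into dotted keys first. The helper below is a minimal sketch using plain dictionaries. The name flatten and the example values are hypothetical, and it assumes you have already converted the Hydra config to a regular dict, for instance with OmegaConf.to_container(cfg, resolve=True).

```python
def flatten(nested, parent_key="", sep="."):
    """Turn {'model': {'hidden': 12}} into {'model.hidden': 12}."""
    flat = {}
    for key, value in nested.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        if isinstance(value, dict):
            # Recurse into sub-sections, carrying the dotted prefix along.
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


# Hypothetical config contents, only for illustration.
example = {"experiment_name": "test", "model": {"name": "A", "hidden": 12}}
params = flatten(example)
```

The resulting params dictionary could then be passed to mlflow.log_params(params) inside the run.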
tnwei
0

This is related to the way you defined your MLflow run. You call log_param before start_run, so MLflow has already started a run implicitly by the time start_run tries to open a second one, which explains the error. You could try removing the following line from your first code sample and see what happens:

mlflow.log_param('param1',5)