8

I currently created an experiment in mlflow and created multiple runs in the experiment.

from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
import mlflow

experiment_name="experiment-1"
mlflow.set_experiment(experiment_name)

no_of_trees=[100,200,300]
depths=[2,3,4]
for trees in no_of_trees:
    for depth in depths:
        with mlflow.start_run() as run:
            model=RandomForestRegressor(n_estimators=trees, criterion='mse',max_depth=depth)
            model.fit(x_train, y_train)
            predictions=model.predict(x_cv)
            mlflow.log_metric('rmse',mean_squared_error(y_cv, predictions))

after creating the runs, I wanted to get the best run_id for this experiment. for now, I can get the best run by looking at the UI of mlflow but how can we do right the program?

Ravi
  • 2,778
  • 2
  • 20
  • 32

2 Answers2

21

we can get the experiment id from the experiment name and we can use python API to get the best runs.

experiment_name = "experiment-1"
current_experiment=dict(mlflow.get_experiment_by_name(experiment_name))
experiment_id=current_experiment['experiment_id']

By using the experiment id, we can get all the runs and we can sort them based on metrics like below. In the below code, rmse is my metric name (so it may be different for you based on metric name)

df = mlflow.search_runs([experiment_id], order_by=["metrics.rmse DESC"])
best_run_id = df.loc[0,'run_id']
Ravi
  • 2,778
  • 2
  • 20
  • 32
1

Search by name or other property directly, using filter_string:

mlflow.search_runs(filter_string="run_name='CV_M1_A1_regional'")['run_id'] #539aa3507ba54ebf86e64c7c9766fcee
Maciej Skorski
  • 2,303
  • 6
  • 14