I am creating an MLflow experiment that logs a logistic regression model together with a metric and an artifact.

import mlflow
import mlflow.sklearn
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_fscore_support

with mlflow.start_run(run_name=run_name, experiment_id=experiment_id):

    logreg = LogisticRegression()
    logreg.fit(x_train, y_train)
    print('training over', flush=True)
    y_pred = logreg.predict(x_test)
    mlflow.sklearn.log_model(logreg, "model")

    # The weighted F1 score is the third element of the returned tuple
    mlflow.log_metric("f1", precision_recall_fscore_support(y_test, y_pred, average='weighted')[2])
    # to_csv() returns None when given a path, so write the file first and log it by path
    x_train.to_csv('train.csv')
    mlflow.log_artifact('train.csv')

for some data (x_train, y_train, x_test, y_test)

Is there any way to access the artifacts of this run, given its experiment_id and run_name, and to read back both train.csv and the model?

quant

2 Answers

There is a download_artifacts function on the MLflow tracking client (here client is an mlflow.tracking.MlflowClient instance) that gives you access to a logged artifact:

local_path = client.download_artifacts(run_id, "train.csv", local_dir)

The model artifact can either be downloaded with the same function (there should be an object called model/model.pkl for scikit-learn, or something else for other flavors), or you can load the model by run:

loaded_model = mlflow.pyfunc.load_model(f"runs:/{run_id}/model")
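
If only the experiment ID and run name are known, the run ID has to be looked up first. The following is a minimal sketch, not a definitive implementation. It assumes a recent MLflow version in which mlflow.search_runs and mlflow.artifacts.download_artifacts are available, that train.csv was logged at the artifact root, and that the model was logged under "model" as in the question. The helper names run_name_filter, model_uri and load_run_outputs are hypothetical.

```python
def run_name_filter(run_name: str) -> str:
    """Build a search filter on the tag where MLflow stores the run name.

    Assumption: run_name contains no single quotes.
    """
    return f"tags.mlflow.runName = '{run_name}'"


def model_uri(run_id: str, artifact_path: str = "model") -> str:
    """Build a runs:/ URI pointing at a model logged under artifact_path."""
    return f"runs:/{run_id}/{artifact_path}"


def load_run_outputs(experiment_id: str, run_name: str):
    """Return (train DataFrame, fitted model) for the newest matching run."""
    # Imported here so the two helpers above work without MLflow installed.
    import mlflow
    import mlflow.artifacts
    import mlflow.sklearn
    import pandas as pd

    # search_runs returns a pandas DataFrame with one row per matching run.
    runs = mlflow.search_runs(
        experiment_ids=[experiment_id],
        filter_string=run_name_filter(run_name),
        order_by=["attributes.start_time DESC"],
    )
    if runs.empty:
        raise LookupError(f"no run named {run_name!r} in experiment {experiment_id}")
    run_id = runs.iloc[0]["run_id"]

    # Download train.csv to a local temporary directory and read it back.
    csv_path = mlflow.artifacts.download_artifacts(
        run_id=run_id, artifact_path="train.csv"
    )
    # to_csv() wrote the index as the first column, so restore it on read.
    train = pd.read_csv(csv_path, index_col=0)

    # Load the native scikit-learn estimator rather than the pyfunc wrapper.
    model = mlflow.sklearn.load_model(model_uri(run_id))
    return train, model
```

Calling load_run_outputs(experiment_id, run_name) with the values used for logging should return the training frame and the fitted LogisticRegression. Run names are not unique, so the sketch takes the most recently started match.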
Alex Ott

I couldn't get the Python API to work with an MLflow instance that uses the file system for artifact storage when accessing it from another machine (locally it should work fine). The REST API is of no help either, as it has no method for downloading artifacts. I was able to get it to work over HTTP instead. Here is a sample that loads an artifact CSV file for the given run ID into a pandas DataFrame:

import pandas as pd
import urllib.request
import io

with urllib.request.urlopen('http://server:5000/get-artifact?path=dataframe.csv&run_uuid=75hf8234h9dj29jr943909') as f:
    file = pd.read_csv(io.BytesIO(f.read()))
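
The same request can be wrapped in a small helper so that the artifact path and run ID are URL-encoded rather than pasted into the string by hand. This is only a sketch built on the get-artifact endpoint shown above. The function names artifact_url and read_artifact_csv are hypothetical, and the server address is whatever your tracking server uses.

```python
import io
import urllib.request
from urllib.parse import urlencode


def artifact_url(base_url: str, run_id: str, path: str) -> str:
    """Build the get-artifact URL for one artifact of one run."""
    # urlencode escapes characters such as "/" in nested artifact paths.
    query = urlencode({"path": path, "run_uuid": run_id})
    return f"{base_url.rstrip('/')}/get-artifact?{query}"


def read_artifact_csv(base_url: str, run_id: str, path: str):
    """Fetch a CSV artifact over HTTP and parse it into a DataFrame."""
    import pandas as pd  # imported here so artifact_url has no pandas dependency

    with urllib.request.urlopen(artifact_url(base_url, run_id, path)) as response:
        return pd.read_csv(io.BytesIO(response.read()))
```

For example, read_artifact_csv('http://server:5000', run_id, 'dataframe.csv') issues the same request as the snippet above.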
wsl