How can I retrieve the model.pkl from an experiment in Databricks?

Question

I want to retrieve the pickle of my trained model, which I know is stored in the run's files inside my experiment in Databricks.

It seems that a model loaded with mlflow.pyfunc.load_model only exposes the predict method.

Is there a way to access the pickle directly?

I also tried passing the run's artifact path to pickle.load(path) (example path: dbfs:/databricks/mlflow-tracking/20526156406/92f3ec23bf614c9d934dd0195/artifacts/model/model.pkl).

score 3 · Answer 1 · answered Aug 10 '21 at 02:37

Use the framework's native load_model() method (e.g. mlflow.sklearn.load_model()) or download_artifacts().

Andre

But what do I use for the path? Or do I pass the run ID? – Leonardo Kanashiro Felizardo Aug 10 '21 at 12:53
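On the comment's question: MLflow's native flavor loaders take a model URI, so instead of a DBFS path you can pass the run ID inside a `runs:/<run_id>/<artifact_path>` URI. A minimal sketch, assuming the model was logged with the scikit-learn flavor under the artifact path `model` (matching the `.../artifacts/model/model.pkl` path in the question); the helper names `model_uri` and `load_sklearn_model` are hypothetical.

```python
def model_uri(run_id: str, artifact_path: str = "model") -> str:
    """Build an MLflow model URI pointing at a model logged in a given run."""
    return f"runs:/{run_id}/{artifact_path}"


def load_sklearn_model(run_id: str, artifact_path: str = "model"):
    """Load the original scikit-learn object (not the pyfunc wrapper) from a run.

    Assumes the model was logged with the sklearn flavor. mlflow is imported
    lazily so this helper module can be imported where mlflow is absent.
    """
    import mlflow.sklearn

    return mlflow.sklearn.load_model(model_uri(run_id, artifact_path))
```

For example, `load_sklearn_model("<run_id>")` with a run ID copied from the experiment UI should return the estimator itself rather than a pyfunc wrapper, so methods other than `predict` (such as `predict_proba` on a classifier) remain available.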

Leonardo Kanashiro Felizardo · Accepted Answer · 2022-01-24T20:08:25.513

I recently found a solution; it can be done with either of the following two approaches:

1. Define a custom predict function when saving the model (see the Databricks documentation for more details).

Example given by Databricks:

import mlflow.pyfunc


class AddN(mlflow.pyfunc.PythonModel):
    """Custom pyfunc model that adds n to every column of the input DataFrame."""

    def __init__(self, n):
        self.n = n

    def predict(self, context, model_input):
        return model_input.apply(lambda column: column + self.n)


# Construct and save the model
model_path = "add_n_model"
add5_model = AddN(n=5)
mlflow.pyfunc.save_model(path=model_path, python_model=add5_model)

# Load the model in `python_function` format
loaded_model = mlflow.pyfunc.load_model(model_path)

2. Load the model artifact by downloading it from the run:

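import pickle

from mlflow.tracking import MlflowClient

client = MlflowClient()

# Download model/model.pkl from the run to a local temporary path
tmp_path = client.download_artifacts(run_id="0c7946c81fb64952bc8ccb3c7c66bca3", path="model/model.pkl")

# Unpickle it; pickle.load expects an open binary file, not a path
with open(tmp_path, "rb") as f:
    model = pickle.load(f)

# To check which artifacts (and paths) the run contains:
client.list_artifacts(run_id="0c7946c81fb64952bc8ccb3c7c66bca3", path="")
client.list_artifacts(run_id="0c7946c81fb64952bc8ccb3c7c66bca3", path="model")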
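If the exact artifact path is not known, the `list_artifacts` calls above can be turned into a recursive search. A minimal sketch, assuming `client` is an `MlflowClient`, whose `list_artifacts(run_id, path)` returns `FileInfo` objects with `.path` and `.is_dir`; `find_pickles` is a hypothetical helper, not part of MLflow.

```python
from typing import List


def find_pickles(client, run_id: str, path: str = "") -> List[str]:
    """Recursively walk a run's artifacts and return the paths of all .pkl files.

    `client` is expected to behave like mlflow.tracking.MlflowClient: its
    list_artifacts(run_id, path) returns objects with `.path` and `.is_dir`.
    """
    found = []
    for info in client.list_artifacts(run_id, path):
        if info.is_dir:
            # Descend into subdirectories such as "model/"
            found.extend(find_pickles(client, run_id, info.path))
        elif info.path.endswith(".pkl"):
            found.append(info.path)
    return found
```

Each returned path can then be passed as `path=` to `client.download_artifacts(...)` and unpickled as in the answer. Taking the client as a parameter keeps the walk easy to test with a stand-in object.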

edited Jan 24 '22 at 20:08

answered Aug 23 '21 at 19:53

Leonardo Kanashiro Felizardo

For the second approach, which version of the pickle package did you use? I'm experimenting with mlflow==1.22.0, cloudpickle==1.6.0 and pickle5==0.0.12, and when loading the model via pickle.load("my_onened_pkl_file") I get the error: in load_reduce stack[-1] = func(*args) TypeError: code() takes at most 15 arguments (16 given) – florins Nov 29 '22 at 10:48
mlflow == 2.0.1; cloudpickle == 2.0.0; pickle == 4.0. – Leonardo Kanashiro Felizardo Nov 30 '22 at 16:32
Thank you, Leonardo! Can you please tell me which Python version you used? – florins Dec 02 '22 at 16:31
The Python version is: 3.9.5 – Leonardo Kanashiro Felizardo Dec 02 '22 at 17:09
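A `TypeError: code() takes at most 15 arguments (16 given)` like the one above typically means the object was pickled under Python 3.8 or newer (cloudpickle serializes function code objects, which gained an extra `posonlyargcount` argument in 3.8) and is being loaded under an older Python. A minimal sketch for comparing the training and loading environments; the default module list is an assumption based on the packages mentioned in this thread, and `environment_report` is a hypothetical helper.

```python
import importlib
import pickle
import platform


def environment_report(modules=("mlflow", "cloudpickle", "sklearn")):
    """Collect the versions that matter when a pickled model is loaded elsewhere.

    `modules` are import names (e.g. "sklearn", not "scikit-learn"); a module
    that cannot be imported is reported as None.
    """
    report = {
        "python": platform.python_version(),
        "pickle_default_protocol": pickle.DEFAULT_PROTOCOL,
        "pickle_highest_protocol": pickle.HIGHEST_PROTOCOL,
    }
    for name in modules:
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = None  # not installed in this environment
        else:
            report[name] = getattr(module, "__version__", "unknown")
    return report


if __name__ == "__main__":
    # Run this in both the training and the loading environment and compare.
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Using `importlib.import_module` and `__version__` rather than `importlib.metadata` keeps the sketch usable on Python 3.7, which is exactly the kind of older interpreter that can trigger this error.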
