
I am creating a custom mlflow.pyfunc model that I would like to save to MLflow and retrieve later. I don't understand the relationship between the object that is saved with mlflow.pyfunc.save_model() and the one that is retrieved with mlflow.pyfunc.load_model().

The loaded model seems to be a 'PythonModelContext' object rather than my original Python class, and when I try to use the predict method on the loaded version I get an error.

Here I initialise MLflow and create a dummy example of my class:

# imports
import os
import tempfile
from pathlib import Path
import pandas as pd
import mlflow
from mlflow.tracking import MlflowClient
import mlflow.pyfunc
from mlflow.pyfunc import PythonModelContext

# initialise MLFlow
mlflow_var = os.getenv('HYMIND_REPO_TRACKING_URI')
mlflow.set_tracking_uri(mlflow_var)   

client = MlflowClient()

# Define the class that will be used for fit and predict (dummy example)
class PredictSpeciality(mlflow.pyfunc.PythonModel):
    
    def fit(self):
        print('fit')
        d = {'col1': [1, 2], 'col2': [3, 4]}
        df = pd.DataFrame(data=d)
        return df
           
    def predict(self, X, y=None):
        print('predict')
        print(X.shape)
        return 

If I use the class directly, before saving it, the predict method works:

# Use of this predictor before saving works fine 
m = PredictSpeciality()
df = m.fit()
m.predict(df)

But if I save the model, register it, and then reload it, the predict method no longer works:

counter += 1  # counter must be initialised beforehand, e.g. counter = 0
exp_name = 'MLflow-test-' + str(counter)

os.environ["MLFLOW_EXPERIMENT_NAME"] = exp_name
experiment_id = mlflow.create_experiment(exp_name)

mlflow.set_experiment(exp_name)
experiment = dict(mlflow.get_experiment_by_name(exp_name))
experiment_id = experiment['experiment_id']

with mlflow.start_run():
    
    # dummy code here for fitting a model
    m = PredictSpeciality()
    df = m.fit()
    
# mark best run
runs = mlflow.search_runs()
best_run_id = runs['run_id'][0]

# tag the best run and save model
with mlflow.start_run(run_id=best_run_id):
    mlflow.set_tag('best_run_', 1)   

    mlflow_model_path = f'/data/hymind/repo/{experiment_id}/{best_run_id}/artifacts/model/'
    mlflow.pyfunc.save_model(path=mlflow_model_path, python_model=m)
    
# end experiment and register best model
model_name = 'MLflow-test' + str(counter)
registered_model = mlflow.register_model(f'runs:/{best_run_id}/model', model_name)

# now attempt to make a prediction using the loaded model
model_version = 1
m = mlflow.pyfunc.load_model(f"models:/{model_name}/{model_version}")
m.predict(df)

In this case, I get the following AttributeError:

AttributeError: 'PythonModelContext' object has no attribute 'shape'

How do I get the original model back from the 'PythonModelContext' object?

zmek

1 Answer


If you look closely at the signature of the abstract method predict() in the mlflow.pyfunc.PythonModel class that you are extending, you will see that it has three parameters:

def predict(self, context, model_input):
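Why the error mentions PythonModelContext: in the MLflow versions current when this was written, calling predict(df) on the loaded model makes the pyfunc wrapper call your class's predict with a PythonModelContext as the first positional argument and the data as the second. With the signature predict(self, X, y=None), the context therefore lands in X and the DataFrame in y. The sketch below reproduces only that argument binding in plain Python; FakeContext, Table and wrapper_predict are hypothetical stand-ins, not MLflow classes.

```python
class FakeContext:
    """Hypothetical stand-in for mlflow.pyfunc.PythonModelContext."""
    def __init__(self, artifacts=None):
        self.artifacts = artifacts or {}


class Table:
    """Hypothetical stand-in for a pandas DataFrame; only exposes .shape."""
    def __init__(self, rows, cols):
        self.shape = (rows, cols)


class BrokenModel:
    # Same signature as the question's class: no `context` parameter.
    def predict(self, X, y=None):
        return X.shape


def wrapper_predict(python_model, context, model_input):
    # Mimics the pyfunc wrapper: context first, data second, both positional.
    # With BrokenModel, X receives the context and y receives the data.
    return python_model.predict(context, model_input)


try:
    wrapper_predict(BrokenModel(), FakeContext(), Table(2, 2))
except AttributeError as err:
    print(err)  # 'FakeContext' object has no attribute 'shape'
```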

So, if you add the extra context parameter to your simple class, your example should work:

class PredictSpeciality(mlflow.pyfunc.PythonModel):
    def fit(self):
        print('fit')
        d = {'col1': [1, 2], 'col2': [3, 4]}
        df = pd.DataFrame(data=d)
        return df
           
    def predict(self, context, X, y=None):
        print('predict')
        print(X.shape)
        return 
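Running the corrected signature through the same kind of stand-in wrapper shows both sides of the change: the positional call now binds the context to context and the data to X, but a direct call on the bare instance needs something in the context slot, otherwise Python reports that 'X' is missing. FakeContext, Table and wrapper_predict are again hypothetical stand-ins; passing None as the context is only safe here because this predict never reads it.

```python
class FakeContext:
    """Hypothetical stand-in for mlflow.pyfunc.PythonModelContext."""
    artifacts = {}


class Table:
    """Hypothetical stand-in for a pandas DataFrame; only exposes .shape."""
    def __init__(self, rows, cols):
        self.shape = (rows, cols)


class FixedModel:
    # Signature with the extra `context` parameter, as in the answer.
    def predict(self, context, X, y=None):
        return X.shape


def wrapper_predict(python_model, context, model_input):
    # Mimics the pyfunc wrapper: context first, data second.
    return python_model.predict(context, model_input)


m = FixedModel()
df = Table(2, 2)

print(wrapper_predict(m, FakeContext(), df))  # (2, 2): called through the wrapper
print(m.predict(None, df))                    # (2, 2): direct call, placeholder context
try:
    m.predict(df)  # direct call without a context: df binds to `context`
except TypeError as err:
    print(err)     # reports a missing positional argument 'X'
```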

To elaborate on what is going on here: there are two classes at play, mlflow.pyfunc.PythonModel and mlflow.pyfunc.PyFuncModel.

The mlflow.pyfunc.PythonModel is wrapped by the mlflow.pyfunc.PyFuncModel. The former does the actual work, while the latter deals with the metadata, packaging, conda environment, etc. The documentation explains it like this:

Python function models are loaded as an instance of mlflow.pyfunc.PyFuncModel, which is an MLflow wrapper around the model implementation and model metadata (MLmodel file).
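A rough plain-Python model of that relationship follows. It assumes only the wrapping described above plus the fact that MLflow serializes the python_model object with cloudpickle when saving; standard pickle is used here to stay dependency-free, and the dict stands in for the saved model directory with its MLmodel file. PyFuncLike, save_like and load_like are hypothetical names. The object you get back from loading is the wrapper, and your original instance sits inside it. Recent MLflow releases expose PyFuncModel.unwrap_python_model() for retrieving it, which the method of the same name below imitates; check that your installed version has it before relying on it.

```python
import pickle


class Table:
    """Hypothetical stand-in for a pandas DataFrame; only exposes .shape."""
    def __init__(self, rows, cols):
        self.shape = (rows, cols)


class PredictSpeciality:
    # Stand-in for the user's PythonModel subclass, with the corrected signature.
    def predict(self, context, X, y=None):
        return X.shape


class FakeContext:
    """Hypothetical stand-in for PythonModelContext (holds artifact paths)."""
    def __init__(self, artifacts):
        self.artifacts = artifacts


class PyFuncLike:
    """Hypothetical stand-in for PyFuncModel: wraps implementation + metadata."""
    def __init__(self, python_model, metadata):
        self._python_model = python_model
        self._context = FakeContext(artifacts={})
        self.metadata = metadata

    def predict(self, model_input):
        # Single-argument predict; the wrapper supplies the context itself.
        return self._python_model.predict(self._context, model_input)

    def unwrap_python_model(self):
        # Returns the original user object stored inside the wrapper.
        return self._python_model


def save_like(python_model):
    # "Saving": serialize the model object and record some metadata.
    return {"python_model.pkl": pickle.dumps(python_model),
            "MLmodel": {"flavor": "python_function"}}


def load_like(saved):
    # "Loading": deserialize the model object and wrap it.
    model = pickle.loads(saved["python_model.pkl"])
    return PyFuncLike(model, saved["MLmodel"])


loaded = load_like(save_like(PredictSpeciality()))
print(type(loaded).__name__)                        # PyFuncLike
print(loaded.predict(Table(2, 2)))                  # (2, 2)
print(type(loaded.unwrap_python_model()).__name__)  # PredictSpeciality
```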

Unfortunately, the documentation also states that a PyFuncModel cannot be created directly:

Wrapper around model implementation and metadata. This class is not meant to be constructed directly. Instead, instances of this class are constructed and returned from mlflow.pyfunc.load_model().

I find that quite limiting and am unsure why it was designed this way. However, there are two things you can do here:

  1. Pass in an extra placeholder argument for context when calling your class directly:
   m.predict(None, df)
  2. Save and load the model to get an mlflow.pyfunc.PyFuncModel:
   mlflow.pyfunc.save_model(path="temp_model", python_model=m)  # path must not already exist
   m2 = mlflow.pyfunc.load_model("temp_model")
   m2.predict(df)

I know it isn't elegant, but I have actually used #2 in the past. It would be good if someone from the MLflow team could comment on why direct creation of an mlflow.pyfunc.PyFuncModel is not supported.

Daniel Schneider
  • With this solution, m.predict() works for the reloaded model. However, m.predict() now fails at the earlier step, before saving the model, with the error "predict() missing 1 required positional argument: 'X'". It seems an additional parameter needs to be specified. Calling m.predict() on the reloaded model returns the 'PythonModelContext' object, which is presumably the context referred to in 'def predict(self, context, X, y=None)', but I'm not sure how to pass or refer to this object when calling other methods. – zmek Oct 27 '22 at 22:11
  • I added some more details on the limitations of the PythonModel/PyFuncModel concepts. I don't think they are particularly well-chosen trade-offs, but I believe they can be worked with. – Daniel Schneider Oct 28 '22 at 08:26
  • Thanks Daniel for the very useful advice; this answers my question. I simplified my problem for this post, so while this solution is great, it doesn't solve my original problem. I have posted a follow-up question here: https://stackoverflow.com/questions/74257370/mlflow-pyfunc-load-model-mlflow-pyfunc-save-model-how-to-pass-additional-art. I agree with you and others that the MLflow documentation could be more helpful on the use of custom Python functions. – zmek Oct 30 '22 at 23:11