mlflow.pyfunc.load_model / mlflow.pyfunc.save_model - how to pass additional artifacts as parameters

Question

In a previous post I asked about saving and loading models with custom mlflow.pyfunc objects and received an excellent answer from Daniel Schneider explaining the difference between mlflow.pyfunc.PythonModel and mlflow.pyfunc.PyFuncModel.

Here, I extend the question, as the proposed solution doesn't work for me when also trying to save and retrieve model artifacts.

I have a class with a 'fit' function that calculates some values and saves them to a dict, and a 'predict' function that uses those values. The predict function works before the model is saved with MLflow, but not after it is reloaded.

Creating the class and running it outside MLflow (using Daniel Schneider's suggestion of passing None as the context argument of predict) works fine:

import mlflow.pyfunc
import pandas as pd

# dummy data
data = {'col1': [1, 2], 'col2': [3, 4]}
df = pd.DataFrame(data=data)

# create class
class PredictSpeciality(mlflow.pyfunc.PythonModel):
    def fit(self):
        print('fit')
        d = {'mult': 2}
        return d
           
    def predict(self, context, X, d, y=None):
        print('predict')
        X['pred'] = X['col1'] * d['mult']
        return X

# create instance of model, return weights dict and pass weights into predict function
m = PredictSpeciality()
d = m.fit()
m.predict(None, df, d)

However, after saving the model and reloading it from MLflow:

mlflow.pyfunc.save_model(path="temp_model", python_model=m)
m2 = mlflow.pyfunc.load_model("temp_model")
m2.predict(None, df, d)

Returns the following error:

predict() takes 2 positional arguments but 4 were given

I assume this is again due to the differences outlined before between mlflow.pyfunc.PythonModel and mlflow.pyfunc.PyFuncModel, but I'm not sure how to handle it.

score 1 · Answer 1 · answered Oct 31 '22 at 20:17

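The solution is to pass all of the model input, including the artefacts, to predict as a single argument. The loaded model's predict method takes the model input as one data argument and supplies the context itself, so the call now reaches the custom predict method and returns its output.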

import mlflow.pyfunc
import pandas as pd

# dummy data
data = {'col1': [1, 2], 'col2': [3, 4]}
df = pd.DataFrame(data=data)

# create class
class PredictSpeciality(mlflow.pyfunc.PythonModel):
    def fit(self):
        print('fit')
        d = {'mult': 2}
        return d
           
    def predict(self, context, X, y=None):
        print('predict')
        df, d = X
        df['pred'] = df['col1'] * d['mult']
        return df

# create instance of model, return weights dict 
m = PredictSpeciality()
d = m.fit()

# create model input for predict function as a single list: [DataFrame, weights dict]
model_input = [df, d]
m.predict(None, model_input)

# save and re-load from MLflow (the loaded model's predict takes only the model input)
mlflow.pyfunc.save_model(path="temp_model", python_model=m)
m2 = mlflow.pyfunc.load_model("temp_model")
m2.predict(model_input)
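
To see why the original three-argument call failed and the single-argument call works, here is a minimal sketch that uses only the standard library. Both classes are hypothetical stand-ins written for illustration, not MLflow's actual implementation: `PythonModelStandIn` plays the role of the custom PythonModel subclass, and `LoadedModelStandIn` plays the role of the wrapper object returned by load_model, on the assumption that the wrapper exposes predict(data) and forwards the call with its own context.

```python
class PythonModelStandIn:
    """Hypothetical stand-in for a custom mlflow.pyfunc.PythonModel subclass."""

    def predict(self, context, model_input):
        # Unpack the single model_input argument into values and weights.
        values, d = model_input
        return [v * d["mult"] for v in values]


class LoadedModelStandIn:
    """Hypothetical stand-in for the wrapper returned by load_model."""

    def __init__(self, python_model, context=None):
        self._python_model = python_model
        self._context = context

    def predict(self, data):
        # The wrapper supplies the context itself; callers pass only the data.
        return self._python_model.predict(self._context, data)


loaded = LoadedModelStandIn(PythonModelStandIn())

# Calling the wrapper with the PythonModel-style arguments fails.
error_message = None
try:
    loaded.predict(None, [1, 2], {"mult": 2})
except TypeError as exc:
    error_message = str(exc)
print(error_message)

# Packing everything into one argument goes through.
result = loaded.predict(([1, 2], {"mult": 2}))
print(result)
```

The first call reproduces the "takes 2 positional arguments but 4 were given" message from the question, because the wrapper's predict accepts only self and the data. The second call succeeds because the values and the weights dict travel together as one argument.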

Thanks to https://github.com/jongillham for answering this question — zmek, Oct 31 '22 at 20:20
