Custom Python model: loads successfully but fails to predict/serve

Question

I have a custom Python model that fits several bootstrapped copies of a scikit-learn estimator. Running the project with the mlflow run project_directory CLI works, and the model is saved with a save_model() call. It appears on the dashboard with mlflow ui. I can even load the saved model within my main.py script and predict on a pandas.DataFrame without any problem.

My problem comes when I try to run mlflow models serve -m project/models/run_id or mlflow models predict -m project/models/run_id -i data.json. I get the following error:

ModuleNotFoundError: No module named 'multi_model'

In the MLflow documentation, there is no example of serving a custom model, so I can't figure out how to solve this dependency problem. Here is my project tree:

project/
├── MLproject
├── __init__.py
├── conda.yaml
├── loader.py
├── main.py
├── models
│   └── 0ef267b0c9784a118290fa1ff579adbe
│       ├── MLmodel
│       ├── conda.yaml
│       └── python_model.pkl
├── multi_model.py

multi_model.py :

import numpy as np
from mlflow.pyfunc import PythonModel
from sklearn.base import clone

class MultiModel(PythonModel):

    def __init__(self, estimator=None, n=10):
        self.n = n
        self.estimator = estimator

    def fit(self, X, y=None):
        self.estimators = []
        for i in range(self.n):
            e = clone(self.estimator)
            e.set_params(random_state=i)
            X_bootstrap = X.sample(frac=1, replace=True, random_state=i)
            y_bootstrap = y.sample(frac=1, replace=True, random_state=i)
            e.fit(X_bootstrap, y_bootstrap)
            self.estimators.append(e)
        return self

    def predict(self, context, X):
        return np.stack([
            np.maximum(0, self.estimators[i].predict(X))
            for i in range(self.n)], axis=1
        )

main.py :

import os
import click
from sklearn.ensemble import RandomForestRegressor
import mlflow.pyfunc
import loader
import multi_model

@click.command()  # plus the @click.option(...) declarations matching the MLproject file
def run(next_week, window_size, nfold):
    # start_week and current_week are not defined in this excerpt
    train = loader.load(start_week, current_week)
    x_train, y_train = train.drop(columns=['target']), train['target']

    model = multi_model.MultiModel(RandomForestRegressor())

    with mlflow.start_run() as run:
        model.fit(x_train, y_train)
        model_path = os.path.join('models', run.info.run_id)
        # Saved without code_path, so multi_model.py is not shipped with the model
        mlflow.pyfunc.save_model(
            path=model_path,
            python_model=model,
        )

if __name__ == '__main__':
    run()

Could you add the content of the `loader.py` to support? Even though it's not essential for your question, it'll be helpful for anyone stumbling over this topic while searching for "mlflow python model", since the [official MLflow documentation](https://www.mlflow.org/docs/latest/python_api/mlflow.pyfunc.html#workflows) is a bit sparse in examples. — Thomas, Nov 13 '20 at 10:42

Funky · Answer 1 · 2019-10-02T15:55:16.843

4

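Problem solved: in main.py, pass code_path to save_model() so that multi_model.py is shipped with the saved model. The pickled model only references the multi_model module by name, so the serving process must be able to import it: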

mlflow.pyfunc.save_model(
        path=model_path,
        python_model=model,
        code_path=['multi_model.py'],
        conda_env={
            'channels': ['defaults', 'conda-forge'],
            'dependencies': [
                'mlflow=1.2.0',
                'numpy=1.16.5',
                'python=3.6.9',
                'scikit-learn=0.21.3',
                'cloudpickle==1.2.2'
            ],
            'name': 'mlflow-env'
        }
    )
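Why the missing module only shows up at serve time can be illustrated with the standard library alone. The sketch below is not MLflow code: it uses plain pickle and a throwaway module named demo_multi_model (a hypothetical stand-in for multi_model.py). It assumes that python_model.pkl behaves the same way, which should hold for classes defined in an importable module, since those are normally pickled by reference rather than by value.

```python
import importlib
import os
import pickle
import sys
import tempfile

# Write a throwaway module to disk; "demo_multi_model" is a hypothetical
# stand-in for the question's multi_model.py.
module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, "demo_multi_model.py"), "w") as f:
    f.write("class MultiModel:\n    def __init__(self, n=10):\n        self.n = n\n")

sys.path.insert(0, module_dir)
importlib.invalidate_caches()
import demo_multi_model

payload = pickle.dumps(demo_multi_model.MultiModel(n=3))
# The byte stream names the module and class; it does not contain the class body.
assert b"demo_multi_model" in payload

# Simulate a fresh serving process in which the module is not importable.
sys.path.remove(module_dir)
del sys.modules["demo_multi_model"]
importlib.invalidate_caches()
try:
    pickle.loads(payload)
    load_error = None
except ModuleNotFoundError as exc:
    load_error = exc

# Making the file importable again lets the same bytes load.
sys.path.insert(0, module_dir)
importlib.invalidate_caches()
restored = pickle.loads(payload)
```

Loading inside main.py works for the same reason the last step works: multi_model is already importable in that process. The separate process started by mlflow models serve has no such guarantee unless the file travels with the model.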

edited Oct 02 '19 at 15:55

answered Oct 02 '19 at 14:16


Was adding the `conda_env` part of the solution? Isn't the `code_path=['multi_model.py']` part sufficient as denoted in [mlflow.pyfunc.log_model](https://www.mlflow.org/docs/latest/python_api/mlflow.pyfunc.html#mlflow.pyfunc.log_model)? I'm confused because you have defined your env already in the `conda.yaml`. – Thomas Nov 13 '20 at 10:35

Muhammed AH · Answer 2 · 2021-12-06T12:51:03.087

To answer the question from Thomas, the existing conda.yaml file should suffice. I had a similar issue and was able to solve it using only the code_path parameter of mlflow.pyfunc.save_model().
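Applied to the layout from the question, that could look like the sketch below. It is an illustration, not the original poster's code: build_save_kwargs and save are hypothetical helper names, and it assumes conda_env accepts a path to a YAML file as well as a dict. The keyword is spelled code_path in the MLflow versions discussed in this thread, so check the signature of your installed version.

```python
import os


def build_save_kwargs(project_dir, model_path, model):
    """Collect save_model() arguments, failing early if a file is missing."""
    code_file = os.path.join(project_dir, "multi_model.py")
    env_file = os.path.join(project_dir, "conda.yaml")
    for required in (code_file, env_file):
        if not os.path.isfile(required):
            raise FileNotFoundError(required)
    return {
        "path": model_path,
        "python_model": model,
        "code_path": [code_file],  # ships multi_model.py with the model
        "conda_env": env_file,     # reuses the project's environment file
    }


def save(project_dir, model_path, model):
    # Imported lazily so the helper above has no MLflow dependency.
    import mlflow.pyfunc

    mlflow.pyfunc.save_model(**build_save_kwargs(project_dir, model_path, model))
```

Checking the files up front turns a serve-time ModuleNotFoundError into an immediate error at save time.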

My project structure:

project/ 
|--- models/
|    |- myModel.py 
|    |- otherFile.py

Inside myModel.py:

import mlflow

class model_base(mlflow.pyfunc.PythonModel):
     ......

Inside otherFile.py:

import os
import mlflow.pyfunc
from models.myModel import model_base

model = model_base()
# Directory whose contents are copied alongside the saved model
code_path_parent = os.path.abspath("./")

def save_model(save_path):
    mlflow.pyfunc.save_model(
        path=save_path,
        python_model=model,
        code_path=[code_path_parent],
    )

This is specified in the MLflow docs, in the descriptions of the code_path and python_model parameters of save_model(): https://www.mlflow.org/docs/latest/python_api/mlflow.pyfunc.html#mlflow.pyfunc.save_model
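To confirm that the files actually made it into the saved model before trying to serve it, one option is to list what sits next to MLmodel. The helper below is a standard-library sketch. list_bundled_code is a hypothetical name, and the default subdirectory "code" is an assumption about where MLflow copies code_path entries; the MLmodel file of your saved model should name the actual location.

```python
import os


def list_bundled_code(model_dir, code_subdir="code"):
    """Return relative paths of the files bundled under <model_dir>/<code_subdir>."""
    code_root = os.path.join(model_dir, code_subdir)
    bundled = []
    # os.walk yields nothing if code_root does not exist, so the result is [].
    for folder, _subfolders, files in os.walk(code_root):
        for name in files:
            full_path = os.path.join(folder, name)
            bundled.append(os.path.relpath(full_path, code_root))
    return sorted(bundled)
```

An empty list for a model whose class lives in a separate module would suggest that serving will hit the same ModuleNotFoundError as in the question.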
