I have a custom Python model, which basically sets up several perturbations of a scikit-learn estimator. I can run the project with the mlflow run project_directory command, and the model is saved with a save_model() call. The run appears on the dashboard started with mlflow ui. I can even load the saved model within my main.py script and predict on a pandas.DataFrame without any problem.
My problem comes when I try to run mlflow models serve -m project/models/run_id or mlflow models predict -m project/models/run_id -i data.json. I get the following error:
ModuleNotFoundError: No module named 'multi_model'
I could not find an example of serving a custom model in the MLflow documentation, so I can't figure out how to solve this dependency problem. Here is my project tree:
project/
├── MLproject
├── __init__.py
├── conda.yaml
├── loader.py
├── main.py
├── models
│   └── 0ef267b0c9784a118290fa1ff579adbe
│       ├── MLmodel
│       ├── conda.yaml
│       └── python_model.pkl
├── multi_model.py
multi_model.py:
import numpy as np
from mlflow.pyfunc import PythonModel
from sklearn.base import clone


class MultiModel(PythonModel):
    """Ensemble of n clones of one estimator, each fit on a bootstrap resample."""

    def __init__(self, estimator=None, n=10):
        self.n = n
        self.estimator = estimator

    def fit(self, X, y=None):
        self.estimators = []
        for i in range(self.n):
            e = clone(self.estimator)
            e.set_params(random_state=i)
            # X and y share the seed so the same row positions are drawn
            X_bootstrap = X.sample(frac=1, replace=True, random_state=i)
            y_bootstrap = y.sample(frac=1, replace=True, random_state=i)
            e.fit(X_bootstrap, y_bootstrap)
            self.estimators.append(e)
        return self

    def predict(self, context, X):
        # One column per fitted estimator, with predictions clipped at zero
        return np.stack([
            np.maximum(0, self.estimators[i].predict(X))
            for i in range(self.n)], axis=1
        )
main.py:
import os

import click
from sklearn.ensemble import RandomForestRegressor
import mlflow.pyfunc

import loader
import multi_model


@click(...)  # define the click options according to MLproject file
def run(next_week, window_size, nfold):
    train = loader.load(start_week, current_week)
    x_train, y_train = train.drop(columns=['target']), train['target']
    model = multi_model.MultiModel(RandomForestRegressor())
    with mlflow.start_run() as run:
        model.fit(x_train, y_train)
        # Save the fitted model under models/<run_id>
        model_path = os.path.join('models', run.info.run_id)
        mlflow.pyfunc.save_model(
            path=model_path,
            python_model=model,
        )


if __name__ == '__main__':
    run()
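
For what it's worth, here is a minimal sketch of what I suspect is happening. It assumes that python_model.pkl is a pickle of the model instance (the file name suggests so) and that the serving process cannot import multi_model. It uses only the standard pickle module, not MLflow itself, and the names demo_multi_model and reproduce are hypothetical stand-ins for illustration. A pickle stores a class by reference (module name plus class name), so loading it in a process where that module is not importable fails with the same kind of error:

```python
import importlib
import os
import pickle
import sys
import tempfile

# Hypothetical stand-in for multi_model.py (names are illustrative only).
MODULE_SOURCE = (
    "class MultiModel:\n"
    "    def __init__(self, n=10):\n"
    "        self.n = n\n"
)


def reproduce():
    """Pickle an instance while its module is importable, then unpickle without it."""
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "demo_multi_model.py"), "w") as f:
            f.write(MODULE_SOURCE)

        # "Training" side: the module is on sys.path, so pickling works.
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        module = importlib.import_module("demo_multi_model")
        payload = pickle.dumps(module.MultiModel(n=3))

        # "Serving" side: simulate a process that cannot see the module.
        sys.path.remove(tmp)
        del sys.modules["demo_multi_model"]
        importlib.invalidate_caches()

        try:
            pickle.loads(payload)
        except ModuleNotFoundError as exc:
            return payload, str(exc)
    return payload, None


if __name__ == "__main__":
    payload, message = reproduce()
    print(message)
```

The payload only contains the module name and the instance attributes, not the class source, which would explain why loading works inside main.py (where multi_model is importable) but not from the mlflow models commands.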