
I'm currently struggling to set up Model Serving endpoints for classification models in Azure Databricks.

I've tried this with several different classification models, including the following AutoML example provided by Databricks themselves: https://docs.databricks.com/_extras/notebooks/source/machine-learning/automl-classification-example.html

My Databricks environment details are as follows:
  • Pricing tier: Premium
  • Location: West Europe
  • Cluster policy: Personal Compute
  • Databricks Runtime version: 12.2 LTS ML (includes Apache Spark 3.3.2, Scala 2.12)
  • Access mode: Single Access

The model is created and registered in the Model Registry with no issues, but when I follow the GUI to create the endpoint, errors appear in the endpoint's Events and Logs.

I get the same error, "An error occurred while loading the model. No module named 'pandas.core.indexes.numeric'", for a variety of different classification models. Most other articles about this error are not Databricks-specific and point to incompatibilities between pandas versions, but pandas isn't listed in the model's requirements, so I can't configure it when creating the serving endpoint. Those articles suggest pandas 2.0 as a possible cause, but running pandas.__version__ on the cluster shows 1.4.2.
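One debugging step is to check whether the logged model's requirements.txt pins pandas at all, independently of the version on the cluster. One plausible explanation for the error: pandas 2.0 removed the pandas.core.indexes.numeric module, so a model pickled under pandas 1.x may fail to load if the serving environment resolves an unpinned pandas to 2.x. The sketch below assumes you already have the text of the model's requirements.txt (for example after downloading it with `mlflow.artifacts.download_artifacts`); `pandas_pin` is a hypothetical helper, not part of MLflow or Databricks.

```python
import re
from typing import Optional

# Matches a requirement line whose package name is exactly "pandas"
# (so "pandas-stubs" or "pandas_gbq" are not treated as pandas).
_PANDAS = re.compile(r"^pandas(?![A-Za-z0-9_.-])\s*(\[[^\]]*\])?\s*(.*)$", re.IGNORECASE)


def pandas_pin(requirements_text: str) -> Optional[str]:
    """Return the pandas version specifier from requirements.txt content.

    Returns None if pandas is not listed, "" if it is listed without a
    version constraint, otherwise the specifier text (e.g. "==1.5.3").
    """
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop inline comments and whitespace
        match = _PANDAS.match(line)
        if match:
            return match.group(2).strip()
    return None


example = "mlflow<3,>=2.1\ncloudpickle==2.0.0\nscikit-learn==1.0.2\n"
print(pandas_pin(example))  # None: pandas is not pinned for the serving environment
```

If this returns None or an empty string, the serving environment is free to install whatever pandas version the resolver picks.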

Has anyone seen something similar before and is able to advise on a solution or tips to debug?

Many thanks, Matt

gtomer
MattC

3 Answers


You can add the pandas library to the model's conda environment by passing a custom conda_env when logging the model:

import mlflow
import mlflow.sklearn
import pandas as pd
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier

# Custom conda environment that pins pandas explicitly
cd = {
    "channels": ["conda-forge"],
    "dependencies": [
        "python=3.9.5",
        "pip<=21.2.4",
        "pandas==1.5.3",
        {
            "pip": ["mlflow<3,>=2.1", "cffi==1.15.0", "cloudpickle==2.0.0",
                    "psutil==5.8.0", "scikit-learn==1.0.2", "typing-extensions==4.1.1"],
        },
    ],
    "name": "mlflow-env",
}

with mlflow.start_run():
    iris = datasets.load_iris()
    iris_train = pd.DataFrame(iris.data, columns=iris.feature_names)
    clf = RandomForestClassifier(max_depth=7, random_state=0)
    clf.fit(iris_train, iris.target)
    # Log and register the model together with the custom environment
    mlflow.sklearn.log_model(clf, "iris_rf", registered_model_name="model-libs-9741", conda_env=cd)

The output is:

Registered model 'model-libs-9741' already exists. Creating a new version of this model... Created version '2' of model 'model-libs-9741'.

Because I had already registered this model, a second version was created.

Before adding the custom conda environment, the model's conda file did not include pandas.


After running the code above, the new model version has pandas listed in its conda file.


After registering the model, create the serving endpoint using the version that includes the pandas dependency.
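If you already have a conda environment dict like cd and only want to force a specific pandas pin, a small helper can rewrite it before it is passed as `conda_env=` to `mlflow.sklearn.log_model`. This is a minimal sketch under the assumption that the dict follows the same layout as the example (a "dependencies" list with an optional nested {"pip": [...]} entry); `pin_pandas` is a hypothetical helper, and 1.5.3 is simply the version used in the example.

```python
import copy
import re
from typing import Any, Dict

# Requirement strings whose package name is exactly "pandas"
_PANDAS = re.compile(r"^pandas(?![A-Za-z0-9_.-])", re.IGNORECASE)


def pin_pandas(env: Dict[str, Any], version: str = "1.5.3") -> Dict[str, Any]:
    """Return a copy of a conda env dict that pins pandas==<version>.

    Any existing pandas entries, either top-level or inside the nested
    {"pip": [...]} dict, are removed first so exactly one pin remains.
    """
    new_env = copy.deepcopy(env)  # leave the caller's dict untouched
    cleaned = []
    for dep in new_env.get("dependencies", []):
        if isinstance(dep, str) and _PANDAS.match(dep.strip()):
            continue  # drop an existing conda-level pandas entry
        if isinstance(dep, dict) and "pip" in dep:
            dep["pip"] = [p for p in dep["pip"] if not _PANDAS.match(p.strip())]
        cleaned.append(dep)
    # Insert the conda-level pin just before the nested pip dict, as in cd
    pip_index = next((i for i, d in enumerate(cleaned) if isinstance(d, dict)), len(cleaned))
    cleaned.insert(pip_index, f"pandas=={version}")
    new_env["dependencies"] = cleaned
    return new_env
```

Removing any pip-level pandas entry avoids ending up with two conflicting pandas constraints in the same environment.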

JayashankarGS

Thanks for your response on this, Jayashankar. This did indeed help: I replicated your example and was able to apply it to my specific situation.

However, this then surfaced further errors, with other missing modules appearing in the service logs.

I tried debugging a bit further, but in the end rolling back to the 11.3 LTS ML runtime and recreating my models there proved successful. I'll have a dig around to see if there is any more evidence on the cause of this, or whether it is a bug in the 12.2 LTS ML runtime.

Thanks

MattC

Databricks describes this exact error in its documentation:

No module named 'pandas.core.indexes.numeric'

When serving a model built using AutoML with Model Serving, you may get the error: No module named 'pandas.core.indexes.numeric'.

This is due to an incompatible pandas version between AutoML and the model serving endpoint environment. You can resolve this error by running the add-pandas-dependency.py script. The script edits the requirements.txt and conda.yaml for your logged model to include the appropriate pandas dependency version: pandas==1.5.3.

  1. Modify the script to include the run_id of the MLflow run where your model was logged.
  2. Re-register the model to the MLflow Model Registry.
  3. Try serving the new version of the MLflow model.
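To make the requirements.txt part of that fix concrete, here is a minimal sketch of the kind of edit described: replacing any pandas line with pandas==1.5.3. This is not the add-pandas-dependency.py script itself (its actual code, and how it handles conda.yaml and re-logging, may differ); `pin_pandas_requirements` is a hypothetical helper that only transforms the file's text.

```python
import re

# Requirement lines whose package name is exactly "pandas"
_PANDAS = re.compile(r"^pandas(?![A-Za-z0-9_.-])", re.IGNORECASE)


def pin_pandas_requirements(requirements_text: str, version: str = "1.5.3") -> str:
    """Return requirements.txt content with any pandas line replaced by pandas==<version>."""
    # Keep every line that is not a pandas requirement
    lines = [ln for ln in requirements_text.splitlines() if not _PANDAS.match(ln.strip())]
    lines.append(f"pandas=={version}")  # add the single explicit pin
    return "\n".join(lines) + "\n"


print(pin_pandas_requirements("mlflow<3,>=2.1\nscikit-learn==1.0.2\n"))
```

The edited file still has to be written back into the logged model's artifacts (along with the matching conda.yaml change) and the model re-registered, as in the steps above.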
JJ.