I am trying to build an API using an MLflow model.
The odd thing is that it works from one location on my PC but not from another. (The reason for moving it is that I wanted to restructure my repo.)
So, this simple code:
from mlflow.pyfunc import load_model
MODEL_ARTIFACT_PATH = "./model/model_name/"
MODEL = load_model(MODEL_ARTIFACT_PATH)
now fails with:
ERROR: Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/starlette/routing.py", line 540, in lifespan
async for item in self.lifespan_context(app):
File "/usr/local/lib/python3.8/dist-packages/starlette/routing.py", line 481, in default_lifespan
await self.startup()
File "/usr/local/lib/python3.8/dist-packages/starlette/routing.py", line 516, in startup
await handler()
File "/code/./app/main.py", line 32, in startup_load_model
MODEL = load_model(MODEL_ARTIFACT_PATH)
File "/usr/local/lib/python3.8/dist-packages/mlflow/pyfunc/__init__.py", line 733, in load_model
model_impl = importlib.import_module(conf[MAIN])._load_pyfunc(data_path)
File "/usr/local/lib/python3.8/dist-packages/mlflow/spark.py", line 737, in _load_pyfunc
return _PyFuncModelWrapper(spark, _load_model(model_uri=path))
File "/usr/local/lib/python3.8/dist-packages/mlflow/spark.py", line 656, in _load_model
return PipelineModel.load(model_uri)
File "/usr/local/lib/python3.8/dist-packages/pyspark/ml/util.py", line 332, in load
return cls.read().load(path)
File "/usr/local/lib/python3.8/dist-packages/pyspark/ml/pipeline.py", line 258, in load
return JavaMLReader(self.cls).load(path)
File "/usr/local/lib/python3.8/dist-packages/pyspark/ml/util.py", line 282, in load
java_obj = self._jread.load(path)
File "/usr/local/lib/python3.8/dist-packages/py4j/java_gateway.py", line 1321, in __call__
return_value = get_return_value(
File "/usr/local/lib/python3.8/dist-packages/pyspark/sql/utils.py", line 117, in deco
raise converted from None
pyspark.sql.utils.AnalysisException: Unable to infer schema for Parquet. It must be specified manually.
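In my experience, `AnalysisException: Unable to infer schema for Parquet` usually means Spark found the directory but no readable `part-*.parquet` data files inside it. A minimal sketch to confirm what is actually on disk at that path (the `MODEL_ARTIFACT_PATH` value mirrors the one in my snippet above; adjust to your layout):

```python
import os

def list_artifact_files(artifact_path):
    """Return sorted (relative_path, size_in_bytes) pairs for every file
    under artifact_path, so the part-*.parquet data files can be checked
    alongside the _SUCCESS / _committed_* / _started_* marker files."""
    found = []
    for root, _dirs, files in os.walk(artifact_path):
        for name in files:
            full = os.path.join(root, name)
            found.append((os.path.relpath(full, artifact_path),
                          os.path.getsize(full)))
    return sorted(found)

if __name__ == "__main__":
    # Same path as in main.py; run this inside the container too.
    for rel_path, size in list_artifact_files("./model/model_name/"):
        print(rel_path, size)
```

Running this both on the host and inside the container should show whether the artifacts differ between the two environments.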
The model artifacts are already downloaded to the /model folder, which has the following structure.
The load_model call is in the main.py file. As I mentioned, it works from another directory, and there are no references to absolute paths. I have also made sure that my package versions are identical; e.g., I have pinned them all down:
# Model
mlflow==1.25.1
protobuf==3.20.1
pyspark==3.2.1
scipy==1.6.2
six==1.15.0
Also, the same Dockerfile is used in both places, which, among other things, ensures that the final resulting folder structure is the same:
# ...other steps
COPY ./app /code/app
COPY ./model /code/model
What can explain this exception being thrown here, when the same model artifacts load fine from the other location on my PC?
Since it uses the load_model function, it should be able to read the Parquet files, shouldn't it?
Ask me anything and I can clarify.
EDIT1: I have debugged this a little more inside the Docker container, and it seems the Parquet files in the itemFactors folder (listed in my screenshot above) are not getting copied into my image, even though I have a COPY command for everything under the model folder. It copies the _started, _committed, and _SUCCESS files, just not the Parquet files. Does anyone know why that would be? I do NOT have a .dockerignore file. Why are those files skipped during the copy?
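One thing worth ruling out: Docker's COPY treats some files differently than a plain filesystem copy. In particular, symlinks pointing outside the build context are not followed, and zero-byte files from an incomplete artifact download can look "copied" while carrying no data. A hedged diagnostic sketch (the function name and path are mine, not part of any library):

```python
import os

def audit_files(artifact_path):
    """Flag files that open fine locally but commonly break a Docker COPY:
    symlinks (COPY does not dereference links pointing outside the build
    context) and zero-byte files (a sign of an incomplete download)."""
    report = []
    for root, _dirs, files in os.walk(artifact_path):
        for name in files:
            full = os.path.join(root, name)
            is_link = os.path.islink(full)
            report.append({
                "path": os.path.relpath(full, artifact_path),
                "is_symlink": is_link,
                # getsize on a broken symlink raises, so skip it for links
                "size": None if is_link else os.path.getsize(full),
            })
    return report

if __name__ == "__main__":
    # Run against the build-context copy of the model folder.
    for entry in audit_files("./model/model_name/"):
        print(entry)
```

If the itemFactors Parquet files show up as symlinks or zero bytes here, the problem is in how the artifacts were downloaded into the build context, not in the Dockerfile itself.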