I trained my PyCaret model locally and then pushed it to S3. Now I want to run the predict_model() method on my larger production datasets.
Using boto3 I copy the model pickle file from S3 to the master node of my Spark EMR cluster. Then I import the library:
from pycaret.classification import *
and try to generate predictions as follows:
model_path = '/tmp/catboost_model_aug19'
saved_model = load_model(model_path)
# prints: Transformation Pipeline and Model Successfully Loaded
new_data = spark.sql("select * from table").toPandas()
df = predict_model(saved_model, data=new_data)
When I run predict_model() on the cluster, it errors out with "Pipeline not found". The same code works fine on my local machine. How do I resolve this error?
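One detail worth double-checking, since it is a common source of load errors: PyCaret's save_model() appends ".pkl" to the name you give it, and load_model() expects the path without that extension. A small stdlib helper to normalize the path I pass on the EMR node (the helper name is my own, not part of PyCaret):

```python
from pathlib import Path

def model_load_path(pickle_path):
    """Return the path PyCaret's load_model() expects: the saved
    pickle's location with a trailing '.pkl' stripped, since
    save_model() adds that extension automatically."""
    p = Path(pickle_path)
    return str(p.with_suffix("")) if p.suffix == ".pkl" else str(p)

# e.g. after copying the pickle to /tmp on the master node:
# saved_model = load_model(model_load_path("/tmp/catboost_model_aug19.pkl"))
```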