I trained my PyCaret model locally and later pushed it to S3. Now I want to run predict_model() on my larger production datasets.

Using boto3, I copy the model pickle file from S3 to the master node of my Spark EMR cluster. Then I import the library with:

from pycaret.classification import *

Then I try to run predictions as follows:

model_path = '/tmp/catboost_model_aug19'
saved_model = load_model(model_path)  
# Output: Transformation Pipeline and Model Successfully Loaded
new_data = spark.sql("select * from table").toPandas()
df = predict_model(saved_model, data = new_data)

When I run predict_model(), it fails with an error saying "Pipeline not found".

However, the same code works fine on my local machine. How do I resolve this error?

Regressor

1 Answer

Which version of PyCaret was used to create the model? I faced a similar error, and it turned out that the pickled model had been built with an earlier version of PyCaret while I had the latest version installed.
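
One way to catch this kind of mismatch before calling load_model() is to compare the PyCaret version installed on the cluster with the one used for training. The snippet below is only a sketch: the helper names and the TRAINED_WITH value are hypothetical, and comparing just the major.minor part is an assumption about how strict the check needs to be.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def same_minor_release(installed, expected):
    """Compare only the major.minor part, so '2.0.1' matches '2.0'."""
    if installed is None:
        return False
    return installed.split(".")[:2] == expected.split(".")[:2]


# Assumption: the version that was installed when the model was saved.
TRAINED_WITH = "2.0"

current = installed_version("pycaret")
if not same_minor_release(current, TRAINED_WITH):
    print(
        f"pycaret {current} is installed here, but the model was saved with "
        f"{TRAINED_WITH}; install the matching release before loading the model."
    )
```

If the versions differ, installing the release used for training on the cluster nodes, or retraining and re-saving the model with the newer release, should bring the two environments back in line.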

Shash
  • Yes, it's the same for me. I had trained my model in 2.0 and the latest is 2.1. I ended up installing 2.0 again using the `.whl`. – Regressor Sep 24 '20 at 14:06