2

I am trying to create a script that loads a saved CatBoost model from a Cloud Storage bucket and uses it to make predictions. However, I am unable to load the file. CatBoost throws an error saying that the model file does not exist, even though I copied the path directly from the UI.

I am using Google Cloud Platform. The script is in an AI Platform JupyterLab notebook in the same project as the bucket in which the model is stored. The feature set I am using to make the predictions is stored in the same bucket as the model, and I am able to read the feature set file into a dataframe (X_eval) successfully.

I have tried using both the URI ("gs://...") and the authenticated URL ("https://..."), and both throw the same error.

#Specify model path
path = 'gs://bucket_id/model-name'

# Load model
from_file = CatBoostClassifier()
model = from_file.load_model(path)

model.predict(X_eval)
---------------------------------------------------------------------------
CatBoostError                             Traceback (most recent call last)
<ipython-input-9-f7a6068f5718> in <module>
     70 
     71 if __name__ == "__main__":
---> 72     main('data','context')

<ipython-input-9-f7a6068f5718> in main(data, context)
     42     # Load model
     43     from_file = CatBoostClassifier()
---> 44     from_file.load_model(path)
     45 
     46     model.predict(X_eval)

/opt/conda/lib/python3.7/site-packages/catboost/core.py in load_model(self, fname, format, stream, blob)
   2655 
   2656         if fname is not None:
-> 2657             self._load_model(fname, format)
   2658         elif stream is not None:
   2659             self._load_from_stream(stream)

/opt/conda/lib/python3.7/site-packages/catboost/core.py in _load_model(self, model_file, format)
   1345             raise CatBoostError("Invalid fname type={}: must be str().".format(type(model_file)))
   1346 
-> 1347         self._object._load_model(model_file, format)
   1348         self._set_trained_model_attributes()
   1349         for key, value in iteritems(self._get_params()):

_catboost.pyx in _catboost._CatBoost._load_model()

_catboost.pyx in _catboost._CatBoost._load_model()

CatBoostError: catboost/libs/model/model_import_interface.h:19: Model file doesn't exist: gs://bucket_id/model-name

If I upload the same model file to the local file system (i.e., the file system of the VM on which the JupyterLab notebook is running), the model loads successfully. For example, this works:

#Specify model path
path = 'model-name'

# Load model
from_file = CatBoostClassifier()
model = from_file.load_model(path)

model.predict(X_eval)
K. Thorspear

2 Answers

3

There is a nicer way of doing this, though it seems to be undocumented: download the file as bytes and pass them to load_model through its blob argument.

import catboost as cb
from google.cloud import storage

storage_client = storage.Client()

bucket_name = "catboost-models"  # put your bucket name here
blob_name = "mymodel"  # put the blob name from the bucket here

# Download the model file into memory as a bytes object
blob = storage_client.bucket(bucket_name).blob(blob_name).download_as_bytes()

# Load the model directly from the in-memory bytes
model = cb.CatBoostClassifier()
model.load_model(blob=blob)
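
If you are starting from a full URI like the one in the question ("gs://bucket_id/model-name"), you first need to split it into the bucket name and the blob name. A minimal sketch using only the standard library; split_gs_uri is a hypothetical helper name I made up, not part of CatBoost or google-cloud-storage:

```python
def split_gs_uri(uri):
    """Split 'gs://bucket/path/to/blob' into (bucket_name, blob_name)."""
    prefix = "gs://"
    if not uri.startswith(prefix):
        raise ValueError("Expected a gs:// URI, got: {}".format(uri))
    # Everything up to the first '/' is the bucket; the rest is the blob name
    bucket_name, _, blob_name = uri[len(prefix):].partition("/")
    if not bucket_name or not blob_name:
        raise ValueError("URI must contain a bucket and a blob name: {}".format(uri))
    return bucket_name, blob_name


bucket_name, blob_name = split_gs_uri("gs://bucket_id/model-name")
```

The two returned values can then be used as bucket_name and blob_name in the snippet above.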
1

I used Ture Friese's answer to the following question to solve this problem: How to load a model saved in joblib file from Google Cloud Storage bucket

This involved using BytesIO to download the file into an in-memory file object, then loading the model from that file object, and using it to make predictions on the dataframe X_eval:

from io import BytesIO

from catboost import CatBoostClassifier
from google.cloud import storage

storage_client = storage.Client()

# Storage variables (placeholder values)
model_bucket_id = "your-bucket-id"  # Replace with your bucket ID
model_bucket = storage_client.get_bucket(model_bucket_id)
model_name = "your-model-file"  # Replace with the file name of the model

# Select bucket file
blob = model_bucket.blob(model_name)

# Download blob into an in-memory file object
model_file = BytesIO()
blob.download_to_file(model_file)
model_file.seek(0)  # Rewind so the model is read from the beginning

# Load model from in-memory file object
from_file = CatBoostClassifier()
model = from_file.load_model(stream=model_file)

model.predict(X_eval)
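
Since loading from a local path already works (as noted in the question), another option is to download the blob to a temporary file and load the model from that path. This is only a sketch: load_model_via_temp_file is a hypothetical helper name, and it assumes the blob object has a download_to_filename method and the model object has a load_model method that accepts a path.

```python
import os
import tempfile


def load_model_via_temp_file(blob, model):
    """Download `blob` to a temporary local file and load `model` from that path."""
    fd, tmp_path = tempfile.mkstemp(suffix=".cbm")
    os.close(fd)  # Only the path is needed; the blob writes the file itself
    try:
        blob.download_to_filename(tmp_path)
        model.load_model(tmp_path)
    finally:
        os.remove(tmp_path)  # Clean up the temporary copy
    return model


# Usage with the objects from the snippet above (needs GCP credentials):
# model = load_model_via_temp_file(model_bucket.blob(model_name), CatBoostClassifier())
# model.predict(X_eval)
```

The in-memory version avoids touching the disk, so I would only use this if the model is too large to hold in memory comfortably.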
K. Thorspear