I have used the Recommenders library (https://github.com/microsoft/recommenders) to train an NCF recommendation model. I'm now having trouble deploying it with the SageMaker Python SDK's `TensorFlowModel` class.
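The model is saved with the following code:

```python
import os

import tensorflow as tf

# Excerpt: this is a method of the Recommenders NCF model class, and
# MODEL_CHECKPOINT is a constant defined in that module.
def save(self, dir_name):
    """Save model parameters in `dir_name`

    Args:
        dir_name (str): directory name, which should be a folder name instead of file name
            we will create a new directory if not existing.
    """
    # save trained model (tf.compat.v1.train.Saver writes TF1 checkpoint files)
    if not os.path.exists(dir_name):
        os.makedirs(dir_name)
    saver = tf.compat.v1.train.Saver()
    saver.save(self.sess, os.path.join(dir_name, MODEL_CHECKPOINT))
```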
This exports the files `checkpoint`, `model.ckpt.data-00000-of-00001`, `model.ckpt.index` and `model.ckpt.meta`, which are archived with the following structure:
- model.tar.gz
  - 00000000
    - checkpoint
    - model.ckpt.data-00000-of-00001
    - model.ckpt.index
    - model.ckpt.meta
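For comparison, the SageMaker TensorFlow Serving container expects each numeric version directory to hold a SavedModel (`saved_model.pb` plus a `variables/` folder), not checkpoint files. The checkpoint would first have to be exported as a SavedModel (for example with `tf.compat.v1.saved_model.Builder` from a session in which the checkpoint has been restored). Assuming that export already exists in a local folder, a minimal stdlib sketch for packaging it with the version directory at the archive root might look like this; the function name and arguments are hypothetical.

```python
import os
import tarfile


def package_saved_model(saved_model_dir, version, output_path, code_dir=None):
    """Pack a SavedModel folder into a tar.gz laid out as <version>/saved_model.pb.

    saved_model_dir: local folder containing saved_model.pb and variables/.
    version: numeric version used as the top-level folder name, e.g. "1".
    code_dir: optional folder with an inference script, stored under code/.
    """
    if not str(version).isdigit():
        raise ValueError("version must be numeric, e.g. '1'")
    if not os.path.isfile(os.path.join(saved_model_dir, "saved_model.pb")):
        raise FileNotFoundError("saved_model.pb not found in " + saved_model_dir)
    with tarfile.open(output_path, "w:gz") as tar:
        # arcname sets the path inside the archive, so the SavedModel lands at
        # <version>/saved_model.pb instead of under the local folder name.
        tar.add(saved_model_dir, arcname=str(version))
        if code_dir is not None:
            tar.add(code_dir, arcname="code")
    return output_path
```

The resulting archive would then be uploaded to S3 and passed as `model_data` exactly as before.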
I have tried several deployment approaches, but they all fail with the same error. Here is the latest one, which I implemented following this example: https://github.com/aws/amazon-sagemaker-examples/blob/main/sagemaker-script-mode/pytorch_bert/code/inference_code.py
```python
from sagemaker.tensorflow.model import TensorFlowModel

# zipped_model_path: S3 URI of model.tar.gz; role: SageMaker execution role
model = TensorFlowModel(
    entry_point="tf_inference.py",
    model_data=zipped_model_path,
    role=role,
    model_version='1',
    framework_version="2.7",
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g4dn.2xlarge",
    endpoint_name='endpoint-name3',
)
```
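The linked example targets the PyTorch container (`model_fn`/`predict_fn`). The TensorFlow Serving container instead loads the SavedModel itself and only calls optional `input_handler(data, context)` and `output_handler(data, context)` functions from the inference script, so an entry point cannot make up for a missing SavedModel. A minimal sketch of such a script for NCF follows, assuming the exported signature has two inputs named `user_input` and `item_input` with one id per row (hypothetical names and shapes; `saved_model_cli show --dir <export> --all` lists the real ones), and a hypothetical request format `{"users": [...], "items": [...]}`.

```python
import json


def input_handler(data, context):
    """Convert {"users": [...], "items": [...]} into a TF Serving REST payload."""
    if context.request_content_type != "application/json":
        raise ValueError("Unsupported content type: " + str(context.request_content_type))
    body = json.loads(data.read().decode("utf-8"))
    # Hypothetical signature input names; each row carries a single id,
    # hence the [[id], ...] nesting in the columnar "inputs" format.
    payload = {
        "inputs": {
            "user_input": [[u] for u in body["users"]],
            "item_input": [[i] for i in body["items"]],
        }
    }
    return json.dumps(payload)


def output_handler(data, context):
    """Return TF Serving's JSON response unchanged, or raise on an error status."""
    if data.status_code != 200:
        raise ValueError(data.content.decode("utf-8"))
    return data.content, context.accept_header
```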
Every approach ends with the same error:
```
Traceback (most recent call last):
  File "/sagemaker/serve.py", line 502, in <module>
    ServiceManager().start()
  File "/sagemaker/serve.py", line 482, in start
    self._create_tfs_config()
  File "/sagemaker/serve.py", line 153, in _create_tfs_config
    raise ValueError("no SavedModel bundles found!")
```
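The message is raised while the container builds its TensorFlow Serving config, before any inference script runs. As far as I can tell, it scans the model directory for `saved_model.pb` files inside a numerically named version folder, and the archive above contains only checkpoint files, so nothing matches. A rough local approximation of that check (an assumption about its logic, not the container's actual code) can be run against the extracted archive before deploying:

```python
import os
import re


def find_saved_model_bundles(base_path):
    """Return model paths that contain <numeric version>/saved_model.pb.

    Approximates the idea behind the serving container's check; not its code.
    """
    bundles = []
    for root, _dirs, files in os.walk(base_path):
        # A bundle is a numeric folder (e.g. "1" or "00000000") holding saved_model.pb.
        if "saved_model.pb" in files and re.fullmatch(r"\d+", os.path.basename(root)):
            model_path = os.path.dirname(root)
            if model_path not in bundles:
                bundles.append(model_path)
    return bundles
```

After `tar -xzf model.tar.gz -C extracted`, calling `find_saved_model_bundles("extracted")` on the current archive should return an empty list, which matches the error.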