I used the Recommenders library (https://github.com/microsoft/recommenders) to train an NCF recommendation model. I'm now running into problems deploying it through the SageMaker TensorFlowModel class.

The model is saved using the following code:

    import os

    import tensorflow as tf

    # Checkpoint prefix, inferred from the exported file names listed below.
    MODEL_CHECKPOINT = "model.ckpt"

    def save(self, dir_name):
        """Save model parameters in `dir_name`.

        Args:
            dir_name (str): directory name, which should be a folder name instead of a file name;
                a new directory is created if it does not exist.
        """
        # Save the trained model as a TF1 checkpoint (not a SavedModel).
        if not os.path.exists(dir_name):
            os.makedirs(dir_name)
        saver = tf.compat.v1.train.Saver()
        saver.save(self.sess, os.path.join(dir_name, MODEL_CHECKPOINT))

The files exported in the process are 'checkpoint', 'model.ckpt.data-00000-of-00001', 'model.ckpt.index' and 'model.ckpt.meta'. They are packaged with the following structure:

- model.tar.gz
    - 00000000
        - checkpoint
        - model.ckpt.data-00000-of-00001
        - model.ckpt.index
        - model.ckpt.meta

I have tried various deployment approaches, but they all give the same error. Here is the latest one, which I implemented following this example: https://github.com/aws/amazon-sagemaker-examples/blob/main/sagemaker-script-mode/pytorch_bert/code/inference_code.py

    from sagemaker.tensorflow.model import TensorFlowModel

    model = TensorFlowModel(
        entry_point="tf_inference.py",
        model_data=zipped_model_path,
        role=role,
        model_version='1',
        framework_version="2.7"
    )

    predictor = model.deploy(
        initial_instance_count=1, instance_type="ml.g4dn.2xlarge", endpoint_name='endpoint-name3'
    )

Every approach ends with the same error:

    Traceback (most recent call last):
      File "/sagemaker/serve.py", line 502, in <module>
        ServiceManager().start()
      File "/sagemaker/serve.py", line 482, in start
        self._create_tfs_config()
      File "/sagemaker/serve.py", line 153, in _create_tfs_config
        raise ValueError("no SavedModel bundles found!")

1 Answer

These two links helped me resolve the issue:

  1. https://github.com/aws/sagemaker-python-sdk/issues/599
  2. https://www.tensorflow.org/guide/migrate/saved_model#1_save_the_graph_as_a_savedmodel_with_savedmodelbuilder
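The SavedModelBuilder approach from the second link can be sketched roughly as follows. This is a minimal sketch, not the Recommenders code: the toy graph, the tensor names and the helper names `export_saved_model` and `export_toy_model` are all hypothetical stand-ins. For the NCF model, the session would presumably be the model's own `sess`, and the inputs and outputs would be its user/item placeholders and prediction tensor, whose attribute names should be checked in the library source.

```python
import os

import tensorflow as tf

tf1 = tf.compat.v1


def export_saved_model(sess, export_dir, inputs, outputs):
    """Write saved_model.pb and variables/ from a live TF1 session.

    `export_dir` must not exist yet; the builder creates it.
    `inputs` and `outputs` map signature names to graph tensors.
    """
    builder = tf1.saved_model.Builder(export_dir)
    signature = tf1.saved_model.predict_signature_def(inputs=inputs, outputs=outputs)
    builder.add_meta_graph_and_variables(
        sess,
        tags=[tf.saved_model.SERVING],
        signature_def_map={"serving_default": signature},
    )
    builder.save()


def export_toy_model(export_dir):
    """Hypothetical stand-in graph: two int inputs, one sigmoid score."""
    graph = tf.Graph()
    with graph.as_default():
        user = tf1.placeholder(tf.int32, shape=[None, 1], name="user_input")
        item = tf1.placeholder(tf.int32, shape=[None, 1], name="item_input")
        weight = tf1.get_variable("w", shape=[], initializer=tf1.ones_initializer())
        score = tf.sigmoid(weight * tf.cast(user + item, tf.float32), name="score")
        with tf1.Session(graph=graph) as sess:
            sess.run(tf1.global_variables_initializer())
            export_saved_model(
                sess,
                export_dir,
                inputs={"user": user, "item": item},
                outputs={"score": score},
            )
```

Calling `export_toy_model("export/1")` should leave a `saved_model.pb` file and a `variables` directory under `export/1`, which is the kind of bundle the serving container looks for.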

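SageMaker has a strict directory structure that you need to follow exactly. The TensorFlow Serving container looks for a SavedModel (a saved_model.pb file plus a variables directory) inside a numeric version directory. The checkpoint files written by tf.compat.v1.train.Saver are not a SavedModel, which is why it reports "no SavedModel bundles found!". The first link shows the directory layout to start from, and the second shows how to save the model as a SavedModel in TF1 and TF2.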
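As a rough illustration of the packaging step, the sketch below builds a model.tar.gz with a numeric version directory at the top level and refuses to package a directory that has no saved_model.pb. The helper names `package_saved_model` and `list_archive` are hypothetical, and the exact nesting the container accepts (for example, whether a model-name directory is needed above the version directory) is an assumption that should be checked against the first link.

```python
import os
import tarfile


def package_saved_model(export_dir, tar_path, version="1"):
    """Pack `export_dir` into `tar_path` under a numeric version directory.

    The archive will contain <version>/saved_model.pb and <version>/variables/...
    """
    if not version.isdigit():
        raise ValueError("version directory must be numeric, got %r" % version)
    if not os.path.isfile(os.path.join(export_dir, "saved_model.pb")):
        # Checkpoint files (model.ckpt.*) alone do not make a SavedModel.
        raise ValueError("%s does not contain saved_model.pb" % export_dir)
    with tarfile.open(tar_path, "w:gz") as tar:
        # arcname renames the top-level folder inside the archive.
        tar.add(export_dir, arcname=version)
    return tar_path


def list_archive(tar_path):
    """Return the sorted member names of a .tar.gz archive."""
    with tarfile.open(tar_path, "r:gz") as tar:
        return sorted(tar.getnames())
```

Listing the archive with `list_archive` before uploading it is a cheap way to confirm that saved_model.pb really sits under the version directory.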