I've been going crazy for a few days over a problem I thought was trivial. My end goal is to deploy to AWS SageMaker a TensorFlow model that takes a simple string as input, computes its embedding using a pre-trained 'sentence-transformers' model, and finally uses TensorFlow Recommenders to suggest the kNN among a collection of embeddings I have already calculated. I would like to do all of this inside the model itself, including the preprocessing (tokenization).
I got the predictions working with different approaches in my notebook. The trouble starts when I try to save my model.
The problem seems to be that HF's AutoTokenizer needs a pure List of Strings as input, so I hit a roadblock whenever I try to save my model with tf.saved_model.save, and trying to work around this with tf.py_function (approach 2 below) results in problems with SageMaker.
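To illustrate the mismatch, a minimal sketch involving only the tokenizer (assuming transformers and tensorflow are installed):

from transformers import AutoTokenizer
import tensorflow as tf

tok = AutoTokenizer.from_pretrained("sentence-transformers/all-mpnet-base-v2")

print(tok(["a plain python string"]))   # fine: a pure list of strings
# tok(tf.constant(["a string tensor"])) # raises: the tokenizer does not accept tensors,
#                                       # and a (symbolic) tensor is exactly what it gets
#                                       # when TF traces the model for saving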
My approaches so far:
1. THE 'I THOUGHT IT WAS SO SIMPLE'
def text_to_startup_model(
    startups_ids: list, startup_vectors
):
    import tensorflow as tf
    import tensorflow_recommenders as tfrs
    import numpy as np
    from sentence_transformers import SentenceTransformer

    # BruteForce takes the raw SentenceTransformer.encode callable as its query model
    exported_model = tfrs.layers.factorized_top_k.BruteForce(
        SentenceTransformer("all-mpnet-base-v2").encode
    )
    exported_model.index(np.array(startup_vectors), np.array(startups_ids))

    # TESTS the model
    # for some reason this seems to be needed in order to save the model :/
    # https://github.com/tensorflow/recommenders/issues/131
    test = exported_model(['Test Text Query'])
    print(test)
    return exported_model
text_to_startup_model(search_db_ids, search_db_embeddings)
#--> WORKS PERFECTLY, AS I GET SOME SUGGESTIONS
tf.saved_model.save(text_to_startup_model(search_db_ids, search_db_embeddings), export_dir="/home/nicholas/test_model_save/1")
#TypeError Traceback (most recent call last)
# /home/nicholas/Documents/Dev/Rialto-predict-1/notebooks/t2s_different_approaches.ipynb Cell 5 in <cell line: 22>()
# 19 text_to_startup_model(search_db_ids, search_db_embeddings)
# 20 #--> WORKS PERFECTLY, AS I GET SOME SUGGESTIONS
# ---> 22 tf.saved_model.save(text_to_startup_model(search_db_ids, search_db_embeddings), export_dir="/home/nicholas/test_model_save/1")
# File ~/Documents/Dev/Rialto-predict-1/venv/lib/python3.10/site-packages/tensorflow/python/saved_model/save.py:1334, in save(obj, export_dir, signatures, options)
# 1332 # pylint: enable=line-too-long
# 1333 metrics.IncrementWriteApi(_SAVE_V2_LABEL)
# -> 1334 save_and_return_nodes(obj, export_dir, signatures, options)
# 1335 metrics.IncrementWrite(write_version="2")
#
# .........
#
#
# File ~/Documents/Dev/Rialto-predict-1/venv/lib/python3.10/site-packages/tensorflow/python/eager/def_function.py:677, in Function._defun_with_scope.<locals>.wrapped_fn(*args, **kwds)
# 673 with default_graph._variable_creator_scope(scope, priority=50): # pylint: disable=protected-access
# 674 # __wrapped__ allows AutoGraph to swap in a converted function. We give
# 675 # the function a weak reference to itself to avoid a reference cycle.
# 676 with OptionalXlaContext(compile_with_xla):
# --> 677 out = weak_wrapped_fn().__wrapped__(*args, **kwds)
# 678 return out
# File ~/Documents/Dev/Rialto-predict-1/venv/lib/python3.10/site-packages/tensorflow/python/framework/func_graph.py:1147, in func_graph_from_py_func.<locals>.autograph_handler(*args, **kwargs)
# 1145 except Exception as e: # pylint:disable=broad-except
# 1146 if hasattr(e, "ag_error_metadata"):
# -> 1147 raise e.ag_error_metadata.to_exception(e)
# 1148 else:
# 1149 raise
# TypeError: in user code:
# File "/home/nicholas/Documents/Dev/Rialto-predict-1/venv/lib/python3.10/site-packages/keras/saving/saving_utils.py", line 138, in _wrapped_model *
# outputs = model(*args, **kwargs)
# File "/home/nicholas/Documents/Dev/Rialto-predict-1/venv/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 67, in error_handler **
# raise e.with_traceback(filtered_tb) from None
# TypeError: Exception encountered when calling layer "brute_force_3" (type BruteForce).
# in user code:
# File "/home/nicholas/Documents/Dev/Rialto-predict-1/venv/lib/python3.10/site-packages/tensorflow_recommenders/layers/factorized_top_k.py", line 567, in call *
# queries = self.query_model(queries)
# File "/home/nicholas/Documents/Dev/Rialto-predict-1/venv/lib/python3.10/site-packages/sentence_transformers/SentenceTransformer.py", line 160, in encode *
# features = self.tokenize(sentences_batch)
# File "/home/nicholas/Documents/Dev/Rialto-predict-1/venv/lib/python3.10/site-packages/sentence_transformers/SentenceTransformer.py", line 318, in tokenize *
# return self._first_module().tokenize(texts)
# File "/home/nicholas/Documents/Dev/Rialto-predict-1/venv/lib/python3.10/site-packages/sentence_transformers/models/Transformer.py", line 102, in tokenize *
# batch1.append(text_tuple[0])
# TypeError: 'NoneType' object is not subscriptable
# ...
# Call arguments received:
# • queries=['None']
# • k=None
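If I read the traceback correctly, save() re-traces the model with a symbolic string tensor, so SentenceTransformer.encode receives a tf.Tensor instead of a list of strings and its tokenizer falls over. A minimal sketch that should hit the same wall without saving anything (assuming only tensorflow and sentence-transformers):

import tensorflow as tf
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-mpnet-base-v2")

@tf.function
def traced_encode(queries):
    # during tracing `queries` is a symbolic Tensor, not a list of str,
    # and the HF tokenizer inside .encode() cannot handle that
    return encoder.encode(queries)

# traced_encode.get_concrete_function(tf.TensorSpec([None], tf.string))  # fails during tracing, like save() above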
2. THE tf.py_function
As, from my understanding, the problem with the first approach is that the model has no knowledge of the input type/value, this second approach, from Use `sentence-transformers` inside of a keras model, was supposedly going to work, as it uses tf.py_function to accept a List of Strings as input without complaining.
def approach_2(startups_ids: list, startup_vectors):
    import tensorflow as tf
    import tensorflow_recommenders as tfrs
    import numpy as np
    from transformers import MPNetTokenizer, TFMPNetModel

    # Here it loads the specific pre-trained model we are using for Rialto
    tokenizer = MPNetTokenizer.from_pretrained(
        "sentence-transformers/all-mpnet-base-v2"
    )
    model = TFMPNetModel.from_pretrained(
        "sentence-transformers/all-mpnet-base-v2", from_pt=True
    )

    class SBert(tf.keras.layers.Layer):
        def __init__(self, tokenizer, model):
            super(SBert, self).__init__()
            self.tokenizer = tokenizer
            self.model = model

        def tf_encode(self, inputs):
            # wraps the HF tokenizer in a py_function so it receives real python strings
            def encode(inputs):
                inputs = [x[0].decode("utf-8") for x in inputs.numpy()]
                outputs = self.tokenizer(
                    inputs, padding=True, truncation=True, return_tensors="tf"
                )
                return outputs["input_ids"], outputs["attention_mask"]

            return tf.py_function(
                func=encode, inp=[inputs], Tout=[tf.int32, tf.int32]
            )

        def process(self, i, a):
            # runs the transformer itself eagerly, again via py_function
            def __call(i, a):
                model_output = self.model(
                    {"input_ids": i.numpy(), "attention_mask": a.numpy()}
                )
                return model_output[0]

            return tf.py_function(func=__call, inp=[i, a], Tout=[tf.float32])

        def mean_pooling(self, model_output, attention_mask):
            token_embeddings = tf.squeeze(tf.stack(model_output), axis=0)
            input_mask_expanded = tf.cast(
                tf.broadcast_to(
                    tf.expand_dims(attention_mask, -1), tf.shape(token_embeddings)
                ),
                tf.float32,
            )
            a = tf.math.reduce_sum(token_embeddings * input_mask_expanded, axis=1)
            b = tf.clip_by_value(
                tf.math.reduce_sum(input_mask_expanded, axis=1),
                1e-9,
                tf.float32.max,
            )
            embeddings = a / b
            embeddings, _ = tf.linalg.normalize(embeddings, 2, axis=1)
            return embeddings

        def call(self, inputs):
            input_ids, attention_mask = self.tf_encode(inputs)
            model_output = self.process(input_ids, attention_mask)
            embeddings = self.mean_pooling(model_output, attention_mask)
            return embeddings

    # Uses the keras-ified layer in a Keras model
    sbert = SBert(tokenizer, model)
    inputs = tf.keras.layers.Input((1,), dtype=tf.string)
    outputs = sbert(inputs)
    model = tf.keras.Model(inputs, outputs)

    # Uses the model we just built for top-k NN retrieval over the pool of pre-calculated startup embeddings.
    exported_model = tfrs.layers.factorized_top_k.BruteForce(model)
    exported_model.index(np.array(startup_vectors), np.array(startups_ids))

    # TESTS the model
    # for some reason this seems to be needed in order to save the model :/
    # https://github.com/tensorflow/recommenders/issues/131
    print(exported_model(tf.constant(["'Test Text Query'"])))
    return exported_model
model_to_store_1 = approach_2(search_db_ids, search_db_embeddings)
tf.saved_model.save(model_to_store_1, export_dir="/home/nicholas/test_model_save/2")
# THIS ONE WORKS LIKE A CHARM locally: it saves the model and everything, and the deploy on SageMaker succeeds.
# BUT IT FAILS ON SAGEMAKER WHEN THE MODEL IS INVOKED. BELOW ARE THE LOGS:
# ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from model with message "{
# "error": "No OpKernel was registered to support Op 'EagerPyFunc' used by {{node StatefulPartitionedCall/brute_force/model/s_bert/EagerPyFunc}} with these attrs: [is_async=false, Tin=[DT_STRING], _output_shapes=[<unknown>, <unknown>], Tout=[DT_INT32, DT_INT32], token=\"pyfunc_4\"]\nRegistered devices: [CPU]\nRegistered kernels:\n <no registered kernels>\n\n\t [[StatefulPartitionedCall/brute_force/model/s_bert/EagerPyFunc]]\n\t [[StatefulPartitionedCall]]"
# }". See https://us-west-2.console.aws.amazon.com/cloudwatch/home?region=us-west-2#logEventViewer:group=/aws/sagemaker/Endpoints/rialto-t2s-model-endpoint in account 634470116418 for more information
As you can see from the log, the problem seems to be with eager mode and py_functions: the EagerPyFunc op has no registered kernel in the serving runtime. I tried to google this and found absolutely nothing on how to address the issue.
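At least I can now check for this locally before redeploying. A quick sketch that scans the SavedModel for py_function ops (paths as in my setup above, and assuming the export produced a serving_default signature):

import tensorflow as tf

loaded = tf.saved_model.load("/home/nicholas/test_model_save/2")
serving_fn = loaded.signatures["serving_default"]

# collect op types from the signature graph and from every nested function
gdef = serving_fn.graph.as_graph_def()
op_types = {node.op for node in gdef.node}
for fn in gdef.library.function:
    op_types.update(node.op for node in fn.node_def)

# any hit means the graph still depends on python code,
# which the TF Serving container on SageMaker cannot run
print("EagerPyFunc" in op_types)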
3. THE Classes approach
I've tried implementing something building upon this article, but I run into issues similar to the first approach: when I go to save the model, the expected input clashes with the requirements of the tokenizer.
EDIT 1 - here is a Colab showcasing the approach: https://colab.research.google.com/drive/1gibFdEoHTs0hzD5yiXzLT_-asmilUoAQ?usp=sharing#scrollTo=TibAssWm3D5e
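Roughly, the shape of what I tried looks like this (a simplified sketch, not the exact Colab code):

import tensorflow as tf
from sentence_transformers import SentenceTransformer

class QueryEncoder(tf.Module):
    # wraps SentenceTransformer so tf.saved_model.save can (in theory) trace it
    def __init__(self):
        super().__init__()
        self.encoder = SentenceTransformer("all-mpnet-base-v2")

    @tf.function(input_signature=[tf.TensorSpec([None], tf.string)])
    def __call__(self, queries):
        # same clash as approach 1: while save() traces this, `queries` is a
        # symbolic tensor, but the tokenizer inside .encode() wants python strings
        return self.encoder.encode(queries)

# tf.saved_model.save(QueryEncoder(), "/home/nicholas/test_model_save/3")  # fails at tracing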
All of this journey triggered some questions:
Question 1 Is this even best practice? Should I instead serve the model the already-tokenized sentences as a tensor (see the sketch after the questions)?
Question 2 How the hell do I make it work? :)
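To make Question 1 concrete, this is the alternative I have in mind: tokenize on the client and send plain integer tensors to the endpoint, so no python code ends up inside the graph (a sketch; the exact payload format depends on the serving container):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/all-mpnet-base-v2")
enc = tokenizer(["Test Text Query"], padding=True, truncation=True, return_tensors="np")

# the model's serving signature would then take int32 tensors only
payload = {
    "input_ids": enc["input_ids"].tolist(),
    "attention_mask": enc["attention_mask"].tolist(),
}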