I've been going crazy for a few days over a problem I thought was trivial. My end goal is to deploy to AWS SageMaker a TensorFlow model that takes a simple string as input, computes its embedding using a pre-trained 'sentence-transformers' model, and finally uses TensorFlow Recommenders to suggest the kNN among a collection of embeddings I have already calculated. I would like to do all of this inside the model itself, including the preprocessing (tokenization).
I got the predictions working with different approaches in my notebook. The trouble starts when I try to save my model.
The problem seems to be that HF's AutoTokenizer needs a pure List of Strings as input, so I hit a roadblock whenever I try to save my model with tf.saved_model.save, and trying to work around this with tf.py_function (approach 2 below) results in problems with SageMaker.
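To illustrate the mismatch, a minimal sketch involving only the tokenizer (assuming transformers and tensorflow are installed):

from transformers import AutoTokenizer
import tensorflow as tf

tok = AutoTokenizer.from_pretrained("sentence-transformers/all-mpnet-base-v2")

print(tok(["a plain python string"]))   # fine: a pure list of strings
# tok(tf.constant(["a string tensor"])) # raises: the tokenizer does not accept tensors,
#                                       # and a (symbolic) tensor is exactly what it gets
#                                       # when TF traces the model for saving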
My approaches so far:
1. THE 'I THOUGHT IT WAS SO SIMPLE'
def text_to_startup_model(
    startups_ids: list, startup_vectors
):
    import tensorflow as tf
    import tensorflow_recommenders as tfrs
    import numpy as np
    from sentence_transformers import SentenceTransformer

    # BruteForce takes the raw SentenceTransformer.encode callable as its query model
    exported_model = tfrs.layers.factorized_top_k.BruteForce(
        SentenceTransformer("all-mpnet-base-v2").encode
    )
    exported_model.index(np.array(startup_vectors), np.array(startups_ids))

    # TESTS the model
    # for some reason this seems to be needed in order to save the model :/
    # https://github.com/tensorflow/recommenders/issues/131
    test = exported_model(['Test Text Query'])
    print(test)
    return exported_model
text_to_startup_model(search_db_ids, search_db_embeddings)
#--> WORKS PERFECTLY, AS I GET SOME SUGGESTIONS
tf.saved_model.save(text_to_startup_model(search_db_ids, search_db_embeddings), export_dir="/home/nicholas/test_model_save/1")
#TypeError Traceback (most recent call last)
# /home/nicholas/Documents/Dev/Rialto-predict-1/notebooks/t2s_different_approaches.ipynb Cell 5 in <cell line: 22>()
# 19 text_to_startup_model(search_db_ids, search_db_embeddings)
# 20 #--> WORKS PERFECTLY, AS I GET SOME SUGGESTIONS
# ---> 22 tf.saved_model.save(text_to_startup_model(search_db_ids, search_db_embeddings), export_dir="/home/nicholas/test_model_save/1")
# File ~/Documents/Dev/Rialto-predict-1/venv/lib/python3.10/site-packages/tensorflow/python/saved_model/save.py:1334, in save(obj, export_dir, signatures, options)
# 1332 # pylint: enable=line-too-long
# 1333 metrics.IncrementWriteApi(_SAVE_V2_LABEL)
# -> 1334 save_and_return_nodes(obj, export_dir, signatures, options)
# 1335 metrics.IncrementWrite(write_version="2")
#
# .........
#
#
# File ~/Documents/Dev/Rialto-predict-1/venv/lib/python3.10/site-packages/tensorflow/python/eager/def_function.py:677, in Function._defun_with_scope.<locals>.wrapped_fn(*args, **kwds)
# 673 with default_graph._variable_creator_scope(scope, priority=50): # pylint: disable=protected-access
# 674 # __wrapped__ allows AutoGraph to swap in a converted function. We give
# 675 # the function a weak reference to itself to avoid a reference cycle.
# 676 with OptionalXlaContext(compile_with_xla):
# --> 677 out = weak_wrapped_fn().__wrapped__(*args, **kwds)
# 678 return out
# File ~/Documents/Dev/Rialto-predict-1/venv/lib/python3.10/site-packages/tensorflow/python/framework/func_graph.py:1147, in func_graph_from_py_func.<locals>.autograph_handler(*args, **kwargs)
# 1145 except Exception as e: # pylint:disable=broad-except
# 1146 if hasattr(e, "ag_error_metadata"):
# -> 1147 raise e.ag_error_metadata.to_exception(e)
# 1148 else:
# 1149 raise
# TypeError: in user code:
# File "/home/nicholas/Documents/Dev/Rialto-predict-1/venv/lib/python3.10/site-packages/keras/saving/saving_utils.py", line 138, in _wrapped_model *
# outputs = model(*args, **kwargs)
# File "/home/nicholas/Documents/Dev/Rialto-predict-1/venv/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 67, in error_handler **
# raise e.with_traceback(filtered_tb) from None
# TypeError: Exception encountered when calling layer "brute_force_3" (type BruteForce).
# in user code:
# File "/home/nicholas/Documents/Dev/Rialto-predict-1/venv/lib/python3.10/site-packages/tensorflow_recommenders/layers/factorized_top_k.py", line 567, in call *
# queries = self.query_model(queries)
# File "/home/nicholas/Documents/Dev/Rialto-predict-1/venv/lib/python3.10/site-packages/sentence_transformers/SentenceTransformer.py", line 160, in encode *
# features = self.tokenize(sentences_batch)
# File "/home/nicholas/Documents/Dev/Rialto-predict-1/venv/lib/python3.10/site-packages/sentence_transformers/SentenceTransformer.py", line 318, in tokenize *
# return self._first_module().tokenize(texts)
# File "/home/nicholas/Documents/Dev/Rialto-predict-1/venv/lib/python3.10/site-packages/sentence_transformers/models/Transformer.py", line 102, in tokenize *
# batch1.append(text_tuple[0])
# TypeError: 'NoneType' object is not subscriptable
# ...
# Call arguments received:
# • queries=['None']
# • k=None
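If I read the traceback correctly, save() re-traces the model with a symbolic string tensor, so SentenceTransformer.encode receives a tf.Tensor instead of a list of strings and its tokenizer falls over. A minimal sketch that should hit the same wall without saving anything (assuming only tensorflow and sentence-transformers):

import tensorflow as tf
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-mpnet-base-v2")

@tf.function
def traced_encode(queries):
    # during tracing `queries` is a symbolic Tensor, not a list of str,
    # and the HF tokenizer inside .encode() cannot handle that
    return encoder.encode(queries)

# traced_encode.get_concrete_function(tf.TensorSpec([None], tf.string))  # fails during tracing, like save() above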
2. THE tf.py_function
As, from my understanding, the problem with the first approach is that the model has no knowledge of the input type/value, this second approach, from Use `sentence-transformers` inside of a keras model, was supposedly going to work, as it uses tf.py_function to accept a List of Strings as input without complaining.
def approach_2(startups_ids: list, startup_vectors):
    import tensorflow as tf
    import tensorflow_recommenders as tfrs
    import numpy as np
    from transformers import MPNetTokenizer, TFMPNetModel

    # Here it loads the specific pre-trained model we are using for Rialto
    tokenizer = MPNetTokenizer.from_pretrained(
        "sentence-transformers/all-mpnet-base-v2"
    )
    model = TFMPNetModel.from_pretrained(
        "sentence-transformers/all-mpnet-base-v2", from_pt=True
    )

    class SBert(tf.keras.layers.Layer):
        def __init__(self, tokenizer, model):
            super(SBert, self).__init__()
            self.tokenizer = tokenizer
            self.model = model

        def tf_encode(self, inputs):
            # wraps the HF tokenizer in a py_function so it receives real python strings
            def encode(inputs):
                inputs = [x[0].decode("utf-8") for x in inputs.numpy()]
                outputs = self.tokenizer(
                    inputs, padding=True, truncation=True, return_tensors="tf"
                )
                return outputs["input_ids"], outputs["attention_mask"]

            return tf.py_function(
                func=encode, inp=[inputs], Tout=[tf.int32, tf.int32]
            )

        def process(self, i, a):
            # runs the transformer itself eagerly, again via py_function
            def __call(i, a):
                model_output = self.model(
                    {"input_ids": i.numpy(), "attention_mask": a.numpy()}
                )
                return model_output[0]

            return tf.py_function(func=__call, inp=[i, a], Tout=[tf.float32])

        def mean_pooling(self, model_output, attention_mask):
            token_embeddings = tf.squeeze(tf.stack(model_output), axis=0)
            input_mask_expanded = tf.cast(
                tf.broadcast_to(
                    tf.expand_dims(attention_mask, -1), tf.shape(token_embeddings)
                ),
                tf.float32,
            )
            a = tf.math.reduce_sum(token_embeddings * input_mask_expanded, axis=1)
            b = tf.clip_by_value(
                tf.math.reduce_sum(input_mask_expanded, axis=1),
                1e-9,
                tf.float32.max,
            )
            embeddings = a / b
            embeddings, _ = tf.linalg.normalize(embeddings, 2, axis=1)
            return embeddings

        def call(self, inputs):
            input_ids, attention_mask = self.tf_encode(inputs)
            model_output = self.process(input_ids, attention_mask)
            embeddings = self.mean_pooling(model_output, attention_mask)
            return embeddings

    # Uses the keras-ified layer in a Keras model
    sbert = SBert(tokenizer, model)
    inputs = tf.keras.layers.Input((1,), dtype=tf.string)
    outputs = sbert(inputs)
    model = tf.keras.Model(inputs, outputs)

    # Uses the model we just built for top-k NN retrieval over the pool of pre-calculated startup embeddings.
    exported_model = tfrs.layers.factorized_top_k.BruteForce(model)
    exported_model.index(np.array(startup_vectors), np.array(startups_ids))

    # TESTS the model
    # for some reason this seems to be needed in order to save the model :/
    # https://github.com/tensorflow/recommenders/issues/131
    print(exported_model(tf.constant(["'Test Text Query'"])))
    return exported_model
model_to_store_1 = approach_2(search_db_ids, search_db_embeddings)
tf.saved_model.save(model_to_store_1, export_dir="/home/nicholas/test_model_save/2")
# THIS ONE WORKS LIKE A CHARM locally: it saves the model and everything, and the deploy on SageMaker succeeds.
# BUT IT FAILS ON SAGEMAKER WHEN THE MODEL IS INVOKED. BELOW ARE THE LOGS:
# ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from model with message "{
# "error": "No OpKernel was registered to support Op 'EagerPyFunc' used by {{node StatefulPartitionedCall/brute_force/model/s_bert/EagerPyFunc}} with these attrs: [is_async=false, Tin=[DT_STRING], _output_shapes=[<unknown>, <unknown>], Tout=[DT_INT32, DT_INT32], token=\"pyfunc_4\"]\nRegistered devices: [CPU]\nRegistered kernels:\n <no registered kernels>\n\n\t [[StatefulPartitionedCall/brute_force/model/s_bert/EagerPyFunc]]\n\t [[StatefulPartitionedCall]]"
# }". See https://us-west-2.console.aws.amazon.com/cloudwatch/home?region=us-west-2#logEventViewer:group=/aws/sagemaker/Endpoints/rialto-t2s-model-endpoint in account 634470116418 for more information
As you can see from the log, the problem seems to be with eager mode and py_functions: the EagerPyFunc op has no registered kernel in the serving runtime. I tried to google this and found absolutely nothing on how to address the issue.
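At least I can now check for this locally before redeploying. A quick sketch that scans the SavedModel for py_function ops (paths as in my setup above, and assuming the export produced a serving_default signature):

import tensorflow as tf

loaded = tf.saved_model.load("/home/nicholas/test_model_save/2")
serving_fn = loaded.signatures["serving_default"]

# collect op types from the signature graph and from every nested function
gdef = serving_fn.graph.as_graph_def()
op_types = {node.op for node in gdef.node}
for fn in gdef.library.function:
    op_types.update(node.op for node in fn.node_def)

# any hit means the graph still depends on python code,
# which the TF Serving container on SageMaker cannot run
print("EagerPyFunc" in op_types)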
3. THE Classes approach
I've tried implementing something building upon this article, but I run into issues similar to the first approach: when I go to save the model, the expected input clashes with the requirements of the tokenizer.
EDIT 1 - here is a Colab showcasing the approach: https://colab.research.google.com/drive/1gibFdEoHTs0hzD5yiXzLT_-asmilUoAQ?usp=sharing#scrollTo=TibAssWm3D5e
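Roughly, the shape of what I tried looks like this (a simplified sketch, not the exact Colab code):

import tensorflow as tf
from sentence_transformers import SentenceTransformer

class QueryEncoder(tf.Module):
    # wraps SentenceTransformer so tf.saved_model.save can (in theory) trace it
    def __init__(self):
        super().__init__()
        self.encoder = SentenceTransformer("all-mpnet-base-v2")

    @tf.function(input_signature=[tf.TensorSpec([None], tf.string)])
    def __call__(self, queries):
        # same clash as approach 1: while save() traces this, `queries` is a
        # symbolic tensor, but the tokenizer inside .encode() wants python strings
        return self.encoder.encode(queries)

# tf.saved_model.save(QueryEncoder(), "/home/nicholas/test_model_save/3")  # fails at tracing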
All of this journey triggered some questions:
Question 1 Is this even best practice? Should I instead serve the model the already-tokenized sentences as a tensor (see the sketch after the questions)?
Question 2 How the hell do I make it work? :)
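To make Question 1 concrete, this is the alternative I have in mind: tokenize on the client and send plain integer tensors to the endpoint, so no python code ends up inside the graph (a sketch; the exact payload format depends on the serving container):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/all-mpnet-base-v2")
enc = tokenizer(["Test Text Query"], padding=True, truncation=True, return_tensors="np")

# the model's serving signature would then take int32 tensors only
payload = {
    "input_ids": enc["input_ids"].tolist(),
    "attention_mask": enc["attention_mask"].tolist(),
}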