I have trained an LSTM RNN classification model in TensorFlow. Until now I have been saving and restoring checkpoints to retrain the model and to use it for testing. Now I want to use TensorFlow Serving so that I can use the model in production.
Initially, I parse a corpus to build a dictionary that maps the words in a string to integers. I store this dictionary in a pickle file, which I reload whenever I restore a checkpoint, whether to retrain on a data set or just to use the model, so that the mapping stays consistent. How do I store this dictionary when saving the model with SavedModelBuilder?
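To make the current workflow concrete, here is a minimal sketch of the dictionary round trip I rely on with checkpoints. The helper names (build_vocab, save_vocab, load_vocab) and the toy corpus are made up for illustration and are not my actual RnnPreprocessing code; I am also assuming IDs start at 1 so that 0 stays free.

```python
import os
import pickle
import tempfile
from collections import Counter


def build_vocab(text):
    """Map each distinct word to an integer ID, most frequent word first.

    Hypothetical stand-in for RnnPreprocessing.map_vocab_to_int.
    IDs start at 1 so that 0 stays free (assumption).
    """
    counts = Counter(text.split())
    # Sort by descending frequency, then alphabetically for a stable order.
    ordered = sorted(counts, key=lambda w: (-counts[w], w))
    return {word: idx for idx, word in enumerate(ordered, 1)}


def save_vocab(vocab, path):
    """Write the mapping to a pickle file."""
    with open(path, 'wb') as handle:
        pickle.dump(vocab, handle, protocol=pickle.HIGHEST_PROTOCOL)


def load_vocab(path):
    """Read the mapping back, e.g. before restoring a checkpoint."""
    with open(path, 'rb') as handle:
        return pickle.load(handle)


corpus = "good movie bad movie good plot"  # toy data, not my real corpus
vocab_to_int = build_vocab(corpus)

pickle_path = os.path.join(tempfile.mkdtemp(), 'vocab_to_int.pickle')
save_vocab(vocab_to_int, pickle_path)
restored = load_vocab(pickle_path)
```

The point is only that the restored mapping is identical to the one used at training time; this is the guarantee I want to keep once the model is exported as a SavedModel.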
My code for the neural network is as follows. The code for saving the model is towards the end (I am including an overview of the whole structure for context):
...
# Read files and store them in variables
with open('./someReview.txt', 'r') as f:
    reviews = f.read()
with open('./someLabels.txt', 'r') as f:
    labels = f.read()
...
#Pre-processing functions
#Parse through dataset and create a vocabulary
vocab_to_int, reviews = RnnPreprocessing.map_vocab_to_int(reviews)
with open(pickle_path, 'wb') as handle:
    pickle.dump(vocab_to_int, handle, protocol=pickle.HIGHEST_PROTOCOL)
#More preprocessing functions
...
# Building the graph
lstm_size = 256
lstm_layers = 2
batch_size = 1000
learning_rate = 0.01
n_words = len(vocab_to_int) + 1
# Create the graph object
tf.reset_default_graph()
with tf.name_scope('inputs'):
    inputs_ = tf.placeholder(tf.int32, [None, None], name="inputs")
    labels_ = tf.placeholder(tf.int32, [None, None], name="labels")
    keep_prob = tf.placeholder(tf.float32, name="keep_prob")
#Create embedding layer LSTM cell, LSTM Layers
...
# Forward pass
with tf.name_scope("RNN_forward"):
    outputs, final_state = tf.nn.dynamic_rnn(cell, embed, initial_state=initial_state)
# Output. We are only interested in the latest output of the lstm cell
with tf.name_scope('predictions'):
    predictions = tf.contrib.layers.fully_connected(outputs[:, -1], 1, activation_fn=tf.sigmoid)
    tf.summary.histogram('predictions', predictions)
#More functions for cost, accuracy, optimizer initialization
...
# Training
epochs = 1
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    iteration = 1
    for e in range(epochs):
        state = sess.run(initial_state)
        for ii, (x, y) in enumerate(get_batches(train_x, train_y, batch_size), 1):
            feed = {inputs_: x,
                    labels_: y[:, None],
                    keep_prob: 0.5,
                    initial_state: state}
            summary, loss, state, _ = sess.run([merged, cost, final_state, optimizer], feed_dict=feed)
            train_writer.add_summary(summary, iteration)
            if iteration % 1 == 0:
                print("Epoch: {}/{}".format(e, epochs),
                      "Iteration: {}".format(iteration),
                      "Train loss: {:.3f}".format(loss))
            if iteration % 2 == 0:
                val_acc = []
                val_state = sess.run(cell.zero_state(batch_size, tf.float32))
                for x, y in get_batches(val_x, val_y, batch_size):
                    feed = {inputs_: x,
                            labels_: y[:, None],
                            keep_prob: 1,
                            initial_state: val_state}
                    summary, batch_acc, val_state = sess.run([merged, accuracy, final_state], feed_dict=feed)
                    val_acc.append(batch_acc)
                print("Val acc: {:.3f}".format(np.mean(val_acc)))
            iteration += 1
            test_writer.add_summary(summary, iteration)
    # Saving the model (inside the session block, since the builder needs the live session)
    export_path = './SavedModel'
    print('Exporting trained model to %s' % (export_path))
    builder = saved_model_builder.SavedModelBuilder(export_path)
    # Build the signature_def_map.
    classification_inputs = utils.build_tensor_info(inputs_)
    classification_outputs_classes = utils.build_tensor_info(labels_)
    classification_signature = signature_def_utils.build_signature_def(
        inputs={signature_constants.CLASSIFY_INPUTS: classification_inputs},
        outputs={
            signature_constants.CLASSIFY_OUTPUT_CLASSES:
                classification_outputs_classes,
        },
        method_name=signature_constants.CLASSIFY_METHOD_NAME)
    legacy_init_op = tf.group(
        tf.tables_initializer(), name='legacy_init_op')
    # Add the signatures to the servable
    builder.add_meta_graph_and_variables(
        sess, [tag_constants.SERVING],
        signature_def_map={
            signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY:
                classification_signature
        },
        legacy_init_op=legacy_init_op)
    print("added meta graph and variables")
    # Save it!
    builder.save()
    print("model saved")
I am not entirely sure this is the correct way to save a model like this, but it is the only implementation I have found in the documentation and online tutorials.
I haven't found any example or explicit guide in the documentation on saving the dictionary or on using it when restoring a SavedModel.
When using checkpoints, I would just load the pickle file before running the session. How do I restore this SavedModel so that I can use the same word-to-integer mapping from the dictionary? Is there a specific way I should be saving or loading the model?
I have also used inputs_ as the input for the signature. This is the sequence of integers produced after the words have been mapped. I can't specify a string as the input because I get AttributeError: 'str' object has no attribute 'dtype'. In such cases, how exactly are words mapped to integers in models that are in production?
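For context, this is roughly the encoding step that currently has to happen outside the graph before anything can be fed to inputs_. It is only a sketch under my own assumptions: the encode helper, the lowercase and whitespace tokenization, dropping unknown words, and left-padding with 0 are all illustrative choices, not something TensorFlow Serving prescribes.

```python
def encode(text, vocab_to_int, seq_len):
    """Turn a raw string into a fixed-length list of integer IDs.

    Assumptions for illustration only: lowercase + whitespace split,
    words missing from the dictionary are dropped, sequences are
    truncated to seq_len and left-padded with 0.
    """
    ids = [vocab_to_int[w] for w in text.lower().split() if w in vocab_to_int]
    ids = ids[:seq_len]
    return [0] * (seq_len - len(ids)) + ids


# Toy dictionary standing in for the one loaded from the pickle file.
vocab_to_int = {'good': 1, 'movie': 2, 'bad': 3, 'plot': 4}

# A batch shaped like the [None, None] int32 placeholder inputs_ expects.
batch = [encode(s, vocab_to_int, seq_len=5)
         for s in ["Good movie", "bad unknown plot"]]
```

Whatever sends requests to the served model would need exactly the same dictionary to produce this batch, which is why I am asking where the dictionary should live once the model is exported.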