Good morning,
I trained an LSTM network on the restaurant reviews from the Yelp dataset (https://www.yelp.com/dataset). It is a large dataset, and training took several days on my PC. Anyway, I saved the model and its weights, and I now want to use the model for real-time sentiment predictions.
What is the common / good / best practice for doing this? I load the model and the weights and then compile it; that part is not a problem, as there are plenty of examples in the documentation and on the Internet. But what comes next? Is all I need to do to tokenize the newly received review, pad it, and pass it to model.predict?
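Just to be concrete about the loading step I said is not a problem, this is what I mean, sketched with a tiny stand-in model instead of my actual LSTM (the layer sizes and filename here are made up for illustration):

```python
import os
import tempfile

from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.models import Sequential, load_model

# stand-in for the trained LSTM: a tiny model, saved and reloaded
model = Sequential([Input(shape=(3,)), Dense(1)])
model.compile(loss='binary_crossentropy', optimizer='adam')
path = os.path.join(tempfile.mkdtemp(), 'model.h5')
model.save(path)

# load_model restores the architecture, weights, and compile state
restored = load_model(path)
```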
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

# keep only the 2500 most frequent words
tokenizer = Tokenizer(num_words=2500, split=' ')
# build the vocabulary from the training reviews
tokenizer.fit_on_texts(data['text'].values)
print(tokenizer.word_index)
# map each review to a sequence of word indices, padded to equal length
X = tokenizer.texts_to_sequences(data['text'].values)
X = pad_sequences(X)
It can't be that simple… If that is really all that is required, how does it connect to the tokenizer that was used to train the model? Tokenizing the more than 2.5 million reviews originally downloaded from the Yelp dataset was an expensive operation, and I'd rather not repeat it for every prediction.
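My guess is that the fitted tokenizer has to be persisted alongside the model and reloaded at prediction time, so that new reviews are mapped to the same word indices the model was trained on. A minimal sketch of what I have in mind (the filenames, maxlen value, and the tiny stand-in corpus are placeholders I made up):

```python
import pickle

from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.preprocessing.text import Tokenizer

# --- at training time: fit the tokenizer once on all reviews ---
# (tiny stand-in corpus here; in reality this would be data['text'].values)
tokenizer = Tokenizer(num_words=2500, split=' ')
tokenizer.fit_on_texts(["great food", "terrible service"])

# persist the fitted tokenizer next to the model (filename is a placeholder)
with open('tokenizer.pickle', 'wb') as f:
    pickle.dump(tokenizer, f)

# --- at prediction time: reload the SAME tokenizer, never refit it ---
with open('tokenizer.pickle', 'rb') as f:
    tokenizer = pickle.load(f)

new_review = ["The food was great but the service was slow"]
seq = tokenizer.texts_to_sequences(new_review)
# maxlen must match the padded length used during training
padded = pad_sequences(seq, maxlen=100)

# model = load_model('yelp_lstm.h5')   # placeholder filename
# prediction = model.predict(padded)
```

The key point, if I understand it correctly, would be that fit_on_texts is never called again at prediction time; only texts_to_sequences and pad_sequences run on each new review.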
Thank you for any suggestions.