I'm learning text classification with TensorFlow, using movie reviews as data, but I got stuck when the model's prediction came out as a raw float (not rounded, not binary) instead of matching the label.

CODE

predict = model.predict([test_review])

print("Prediction: " + str(predict[0])) # [1.8203685e-19] 
print("Actual: " + str(test_labels[0])) # 0

The expected output is:

Prediction: [0.]
Actual: 0

The actual output is:

Prediction: [1.8203685e-19]
Actual: 0

The prediction should be 0 or 1, representing whether the review was good or not.

FULL CODE

import tensorflow as tf
from tensorflow import keras
import numpy as np

data = keras.datasets.imdb

(train_data, train_labels), (test_data, test_labels) = data.load_data(num_words = 10000)

word_index = data.get_word_index()
word_index = {k:(v + 3) for k, v in word_index.items()} 

word_index['<PAD>'] = 0
word_index['<START>'] = 1 
word_index['<UNK>'] = 2
word_index['<UNUSED>'] = 3

reverse_word_index = dict([(value, key) for (key, value) in word_index.items()])

train_data = keras.preprocessing.sequence.pad_sequences(train_data, value = word_index['<PAD>'], padding = 'post', maxlen = 256)
test_data = keras.preprocessing.sequence.pad_sequences(test_data, value = word_index['<PAD>'], padding = 'post', maxlen = 256)

def decode_review(text):
    """Decode an integer-encoded review (training or test data) into readable words."""
    return ' '.join([reverse_word_index.get(i, '?') for i in text])

print("\n")
print(decode_review(test_data[0]))

model = keras.Sequential()
model.add(keras.layers.Embedding(10000, 16))
model.add(keras.layers.GlobalAveragePooling1D())
model.add(keras.layers.Dense(16, activation = 'relu'))
model.add(keras.layers.Dense(1, activation = 'sigmoid'))
model.summary()

model.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy']) 

x_val = train_data[:10000]
x_train = train_data[10000:]

y_val = train_labels[:10000]
y_train = train_labels[10000:]

fitModel = model.fit(x_train, y_train, epochs = 40,
                     batch_size = 512, 
                     validation_data = (x_val, y_val),
                     verbose = 1)

results = model.evaluate(test_data, test_labels)

test_review = test_data[0]
predict = model.predict([test_review])
print("Review: ")
print(decode_review(test_review))
print("Prediction: " + str(predict[0])) # [1.8203685e-19] 
print("Actual: " + str(test_labels[0]))
print("\n[loss, accuracy]: ", results)
Y4RD13
Just convert values smaller than 0.5 to 0, and all others to 1. The model is just giving you the probability of the label being 1 (as far as I understand it). – Carles Jan 28 '20 at 10:27

1 Answer

Replace the predict method with the predict_classes method:

model.predict_classes([test_review])
Kevin
manoj yadav
    UserWarning: `model.predict_classes()` is deprecated and will be removed after 2021-01-01. Please use instead:
    * `np.argmax(model.predict(x), axis=-1)`, if your model does multi-class classification (e.g. if it uses a `softmax` last-layer activation).
    * `(model.predict(x) > 0.5).astype("int32")`, if your model does binary classification (e.g. if it uses a `sigmoid` last-layer activation).
    – Dmitry Yudakov Apr 29 '21 at 18:14
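
To make the two replacements from that warning concrete, here is a minimal sketch that applies them to plain NumPy arrays. The arrays sigmoid_probs and softmax_probs are hypothetical stand-ins for what model.predict(x) might return (only the first value is taken from the question), so the snippet runs without TensorFlow or a trained model.

```python
import numpy as np

# Stand-in for model.predict(x) with a sigmoid output layer: shape (n_samples, 1).
# Illustrative values only; the first one is the prediction quoted in the question.
sigmoid_probs = np.array([[1.8203685e-19], [0.73], [0.5]])

# Binary case: probabilities strictly above 0.5 become class 1, everything else class 0.
binary_classes = (sigmoid_probs > 0.5).astype("int32")
print(binary_classes.ravel())  # [0 1 0]

# Stand-in for a softmax output layer: one row of class probabilities per sample.
softmax_probs = np.array([[0.1, 0.7, 0.2], [0.6, 0.3, 0.1]])

# Multi-class case: take the index of the largest probability in each row.
multi_classes = np.argmax(softmax_probs, axis=-1)
print(multi_classes)  # [1 0]
```

The model in the question ends in a single sigmoid unit, so the binary form is the one that applies there: the value 1.8203685e-19 is a probability very close to 0, and thresholding it gives class 0, matching the label. The 0.5 cutoff is the conventional choice mentioned in the comments, not a requirement.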