
I am seeing 782 instead of 25000 when fitting a model. Is that OK?

Hello everyone, I am new to TensorFlow and trying to fit a model on the IMDB dataset. This is how I loaded my data:

import tensorflow_datasets as tfds
import tensorflow as tf
import numpy as np

# Load the IMDB reviews dataset as (text, label) pairs
imdb, info = tfds.load("imdb_reviews", with_info=True, as_supervised=True)
train_data, test_data = imdb["train"], imdb["test"]

# Collect the 25,000 training and 25,000 test examples into Python lists
training_sentences = []
training_labels = []
testing_sentences = []
testing_labels = []
for s, l in train_data:
    training_sentences.append(str(s.numpy()))
    training_labels.append(l.numpy())
for s, l in test_data:
    testing_sentences.append(str(s.numpy()))
    testing_labels.append(l.numpy())
training_labels_final = np.array(training_labels)
testing_labels_final = np.array(testing_labels)

vocab_size = 10000
embedding_dim = 16
max_length = 120
trunc_type = "post"
oov_tok = "<OOV>"

from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Fit the tokenizer on the training text only, then convert both splits
# into padded integer sequences of length max_length
tokenizer = Tokenizer(num_words=vocab_size, oov_token=oov_tok)
tokenizer.fit_on_texts(training_sentences)
word_index = tokenizer.word_index
sequences = tokenizer.texts_to_sequences(training_sentences)
padded = pad_sequences(sequences, maxlen=max_length, truncating=trunc_type)
testing_sequences = tokenizer.texts_to_sequences(testing_sentences)
testing_padded = pad_sequences(testing_sequences, maxlen=max_length, truncating=trunc_type)

And this is my model, using Flatten:

my_model_with_flatten = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, embedding_dim, input_length=max_length),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(units=6, activation='leaky_relu'),
    tf.keras.layers.Dense(units=1, activation='sigmoid'),
])
my_model_with_flatten.compile(optimizer='adam',
                              loss='binary_crossentropy', metrics=["accuracy"])
flatten_history = my_model_with_flatten.fit(x=padded, y=training_labels_final, epochs=10,
                                            validation_data=(testing_padded, testing_labels_final))

Output:

Epoch 1/10
782/782 [==============================] - 6s 7ms/step - loss: 0.4893 - accuracy: 0.7504 - val_loss: 0.3950 - val_accuracy: 0.8175
Epoch 2/10
782/782 [==============================] - 4s 6ms/step - loss: 0.2317 - accuracy: 0.9140 - val_loss: 0.4159 - val_accuracy: 0.8166
Epoch 3/10
782/782 [==============================] - 5s 6ms/step - loss: 0.0799 - accuracy: 0.9814 - val_loss: 0.5299 - val_accuracy: 0.8070
Epoch 4/10
782/782 [==============================] - 4s 6ms/step - loss: 0.0198 - accuracy: 0.9978 - val_loss: 0.6216 - val_accuracy: 0.8051
Epoch 5/10
782/782 [==============================] - 4s 6ms/step - loss: 0.0063 - accuracy: 0.9995 - val_loss: 0.6861 - val_accuracy: 0.8022
Epoch 6/10
782/782 [==============================] - 4s 6ms/step - loss: 0.0018 - accuracy: 1.0000 - val_loss: 0.7462 - val_accuracy: 0.8046
Epoch 7/10
782/782 [==============================] - 4s 6ms/step - loss: 7.7573e-04 - accuracy: 1.0000 - val_loss: 0.7966 - val_accuracy: 0.8045
Epoch 8/10
782/782 [==============================] - 5s 6ms/step - loss: 4.2356e-04 - accuracy: 1.0000 - val_loss: 0.8434 - val_accuracy: 0.8060
Epoch 9/10
782/782 [==============================] - 5s 6ms/step - loss: 2.5064e-04 - accuracy: 1.0000 - val_loss: 0.8865 - val_accuracy: 0.8057
Epoch 10/10
782/782 [==============================] - 5s 7ms/step - loss: 1.4970e-04 - accuracy: 1.0000 - val_loss: 0.9293 - val_accuracy: 0.8056

I think I should see 25000 here after every epoch instead of 782! Why am I seeing 782? Is it iterating over only 782 examples instead of 25000, or is everything OK and it's just a number?

  • Note that I am using VS Code. – Reza shahriari Apr 17 '23 at 19:30
  • The number 782 you see after each epoch represents the number of steps per epoch, not the number of examples. Since you left the batch size at its default value of 32, the number of steps per epoch is the total number of training examples (25,000) divided by the batch size, rounded up, which equals 782. Everything is fine; you are already training on all 25,000 examples. No need to worry about the 782. – Hassaan Ali Apr 17 '23 at 19:37
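To make the arithmetic in that comment concrete, here is a minimal sketch (plain Python; the 32 is Keras's documented default batch size for fit) that reproduces the 782:

import math

num_examples = 25000  # size of the IMDB training split
batch_size = 32       # Keras default when batch_size is not passed to fit()

# The progress bar counts batches (steps), not individual examples.
# The final partial batch still counts as one step, hence the ceiling.
steps_per_epoch = math.ceil(num_examples / batch_size)
print(steps_per_epoch)  # 782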

1 Answer


No, it is not skipping examples. By default fit uses a batch size of 32, and the 782 counts batches (steps) per epoch rather than examples: 25000 / 32 = 781.25, which rounds up to 782.

TheEngineerProgrammer
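As a follow-up to this answer, here is a hedged sketch (reusing the variables from the question; this is not code from the original answer) showing that the displayed step count is just the ceiling of examples divided by batch size, so passing an explicit batch_size changes it:

history_64 = my_model_with_flatten.fit(
    x=padded,
    y=training_labels_final,
    epochs=1,
    batch_size=64,  # override the default of 32
    validation_data=(testing_padded, testing_labels_final),
)
# The progress bar would now show 391 steps per epoch,
# since ceil(25000 / 64) = 391.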