Tensorflow model.train() not looping through all data

Question

I'm trying to train a model on MNIST.

import tensorflow as tf

mnist = tf.keras.datasets.mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

print(x_train.shape)

The output is (60000, 28, 28), so there are 60,000 items in the dataset.

Then I create and train the model with the following code.

model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10)
])

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
model.compile(optimizer='adam',
              loss=loss_fn,
              metrics=['accuracy'])

model.fit(x_train, y_train, epochs=5)

However, each epoch seems to go through only 1875 items:

2020-06-02 04:33:45.706474: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'cudart64_101.dll'; dlerror: cudart64_101.dll not found
2020-06-02 04:33:45.706617: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2020-06-02 04:33:47.437837: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'nvcuda.dll'; dlerror: nvcuda.dll not found
2020-06-02 04:33:47.437955: E tensorflow/stream_executor/cuda/cuda_driver.cc:313] failed call to cuInit: UNKNOWN ERROR (303)
2020-06-02 04:33:47.441329: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: DESKTOP-H3BEO7F
2020-06-02 04:33:47.441480: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: DESKTOP-H3BEO7F
2020-06-02 04:33:47.441876: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2020-06-02 04:33:47.448274: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x27fc6b2c210 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-06-02 04:33:47.448427: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
Epoch 1/5
1875/1875 [==============================] - 1s 664us/step - loss: 0.2971 - accuracy: 0.9140
Epoch 2/5
1875/1875 [==============================] - 1s 661us/step - loss: 0.1421 - accuracy: 0.9582
Epoch 3/5
1875/1875 [==============================] - 1s 684us/step - loss: 0.1068 - accuracy: 0.9675
Epoch 4/5
1875/1875 [==============================] - 1s 695us/step - loss: 0.0868 - accuracy: 0.9731
Epoch 5/5
1875/1875 [==============================] - 1s 682us/step - loss: 0.0764 - accuracy: 0.9762

Process finished with exit code 0

This has been asked multiple times before, see duplicate answer. — Dr. Snoopy, Jun 02 '20 at 10:02

alift · Accepted Answer · 2020-06-02T10:16:49.573


You are using all of the data, no worries!

According to the docstring of model.fit in the Keras source code, https://github.com/keras-team/keras/blob/master/keras/engine/training.py, when you call model.fit without specifying a batch size, it defaults to 32:

batch_size: Integer or None. Number of samples per gradient update. If unspecified, batch_size will default to 32.

This means that each epoch has 1875 steps, and in each step your model processes a batch of 32 examples. The progress bar counts steps (batches), not individual samples. And guess what: 1875 * 32 = 60,000.
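To double-check the arithmetic without TensorFlow, here is a small pure-Python sketch that splits 60,000 sample indices into consecutive batches of 32, the way one batched epoch covers the data. The helper names steps_per_epoch and batch_slices are hypothetical and only illustrate the counting; Keras does its own batching internally (and, by default, shuffles the samples before batching).

```python
import math


def steps_per_epoch(num_samples, batch_size=32):
    """Number of batches (steps) needed to see every sample once.

    The last batch is smaller when num_samples is not divisible
    by batch_size, hence the ceiling.
    """
    return math.ceil(num_samples / batch_size)


def batch_slices(num_samples, batch_size=32):
    """Yield (start, stop) index ranges that together cover all samples.

    Hypothetical helper for illustration only; it does not shuffle.
    """
    for start in range(0, num_samples, batch_size):
        yield start, min(start + batch_size, num_samples)


if __name__ == "__main__":
    n = 60000  # x_train.shape[0] for MNIST
    slices = list(batch_slices(n))
    print(len(slices))                                  # 1875
    print(sum(stop - start for start, stop in slices))  # 60000
    for b in (32, 64, 128):
        print(b, steps_per_epoch(n, b))                 # 1875, 938, 469
```

Changing the batch size only changes how many steps an epoch takes, not how much data is seen: with model.fit(x_train, y_train, epochs=5, batch_size=64) the progress bar should show 938 steps per epoch, the last one holding the remaining 32 samples.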

edited Jun 02 '20 at 10:16

answered Jun 02 '20 at 09:00

alift


This question is for Python; I do not think it is correct to give links to the Keras R documentation. – Dr. Snoopy Jun 02 '20 at 10:02
Thanks for your comment, @Dr.Snoopy. I changed the link to the Keras source code, so it is Python-related; however, the implementation concept is the same in Python and R. – alift Jun 02 '20 at 10:16
Thanks for the answer! – Vincent Du Jun 03 '20 at 22:14
