
While training a deep learning model, I noticed something odd. Let me try to explain.

The total number of training steps reported per epoch differs between my laptop and a GPU machine. Using TensorFlow Keras, I built a Convolutional Neural Network. I have 31561 training instances, and my batch size is set to 32.

Training on Laptop

When I run the code on my laptop, I believe the model updates the gradients after each batch, so training iterates as follows.

Epoch 1

    1/987, 2/987, 3/987, ..., 987/987

As per my understanding, the model divides the total number of training instances by the batch size and rounds up, giving 987 steps (31561/32 ≈ 986.3, so 987 batches, the last one only partially filled).
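Assuming the standard Keras behavior of one gradient update per batch, the step count above can be reproduced with a quick calculation (variable names here are just for illustration):

```python
import math

num_samples = 31561  # total training instances
batch_size = 32

# Keras runs ceil(samples / batch_size) steps per epoch;
# the final batch is partial (31561 % 32 = 9 samples).
steps_per_epoch = math.ceil(num_samples / batch_size)
print(steps_per_epoch)  # 987
```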

For more detail, a screenshot is attached here:

Training on GPU Machine

This is my university's GPU machine, which I access remotely. When I run the same code there, the training progress is reported as follows:

Epoch 1

    1/31561, 2/31561, 3/31561, ..., 31561/31561

For more detail, see the attached screenshot:

In this case, the model displays 31561 instead of 987. I don't understand why the reported count changes when the same code is run on two different platforms. Surprisingly, the results from both are almost identical.

  • These are just differences in reporting among different TF and Keras versions - some versions report the no. of samples while others report the no. of batches. Nothing to worry about. Could you post the respective TF versions? – desertnaut Jul 23 '21 at 10:27
  • also check the virtual env that you (hopefully) use: `pip list` and check versions. On GPU the tensorflow-gpu package is required; on your local machine you most probably have just tensorflow. – B.Kocis Jul 23 '21 at 10:40
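Following the suggestion in the comments, one way to compare the installed versions on both machines is a small sketch like this (the package names listed are assumptions; adjust them to whatever your two environments actually use):

```python
# Print installed TF/Keras package versions without importing them,
# so this also works in an environment where TensorFlow fails to load.
from importlib import metadata

for pkg in ("tensorflow", "tensorflow-gpu", "keras"):
    try:
        print(pkg, metadata.version(pkg))
    except metadata.PackageNotFoundError:
        print(pkg, "not installed")
```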

0 Answers