
I am trying to train an autoencoder using TensorFlow and Keras. My training data consists of more than 200K unlabeled 512x128 images. If I load all of the data into a single array, its shape will be (200000, 512, 128, 3), which is on the order of 150 GB as float32 — far more RAM than I have available. I know I can reduce the batch size while training, but that only limits memory usage on the GPU/CPU, not the cost of holding the whole dataset in RAM.

Is there a workaround to this problem?

  • You don't need large batches, and you don't need all the data to be in memory all the time. Just load the data you need as part of training, and release those resources again once a training step is done. Or would that not work in your case? – Jonas V Apr 24 '22 at 01:14

1 Answer


You can use the tf.data API to load the images lazily; the official tf.data guide (https://www.tensorflow.org/guide/data) goes into the details.
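For example, a pipeline that only reads and decodes images when a batch is actually needed might look like the sketch below. The file pattern, PNG format, and 512x128 target size are assumptions based on the question, not details from the original answer:

```python
import tensorflow as tf

# Build a dataset of file paths; nothing is decoded yet.
file_paths = tf.data.Dataset.list_files("images/*.png", shuffle=True)

def load_image(path):
    data = tf.io.read_file(path)                          # read bytes from disk
    img = tf.io.decode_png(data, channels=3)              # decode to a uint8 tensor
    img = tf.image.convert_image_dtype(img, tf.float32)   # scale to [0, 1]
    img = tf.image.resize(img, [512, 128])                # enforce a fixed shape
    return img, img                                       # autoencoder: target == input

dataset = file_paths.map(load_image, num_parallel_calls=tf.data.AUTOTUNE)
```

Because decoding happens inside `map`, only the images for the current batches are ever materialized in memory.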

Also look into the tf.data.Dataset.prefetch, tf.data.Dataset.batch, and tf.data.Dataset.cache methods to optimize pipeline performance (see the sketch below).
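Continuing the sketch above, one plausible way to chain these methods (the cache path, buffer size, and batch size are illustrative assumptions, not prescribed values):

```python
# Note: cache() with no argument keeps decoded images in RAM, which would not
# fit here; caching to a file, or skipping cache entirely, is the safer choice.
dataset = (
    dataset
    .cache("train.cache")           # optional on-disk cache of decoded images
    .shuffle(buffer_size=1000)      # shuffle within a bounded buffer, not the full set
    .batch(32)                      # keep batches small for the GPU
    .prefetch(tf.data.AUTOTUNE)     # overlap preprocessing with training
)

# model.fit(dataset, epochs=10)     # Keras can consume the dataset directly
```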

You can also preprocess the data into TFRecords up front, so your training pipeline can read them more efficiently.
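A rough sketch of writing preprocessed images to a TFRecord file and reading them back lazily is shown below; the file name, feature key, and the `image_paths` list are assumptions for illustration:

```python
import tensorflow as tf

def serialize_example(image_bytes):
    # Wrap the raw encoded image bytes in a tf.train.Example.
    feature = {
        "image": tf.train.Feature(bytes_list=tf.train.BytesList(value=[image_bytes]))
    }
    example = tf.train.Example(features=tf.train.Features(feature=feature))
    return example.SerializeToString()

# Writing: serialize each image once, ahead of training.
with tf.io.TFRecordWriter("train.tfrecord") as writer:
    for path in image_paths:                  # image_paths: assumed list of image files
        writer.write(serialize_example(tf.io.read_file(path).numpy()))

# Reading: parse records lazily inside the input pipeline.
def parse_example(record):
    parsed = tf.io.parse_single_example(
        record, {"image": tf.io.FixedLenFeature([], tf.string)})
    img = tf.io.decode_png(parsed["image"], channels=3)
    img = tf.image.convert_image_dtype(img, tf.float32)
    return img, img

dataset = (tf.data.TFRecordDataset("train.tfrecord")
           .map(parse_example, num_parallel_calls=tf.data.AUTOTUNE))
```

Reading a few large TFRecord files sequentially is generally faster than opening 200K small image files individually.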

Deepak Sadulla