
I am trying to train a model in Python using TensorFlow on Google Colab Pro+ (51 GB of available RAM). This model needs to be trained on a large set of HD images (9000 images at 1440x720). To train such a model, I prepare my data with a tf.data.Dataset in the following way:

train_dataset = tf.data.Dataset.list_files(str(PATH + 'train_*.png'))
train_dataset = train_dataset.map(load_image_train,num_parallel_calls=tf.data.AUTOTUNE)
train_dataset = train_dataset.shuffle(BUFFER_SIZE)
train_dataset = train_dataset.batch(BATCH_SIZE)
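
For context, load_image_train is my own mapping function (not shown in full here); a minimal sketch of what it does, assuming the PNGs are read and decoded lazily inside the tf.data graph, is:

import tensorflow as tf

# Hypothetical sketch of load_image_train: reads and decodes a single PNG
# inside the tf.data pipeline, so no image is loaded until it is requested.
def load_image_train(image_path):
    image = tf.io.read_file(image_path)          # read raw bytes from disk
    image = tf.io.decode_png(image, channels=3)  # decode to a uint8 tensor
    image = tf.image.convert_image_dtype(image, tf.float32)  # scale to [0, 1]
    image = tf.image.resize(image, [720, 1440])  # height x width
    return image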

This works fine. However, when I try to sequentially access my data for training using

train_dataset.take()

then it seems like TensorFlow tries to bring all my images into RAM, and the RAM capacity is exceeded. How can I avoid this behavior and only execute my data preparation functions on the images that are actually accessed with take()?

Thanks in advance for your help.

clank
  • Hi @clank! You also need to specify the number of elements that you want to take from the dataset for training, e.g. train_dataset.take(5). –  Sep 08 '22 at 09:58
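
A minimal sketch of what the comment suggests: take() needs a count argument, and iterating over the resulting dataset streams one batch at a time rather than materializing everything in memory (the count of 5 and the printed shape are just illustrative):

# Take only the first 5 batches from the already-batched dataset.
subset = train_dataset.take(5)

# Iterating streams batches one at a time; only the shuffle buffer and the
# current batch need to be resident in RAM.
for batch in subset:
    print(batch.shape)  # e.g. (BATCH_SIZE, 720, 1440, 3)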

0 Answers