I am trying to train a model in Python using TensorFlow on Google Colab Pro+ (51 GB of available RAM). This model needs to be trained on a large set of HD images (9,000 PNGs at 1440x720). To train it, I prepare my data with a tf.data.Dataset as follows:
train_dataset = tf.data.Dataset.list_files(str(PATH + 'train_*.png'))
train_dataset = train_dataset.map(load_image_train, num_parallel_calls=tf.data.AUTOTUNE)
train_dataset = train_dataset.shuffle(BUFFER_SIZE)
train_dataset = train_dataset.batch(BATCH_SIZE)
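For reference, load_image_train just reads and decodes each file; it looks roughly like this (a minimal sketch, the exact decoding, resize and normalization steps are assumptions):

def load_image_train(image_file):
    # Read and decode the PNG for a single dataset element
    image = tf.io.read_file(image_file)
    image = tf.io.decode_png(image, channels=3)
    image = tf.image.resize(image, [720, 1440])
    # Normalize pixel values to [-1, 1]
    image = (tf.cast(image, tf.float32) / 127.5) - 1
    return image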
This works fine. However, when I try to sequentially access my data for training with
train_dataset.take()
TensorFlow seems to load all of my images into RAM, and the available RAM is exceeded. How can I avoid this behavior and only run my data preparation functions on the images that are actually accessed with take()?
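Concretely, the access pattern that exhausts the RAM looks roughly like this (a sketch; train_step and STEPS_PER_EPOCH are placeholders for my training loop):

for step, batch in enumerate(train_dataset.take(STEPS_PER_EPOCH)):
    # I expected each batch to be read and preprocessed lazily here,
    # but memory usage keeps growing until the Colab runtime crashes
    train_step(batch)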
Thanks in advance for your help.