How to preprocess image data without consuming too much RAM?

Question

This might seem like a basic question, but I am stuck on it and would appreciate some help.

I am trying to load and preprocess some images in DICOM format in order to feed them to my Keras model. Since I have about two thousand images, RAM is exhausted before the preprocessing step finishes. Here is the code for the preprocessing step:

(`directory` and `labels` are predefined variables)

import os
import tensorflow as tf
import tensorflow_io as tfio

shape=(256,256)
patients_filename=tf.constant([directory+'/'+path for path in os.listdir(directory)])
dataset = tf.data.Dataset.from_tensor_slices((patients_filename,labels))
def parse_function(patientfilename,label):
    var=tf.data.Dataset.list_files(patientfilename+'/*')
    for image in var:
        image=tf.io.read_file(image)
        image = tfio.image.decode_dicom_image(image,dtype=tf.uint64)
        image = tf.cast(image, tf.float32)
        image=tf.image.resize(image,size=shape)/65535.0
        image=tf.reshape(image,shape+(1,))
    return image,label

dataset = dataset.map(parse_function).batch(8).prefetch(1)

Then I feed the model with the preprocessed data (dataset).

Do you have any idea how I can do this better?

You are discarding all the changes to `image` on each iteration — RichieV, Aug 22 '20 at 20:46
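
To illustrate the comment above: in a loop that rebinds the same name on every pass, only the value from the last iteration survives. The following is a minimal pure-Python sketch of that behavior, not the asker's actual pipeline. The `preprocess` function is a hypothetical stand-in for the read/decode/resize steps, and the file names are made up.

```python
# Hypothetical stand-in for the read/decode/resize steps in parse_function.
def preprocess(path):
    return f"processed:{path}"

paths = ["slice_1.dcm", "slice_2.dcm", "slice_3.dcm"]  # made-up file names

# Pattern from the question: `image` is overwritten on every iteration,
# so only the last slice is left once the loop ends.
for path in paths:
    image = preprocess(path)
last_only = image

# Keeping every result requires accumulating them explicitly.
all_images = [preprocess(path) for path in paths]

print(last_only)        # processed:slice_3.dcm
print(len(all_images))  # 3
```

In a `tf.data` pipeline the analogous fix would probably be to stack the per-slice tensors, or to build the dataset from individual files rather than from patient directories; which one fits depends on what the model expects as input.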

score 0 · Answer 1 · answered Aug 22 '20 at 21:03

You can use TensorFlow's `tf.keras.preprocessing.image.ImageDataGenerator` to preprocess your images, and its `flow_from_directory` method to load the data from disk as and when required.

import tensorflow as tf

train_datagen = tf.keras.preprocessing.image.ImageDataGenerator(
    rescale=1./255,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True
)

test_datagen = tf.keras.preprocessing.image.ImageDataGenerator(rescale=1./255)


train_generator = train_datagen.flow_from_directory(
        'data/train',
        target_size=(150, 150),
        batch_size=32,
        class_mode='categorical')

validation_generator = test_datagen.flow_from_directory(
        'data/validation',
        target_size=(150, 150),
        batch_size=32,
        class_mode='categorical')

# `model` is assumed to be an already compiled Keras model
model.fit(
        train_generator,
        steps_per_epoch=2000,
        epochs=50,
        validation_data=validation_generator,
        validation_steps=800)
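
The `steps_per_epoch` and `validation_steps` values above depend on the size of the dataset. If the intent is one pass over the data per epoch, the step count is normally the number of images divided by the batch size, rounded up. A small sketch, assuming roughly 2,000 training images as mentioned in the question (the exact count is an assumption):

```python
import math

num_train_images = 2000   # assumption: approximate count from the question
batch_size = 32           # matches the generators above

# One epoch that visits each image once needs this many batches.
steps_per_epoch = math.ceil(num_train_images / batch_size)
print(steps_per_epoch)  # 63
```

When the generator returned by `flow_from_directory` is passed to `model.fit`, leaving `steps_per_epoch` unset should also work, since Keras can take the length from the iterator itself.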

Your directory structure should look like this:

-data
   -train
      -category_name_1
      -category_name_2
   -validation
      -category_name_1
      -category_name_2

The labels are automatically derived from the directory name.

For more preprocessing options, see the `ImageDataGenerator` documentation.
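
One caveat: as far as I know, `flow_from_directory` reads common image formats such as PNG, JPEG, and BMP and does not decode DICOM, so the files in the question may need to be converted first or loaded by a custom generator. The underlying idea, loading only one batch from disk at a time, can be sketched in plain Python. Everything here is illustrative: `batch_generator`, `fake_load`, and the file names are hypothetical, and a real `load_fn` would read and decode one file.

```python
from itertools import islice

def batch_generator(paths, labels, load_fn, batch_size=8):
    """Yield (images, labels) batches, loading files only when needed."""
    pairs = zip(paths, labels)
    while True:
        # Pull at most batch_size (path, label) pairs from the iterator.
        chunk = list(islice(pairs, batch_size))
        if not chunk:
            return
        images = [load_fn(path) for path, _ in chunk]
        batch_labels = [label for _, label in chunk]
        yield images, batch_labels

# Hypothetical loader that records which files were actually read.
loaded = []
def fake_load(path):
    loaded.append(path)
    return f"pixels:{path}"

paths = [f"img_{i}.dcm" for i in range(20)]   # made-up file names
labels = [i % 2 for i in range(20)]

gen = batch_generator(paths, labels, fake_load, batch_size=8)
first_images, first_labels = next(gen)

# Only the first batch has been read from "disk" so far.
print(len(first_images), len(loaded))  # 8 8
```

A generator of this shape could likely be wrapped with `tf.data.Dataset.from_generator` to feed a Keras model, so that memory use scales with the batch size instead of the dataset size.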

Can you please accept and upvote the answer if it worked? — Aniket Bote, Aug 29 '20 at 05:22
