Hi, I'm preprocessing some image data to feed into a simple feed-forward network.

I have two options that look equivalent to me, but one performs a lot better than the other:

Option 1

I save the images in a directory with one subdirectory per class and run this:

xy_training = tf.keras.preprocessing.image_dataset_from_directory("/content/data/train", image_size=(48,48), color_mode='grayscale',label_mode="int")
xy_validation = tf.keras.preprocessing.image_dataset_from_directory("/content/data/valid", image_size=(48,48), color_mode='grayscale',label_mode="int")
xy_testing = tf.keras.preprocessing.image_dataset_from_directory("/content/data/test", image_size=(48,48), color_mode='grayscale',label_mode="int")

Option 2

I have the raw pixel arrays of the grayscale images and do this:

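import numpy as np

def preprocess(data):
    # WIDTH and HEIGHT are defined elsewhere (presumably 48 x 48, matching Option 1)
    X = []
    pixels_list = data["pixels"].values

    for pixels in pixels_list:
        # Each entry is a space-separated string of pixel values
        single_image = np.reshape(pixels.split(" "), (WIDTH, HEIGHT)).astype("float")
        X.append(single_image)

    # Stack into a 4D array: (num_images, WIDTH, HEIGHT, 1)
    X = np.expand_dims(np.array(X), -1)

    # Normalize pixel values to be between 0 and 1
    X = X / 255.0
    return X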

train_images = preprocess(train_data)
valid_images = preprocess(valid_data)
test_images = preprocess(test_data)

Option 2 performs much better than Option 1. Is there a parameter in tf.keras.preprocessing.image_dataset_from_directory() that I'm not setting?

Thanks!

lmglm

1 Answer

This is most likely because tf.keras.preprocessing.image_dataset_from_directory has no built-in normalization: it yields pixel values in the range [0, 255]. Your custom function divides by 255, so the comparison is not apples-to-apples.
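
To make the scale difference concrete, here is a minimal sketch using only NumPy. The 2x2 image and its pixel string are hypothetical, made up for illustration in the style of the "pixels" column from Option 2:

```python
import numpy as np

# Hypothetical 2x2 grayscale image stored as a space-separated pixel string
pixels = "0 64 128 255"

# What an unnormalized pipeline feeds the network: values in [0, 255]
raw = np.reshape(pixels.split(" "), (2, 2)).astype("float")

# What Option 2 feeds the network: values in [0, 1]
scaled = raw / 255.0

print(raw.min(), raw.max())
print(scaled.min(), scaled.max())
```

The two arrays hold the same image, but the inputs differ by a factor of 255, which is usually enough to change how well a small dense network trains.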

You will have to normalize in a separate step after loading the datasets with image_dataset_from_directory. Here is sample code for normalizing a batched dataset after loading:

import tensorflow as tf

def normalize(image, label):
    # Scale pixel values from [0, 255] to [0, 1]
    image = tf.cast(image / 255.0, tf.float32)
    label = tf.cast(label, tf.float32)

    return image, label

xy_training = xy_training.map(normalize)
xy_validation = xy_validation.map(normalize)
xy_testing = xy_testing.map(normalize)
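
If you want to check that the mapping does what you expect without touching the image directories, a sketch like the following may help. It assumes a synthetic batched dataset standing in for the output of image_dataset_from_directory; the names fake_ds and normalized_ds, the batch size of 4, and the 7 label classes are all made up for illustration:

```python
import numpy as np
import tensorflow as tf

def normalize(image, label):
    # Same idea as the function above: scale pixels from [0, 255] to [0, 1]
    image = tf.cast(image, tf.float32) / 255.0
    label = tf.cast(label, tf.float32)
    return image, label

# Synthetic stand-in data: 8 grayscale 48x48 images with values in [0, 255]
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(8, 48, 48, 1)).astype("float32")
labels = rng.integers(0, 7, size=(8,)).astype("int32")  # 7 classes is arbitrary

# Batched dataset of (images, labels), similar in structure to the loaded one
fake_ds = tf.data.Dataset.from_tensor_slices((images, labels)).batch(4)
normalized_ds = fake_ds.map(normalize)

# Inspect the first batch after normalization
batch_images, batch_labels = next(iter(normalized_ds))
batch_min = float(tf.reduce_min(batch_images))
batch_max = float(tf.reduce_max(batch_images))
print(batch_images.shape, batch_min, batch_max)
```

Running the same min/max check on a batch from xy_training before and after the map should show the range going from roughly [0, 255] to [0, 1]. In recent TensorFlow versions, putting a tf.keras.layers.Rescaling(1./255) layer first in the model is another way to get the same scaling.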
Dharman
mhk777