Hi, I'm preprocessing some image data to feed into a simple feed-forward network.
I have two options that look equivalent to me, but one performs a lot better than the other:
Option 1
I save the images in a directory with corresponding subdirectories and run this:
xy_training = tf.keras.preprocessing.image_dataset_from_directory("/content/data/train", image_size=(48,48), color_mode='grayscale',label_mode="int")
xy_validation = tf.keras.preprocessing.image_dataset_from_directory("/content/data/valid", image_size=(48,48), color_mode='grayscale',label_mode="int")
xy_testing = tf.keras.preprocessing.image_dataset_from_directory("/content/data/test", image_size=(48,48), color_mode='grayscale',label_mode="int")
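One visible difference between the two options is that only Option 2 divides the pixel values by 255. As far as I know, `image_dataset_from_directory` yields float32 batches in the 0 to 255 range and has no rescaling parameter, so any scaling has to be applied separately. Below is a minimal sketch of doing that with `Dataset.map`. The helper name `scale_to_unit_range` and the synthetic stand-in dataset are my own assumptions for illustration, not part of the pipeline above.

```python
import numpy as np
import tensorflow as tf


def scale_to_unit_range(images, labels):
    """Cast a batch to float32 and map pixel values from [0, 255] to [0, 1]."""
    return tf.cast(images, tf.float32) / 255.0, labels


# Synthetic stand-in for a dataset built by image_dataset_from_directory:
# batches of 48x48 single-channel images with integer labels (hypothetical data).
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(8, 48, 48, 1)).astype("float32")
fake_images[0, 0, 0, 0] = 255.0  # make sure the top of the range is present
fake_labels = (np.arange(8) % 7).astype("int32")
dataset = tf.data.Dataset.from_tensor_slices((fake_images, fake_labels)).batch(4)

# Apply the same scaling that Option 2 performs with X / 255.0.
scaled = dataset.map(scale_to_unit_range)

images, labels = next(iter(scaled))
print(tuple(images.shape), float(tf.reduce_min(images)), float(tf.reduce_max(images)))
```

With the real data this would be applied as `xy_training = xy_training.map(scale_to_unit_range)`, and likewise for the validation and test datasets. Whether this accounts for the whole performance gap is something I have not verified.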
Option 2
I have the raw pixel arrays of the grayscale images and do this:
def preprocess(data):
    X = []
    pixels_list = data["pixels"].values
    for pixels in pixels_list:
        single_image = np.reshape(pixels.split(" "), (WIDTH, HEIGHT)).astype("float")
        X.append(single_image)
    # Convert list to 4D array:
    X = np.expand_dims(np.array(X), -1)
    # Normalize pixel values to be between 0 and 1
    X = X / 255.0
    return X

train_images = preprocess(train_data)
valid_images = preprocess(valid_data)
test_images = preprocess(test_data)
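To make explicit what Option 2 hands to the network, here is a small sanity check on made-up data. It is a sketch only: the helper `pixels_to_array`, the 2x2 image size, and the sample pixel strings are hypothetical stand-ins for the real 48x48 data, but the reshape, `expand_dims`, and division by 255 mirror `preprocess` above.

```python
import numpy as np

WIDTH, HEIGHT = 2, 2  # tiny stand-in for the real 48x48 images


def pixels_to_array(pixel_strings):
    """Turn space-separated pixel strings into a 4D float array scaled to [0, 1]."""
    images = [
        np.reshape(s.split(" "), (WIDTH, HEIGHT)).astype("float")
        for s in pixel_strings
    ]
    # Add a trailing channel axis, then normalize as in preprocess().
    return np.expand_dims(np.array(images), -1) / 255.0


# Hypothetical sample rows in the same format as the "pixels" column.
sample = ["0 64 128 255", "255 255 0 0"]
X = pixels_to_array(sample)

print(X.shape, X.dtype, X.min(), X.max())
```

The result has shape (number of images, WIDTH, HEIGHT, 1) with values between 0 and 1, which is the input range the Option 2 model is trained on.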
Option 2 performs much better than Option 1. Is there a parameter of tf.keras.preprocessing.image_dataset_from_directory() that I'm not setting?
Thanks!