
I'm hoping someone can provide some insight or advice on this as I'm truly puzzled.

As part of my machine learning journey, I have created my own image set and am attempting to build a Keras/TensorFlow CNN from the ground up (I've already tried using existing models; this is the next step).

A best practice seems to be to normalize the training inputs to zero mean and unit variance, and to apply that same scaling to the validation/test data.

I can calculate the mean and std of my entire training set. What I don't understand is how to pass these metrics to the validation step.

Since this sounds like something everyone would have to do, I'm surprised that it isn't baked into Keras. If it is and I'm missing it, please feel free to enlighten me.

The examples I've seen use very small datasets (CIFAR-10) that can be loaded entirely into memory (or maybe I don't understand them). That's not possible with my data.

I've considered using the Keras pre-processing functionality, but that only takes a single NumPy array as input. I would have to hard-code the channel means/stds, which is obviously not desirable.
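
For data that does not fit in memory, the per-channel statistics can be accumulated one batch at a time. The following is a minimal sketch, assuming each batch is a NumPy array of shape (N, H, W, C); `streaming_channel_stats` is a hypothetical helper name and not part of Keras.

```python
import numpy as np


def streaming_channel_stats(batches):
    """Return per-channel (mean, std) over an iterable of image batches.

    Assumption: every batch has shape (N, H, W, C) with the same C.
    Only running sums are kept, so the full dataset never sits in memory.
    """
    count = 0          # number of pixels seen per channel
    total = None       # running sum per channel
    total_sq = None    # running sum of squares per channel

    for batch in batches:
        batch = np.asarray(batch, dtype=np.float64)
        if total is None:
            channels = batch.shape[-1]
            total = np.zeros(channels)
            total_sq = np.zeros(channels)
        # Collapse the batch, height and width axes; keep the channel axis.
        total += batch.sum(axis=(0, 1, 2))
        total_sq += np.square(batch).sum(axis=(0, 1, 2))
        count += batch.shape[0] * batch.shape[1] * batch.shape[2]

    if count == 0:
        raise ValueError("no batches were provided")

    mean = total / count
    # Var[x] = E[x^2] - E[x]^2; clip tiny negatives caused by rounding.
    variance = np.maximum(total_sq / count - mean ** 2, 0.0)
    return mean, np.sqrt(variance)
```

A Keras directory generator loops forever, so it would have to be read for exactly one pass (for example, one epoch's worth of batches) and with augmentation turned off, otherwise the statistics describe the augmented images rather than the raw ones.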

The important bits of my code/model are:

# Import path may differ by installation (e.g. tensorflow.keras.preprocessing.image)
from keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(
        rotation_range=30,
        width_shift_range=0.3,
        height_shift_range=0.3,
        zoom_range=0.25,
        horizontal_flip=True,
        vertical_flip=True,
        featurewise_center=True,
        featurewise_std_normalization=True
)

train_generator = train_datagen.flow_from_directory(
        training_dir,
        target_size=(image_width, image_height),
        batch_size=batch_size,
        class_mode='categorical',
        follow_links=True
)

# test_datagen is not shown in this excerpt; it is the generator that needs the training mean/std
validation_generator = test_datagen.flow_from_directory(
        validation_dir,
        target_size=(image_width, image_height),
        batch_size=batch_size,
        class_mode='categorical',
        follow_links=True
)
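
One possible way to reuse the training statistics for validation is to wrap them in a function and hand that same function to both generators. The sketch below is hedged: `make_standardizer` is a hypothetical helper, the mean/std values are placeholders rather than real results, and the Keras lines are left as comments because they depend on the installed version.

```python
import numpy as np


def make_standardizer(mean, std, eps=1e-7):
    """Build a function that standardizes one image with fixed statistics.

    mean and std are per-channel values computed on the training set only.
    eps guards against division by zero for a constant channel.
    """
    mean = np.asarray(mean, dtype=np.float32)
    std = np.asarray(std, dtype=np.float32)

    def standardize(image):
        # Broadcasting applies the per-channel values along the last axis.
        return (np.asarray(image, dtype=np.float32) - mean) / (std + eps)

    return standardize


# Placeholder statistics for illustration only; substitute the values
# computed from the training set.
train_mean = np.array([120.0, 110.0, 100.0])
train_std = np.array([60.0, 58.0, 62.0])
standardize = make_standardizer(train_mean, train_std)

# Assumed usage (not executed here): pass the same function to both
# generators through the preprocessing_function argument.
# train_datagen = ImageDataGenerator(rotation_range=30,
#                                    preprocessing_function=standardize)
# test_datagen = ImageDataGenerator(preprocessing_function=standardize)
```

With this approach `featurewise_center` and `featurewise_std_normalization` would be left off, since the function already does the scaling. Another option discussed on Stack Overflow is to keep those flags and assign the `mean` and `std` attributes on both generators directly; that behavior is worth verifying against the Keras version in use.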

If anyone thinks it's useful, I can also post my code to calculate the mean and standard deviation using 'flow_from_directory'.

I thank all of you who take the time to read and/or respond.

Giorgos Myrianthous
Joe
  • I have stumbled across this answer which may contain the trick you are looking for: https://stackoverflow.com/questions/40449522/keras-imagedatagenerator-setting-mean-and-std I am interested in your code to calculate the mean and std, would you kindly share it? Thank you! – jharb Jun 05 '18 at 09:14
