
I am currently working on a computer vision project, and I want to use ImageDataGenerator to load my images by class from their respective directories.

I want to augment my images with feature_std_normalization.

I set feature_std_normalization=True when creating the data generator object, but during training I get this warning:

local/lib/python3.6/dist-packages/keras_preprocessing/image/image_data_generator.py:716: UserWarning: This ImageDataGenerator specifies featurewise_center, but it hasn't been fit on any training data. Fit it first by calling .fit(numpy_data).
  warnings.warn('This ImageDataGenerator specifies '

How do I use datagen.fit() when my images come from flow_from_directory()? datagen.fit() expects X_train, which I don't have.
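A minimal sketch of my setup (the directory path and sizes are placeholders):

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Generator with featurewise normalization enabled
datagen = ImageDataGenerator(featurewise_std_normalization=True)

# Load images by class from the subdirectories of 'data/train'
train_generator = datagen.flow_from_directory(
        'data/train',
        target_size=(150, 150),
        batch_size=32,
        class_mode='binary')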

Ramesh R

1 Answer


If you're using TensorFlow 2, then there are two approaches you could try:

  1. Using .flow_from_directory(): as the docs say, you can just pass in the path to the directory holding your images, and the DirectoryIterator it returns is then ready to be passed to model.fit(). Here is the example provided in the TensorFlow documentation (with some additional comments for clarity):
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Set the augmentations the data generators will do
train_datagen = ImageDataGenerator(
        rescale=1./255,
        shear_range=0.2,
        zoom_range=0.2,
        horizontal_flip=True)
test_datagen = ImageDataGenerator(rescale=1./255)
# Instantiate a DirectoryIterator - this yields the batches of data samples + their labels
train_generator = train_datagen.flow_from_directory(
        'data/train',
        target_size=(150, 150),
        batch_size=32,
        class_mode='binary')
validation_generator = test_datagen.flow_from_directory(
        'data/validation',
        target_size=(150, 150),
        batch_size=32,
        class_mode='binary')
# Train a Keras model ('model' is assumed to be defined and compiled elsewhere)
model.fit(
        train_generator,
        steps_per_epoch=2000,
        epochs=50,
        validation_data=validation_generator,
        validation_steps=800)
  2. Using tf.data.Dataset.from_generator: this approach might be more convenient if you want to take advantage of the tf.data API and your dataset hasn't already been split into training and test sets. Here's an example (from a different page in the docs) of how it works:
import tensorflow as tf

# This example uses an image dataset that has NOT been split into train/test yet
flowers = tf.keras.utils.get_file(
    'flower_photos',
    'https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz',
    untar=True)
# Like before, set the data augmentations
img_gen = tf.keras.preprocessing.image.ImageDataGenerator(rescale=1./255, rotation_range=20)
# (Optional) Double-check the dimensions of a single batch
images, labels = next(img_gen.flow_from_directory(flowers))
print(images.dtype, images.shape)  # float32 (32, 256, 256, 3)
print(labels.dtype, labels.shape)  # float32 (32, 5)
# Now, you can make a dataset with the augmentations
ds = tf.data.Dataset.from_generator(
    lambda: img_gen.flow_from_directory(flowers), 
    output_types=(tf.float32, tf.float32), 
    output_shapes=([32,256,256,3], [32,5])
)
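To sanity-check that the dataset yields what you expect, you can pull a single batch from it (a quick sketch):

# Take one batch from the dataset and verify its shapes
for images, labels in ds.take(1):
    print(images.shape, labels.shape)  # (32, 256, 256, 3) (32, 5)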

Of course, you might still be wondering, "how are we going to split this ds variable into training and test sets?"

Fortunately, Angel Igareta has written a great blog post on this topic. Below I will include just the code snippet that solves our problem:

def get_dataset_partitions_tf(ds, ds_size, train_split=0.8, val_split=0.1, test_split=0.1, shuffle=True, shuffle_size=10000):
    """Credit to Angel Igareta at https://towardsdatascience.com/how-to-split-a-tensorflow-dataset-into-train-validation-and-test-sets-526c8dd29438 for this code."""
    assert (train_split + test_split + val_split) == 1
    
    if shuffle:
        # Specify seed to always have the same split distribution between runs
        ds = ds.shuffle(shuffle_size, seed=12)
    
    train_size = int(train_split * ds_size)
    val_size = int(val_split * ds_size)
    
    train_ds = ds.take(train_size)    
    val_ds = ds.skip(train_size).take(val_size)
    test_ds = ds.skip(train_size).skip(val_size)
    
    return train_ds, val_ds, test_ds
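For example, assuming the flowers dataset above (~3,670 images in batches of 32, so roughly 115 batches; the exact count here is an estimate):

# ds_size counts batches, since each element of ds is a full batch of 32
ds_size = 115
train_ds, val_ds, test_ds = get_dataset_partitions_tf(ds, ds_size)

# 'model' is your compiled Keras model
model.fit(train_ds, validation_data=val_ds, epochs=50)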

In this way, you will be able to pass your dataset to model.fit(), and TensorFlow will essentially do the data augmentations for you as it trains.

Last but not least - in your case, I believe you'll want to pass featurewise_std_normalization=True to the ImageDataGenerator constructor. Let me know if I missed something in your question, but I don't think there actually is a parameter named feature_std_normalization for it.
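On that note, to answer the .fit() part of your question directly: one common workaround (just a sketch; it assumes a representative sample of your images fits in memory, and 'data/train' plus the sizes below are placeholders) is to draw a few batches from the iterator and fit the generator on those:

import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(featurewise_std_normalization=True)

# Temporarily stream images just to collect a sample for fitting
sample_flow = datagen.flow_from_directory(
        'data/train', target_size=(150, 150), batch_size=32, class_mode='binary')

# Gather a few batches into one array and compute the featurewise stats
sample_images = np.concatenate([next(sample_flow)[0] for _ in range(10)])
datagen.fit(sample_images)

# From here on, flow_from_directory applies the fitted normalization
train_generator = datagen.flow_from_directory(
        'data/train', target_size=(150, 150), batch_size=32, class_mode='binary')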

Zain Raza