Keras ImageDataGenerator: Inconsistency in flow functions parameter

Question

I am struggling with the image augmentation in Keras.

The concept, in which I define an ImageDataGenerator to modify the data and a flow function to apply it to the data, is (or seems) clear to me.

But why are the flow functions (flow, flow_from_dataframe, flow_from_directory) different from each other? Their purpose is clear to me: they handle data from different types of sources.

I mean the difference in the parameters they accept. One difference in particular stands out: for flow (where I augment data that is already loaded), I have no way to specify an interpolation mechanism. But don't I need one then?

The main difference between them is where the data you want to generate lives: `flow` for in-memory numpy arrays, `flow_from_directory` for directories of images, etc. What exactly do you mean by inconsistency? - sdcbr, Jan 14 '19 at 16:18

score 1 · Answer 1 · answered Jan 14 '19 at 17:49

You choose the appropriate flow function depending on how much data you have, and how you have it organized.

flow() is for small data sets that you can manage fully in memory.
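A rough mental model of flow(): it takes arrays that are already in memory and serves them in (optionally shuffled) batches, looping over the data indefinitely. The sketch below is a simplified, hypothetical stand-in (`simple_flow` is not a Keras function) that uses plain Python lists instead of NumPy arrays and applies no augmentation; the real flow() also applies the generator's random transformations to each batch.

```python
import random


def simple_flow(x, y, batch_size=32, shuffle=True, seed=None):
    """Hypothetical, simplified stand-in for flow(): yield (x_batch, y_batch)
    pairs from data that is already fully loaded in memory."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    rng = random.Random(seed)
    indices = list(range(len(x)))
    while True:  # loop over the data indefinitely, one epoch after another
        if shuffle:
            rng.shuffle(indices)  # new order at the start of every epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # the last batch of an epoch may be smaller than batch_size
            yield [x[i] for i in batch], [y[i] for i in batch]


# Usage: 10 tiny single-pixel "images" that are already in memory
images = [[[i]] for i in range(10)]
labels = [i % 2 for i in range(10)]
gen = simple_flow(images, labels, batch_size=4, seed=0)
xb, yb = next(gen)
print(len(xb), len(yb))  # 4 4
```

Because everything lives in one Python object, this approach only scales as far as available memory, which is the trade-off the answer describes.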

flow_from_directory() will read files from subdirectories within a parent directory, using the name of each subdirectory as a label. This choice is good if you have a large amount of data organized by directory. It can become a challenge if you have a common set of files with different sets of features that you want to train on, because you will need to store a redundant copy of your data in different subdirectories (or at least create directories full of symbolic links that point back to your real file storage).
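The sketch below illustrates both ideas from that paragraph with the standard library only: labels are inferred from subdirectory names (following the documented flow_from_directory() convention that class indices follow the alphanumeric order of the subdirectory names), and a class-per-subdirectory "view" is built from symbolic links so the real files are stored only once. Both helpers, `infer_labels_from_directory` and `link_into_class_dirs`, are hypothetical and not part of Keras; the real function also filters by file extension, among other things. Creating symlinks assumes a POSIX system (on Windows it may require extra privileges).

```python
import os
import tempfile


def infer_labels_from_directory(root):
    """Hypothetical helper: treat every subdirectory of `root` as a class and
    number the classes in alphanumeric order of their names."""
    classes = sorted(d for d in os.listdir(root)
                     if os.path.isdir(os.path.join(root, d)))
    class_indices = {name: i for i, name in enumerate(classes)}
    samples = []
    for name in classes:
        for fname in sorted(os.listdir(os.path.join(root, name))):
            samples.append((os.path.join(name, fname), class_indices[name]))
    return class_indices, samples


def link_into_class_dirs(storage_dir, view_dir, labels):
    """Hypothetical helper: build a class-per-subdirectory view made of
    symbolic links. `labels` maps file name -> class name."""
    for fname, cls in labels.items():
        class_dir = os.path.join(view_dir, cls)
        os.makedirs(class_dir, exist_ok=True)
        os.symlink(os.path.join(storage_dir, fname),
                   os.path.join(class_dir, fname))


with tempfile.TemporaryDirectory() as tmp:
    storage = os.path.join(tmp, "all_images")  # the single real copy
    os.makedirs(storage)
    for fname in ["a.png", "b.png", "c.png"]:
        open(os.path.join(storage, fname), "wb").close()  # empty placeholders
    view = os.path.join(tmp, "by_animal")
    link_into_class_dirs(storage, view,
                         {"a.png": "dog", "b.png": "cat", "c.png": "dog"})
    print(infer_labels_from_directory(view))
```

A second labelling of the same files would just be another view directory of links, rather than a second copy of the images.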

flow_from_dataframe() will read files and labels as specified by a pandas DataFrame. This function is a more recent addition, and it is the most flexible choice because you can store a single copy of your files using any directory structure you prefer, and you can generate your DataFrame from metadata stored in a CSV file, a database, or any other source that pandas supports.
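flow_from_dataframe() takes a DataFrame with one column of file paths and one column of labels, selected through its x_col and y_col arguments ('filename' and 'class' by default). The sketch below imitates that idea with the standard-library csv module instead of pandas, to show how one metadata file can describe images stored under any layout and how a different label column gives a different task over the same files. `samples_from_metadata` is a hypothetical helper and the file paths are made up; with pandas you would load the CSV with pd.read_csv and pass the DataFrame to flow_from_dataframe() instead.

```python
import csv
import io


def samples_from_metadata(csv_file, x_col="filename", y_col="class"):
    """Hypothetical helper: read (path, label index) pairs from CSV metadata,
    numbering the class names in sorted order."""
    rows = list(csv.DictReader(csv_file))
    classes = sorted({row[y_col] for row in rows})
    class_indices = {name: i for i, name in enumerate(classes)}
    samples = [(row[x_col], class_indices[row[y_col]]) for row in rows]
    return class_indices, samples


# Made-up metadata: paths follow any layout, files are never copied or moved.
METADATA = (
    "filename,class,setting\n"
    "2018/06/img_001.jpg,dog,indoor\n"
    "2019/01/img_002.jpg,cat,outdoor\n"
    "shared/img_003.jpg,dog,outdoor\n"
)

class_indices, by_animal = samples_from_metadata(io.StringIO(METADATA))
print(class_indices)  # {'cat': 0, 'dog': 1}

# Same files, different feature: just pick another label column.
_, by_setting = samples_from_metadata(io.StringIO(METADATA), y_col="setting")
print(by_animal[0], by_setting[0])  # ('2018/06/img_001.jpg', 1) ('2018/06/img_001.jpg', 0)
```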

score 0 · Answer 2 · answered Jan 14 '19 at 17:58

flow is usually used together with the ImageDataGenerator class.

The augmentation pipeline in general is based on an ImageDataGenerator object, which has the argument fill_mode='nearest'; this is how you will be able to define your interpolation mechanism.

See a working example from the docs here:

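from keras.datasets import cifar10
from keras.preprocessing.image import ImageDataGenerator
from keras.utils import np_utils

num_classes = 10  # CIFAR-10 has 10 classes
epochs = 50       # example value; choose what suits your training
# `model` is assumed to be a compiled Keras model defined elsewhere

(x_train, y_train), (x_test, y_test) = cifar10.load_data()
y_train = np_utils.to_categorical(y_train, num_classes)
y_test = np_utils.to_categorical(y_test, num_classes)

datagen = ImageDataGenerator(
    featurewise_center=True,
    featurewise_std_normalization=True,
    rotation_range=20,
    width_shift_range=0.2,
    height_shift_range=0.2,
    horizontal_flip=True,
    fill_mode='nearest')

# compute the mean and std needed for featurewise normalization
datagen.fit(x_train)

# fits the model on batches with real-time data augmentation
# (integer division so steps_per_epoch is a whole number of batches):
model.fit_generator(datagen.flow(x_train, y_train, batch_size=32),
                    steps_per_epoch=len(x_train) // 32, epochs=epochs)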

score 0 · Accepted Answer · answered Jan 15 '19 at 06:38

Thanks for all your posts and comments. Unfortunately, none of the posted answers fully applies to my question. I did some research, went through the Keras code, and came up with an answer that I can now work with.

I guess the Keras documentation misled me a bit. I misinterpreted the interpolation argument of the flow_from_directory() and flow_from_dataframe() methods: I thought that it was also used for zooming into the image. The documentation should state this more clearly.

First, Karl's point is valid: each of these functions suits a certain amount of data and a certain source to get it from. This is where interpolation comes in, because it is only applied to images that are loaded from files, when they are resampled to the requested target size. For images processed with the flow function, it is assumed that they already have the desired size, so any resizing must be done beforehand.
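Since flow() expects arrays that already have the desired size, the choice of interpolation has to be made when you resize them yourself, before passing them in. The pure-Python sketch below shows what nearest-neighbour interpolation means for one single-channel image stored as nested lists; `resize_nearest` is a hypothetical helper, not Keras code. In practice you would use an image library, or load the files with something like keras.preprocessing.image.load_img(path, target_size=..., interpolation=...), which resamples at load time.

```python
def resize_nearest(image, new_height, new_width):
    """Hypothetical helper: nearest-neighbour resize of a 2-D image given as a
    list of rows. Each output pixel copies the input pixel that contains the
    output pixel's centre, mapped back into input coordinates."""
    old_height, old_width = len(image), len(image[0])
    out = []
    for r in range(new_height):
        # centre of output row r, expressed in input row coordinates
        src_r = min(int((r + 0.5) * old_height / new_height), old_height - 1)
        row = []
        for c in range(new_width):
            src_c = min(int((c + 0.5) * old_width / new_width), old_width - 1)
            row.append(image[src_r][src_c])
        out.append(row)
    return out


small = [[1, 2], [3, 4]]
print(resize_nearest(small, 4, 4))
# [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]

# Bring every in-memory image to the same (model input) size before flow().
batch = [small, [[5, 6], [7, 8]]]
resized = [resize_nearest(img, 4, 4) for img in batch]
```

Nearest-neighbour only copies existing values; bilinear or bicubic methods would instead blend neighbouring pixels, which is the trade-off the interpolation argument controls.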

The fill_mode parameter does not control interpolation either: it only determines how the virtual pixels outside the boundaries of the actual image are filled when an affine transformation is performed.
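To see that fill_mode only decides what goes into positions that have no source pixel, the sketch below shifts a single row of pixels by a whole number of positions and fills the vacated positions with the four modes ImageDataGenerator documents ('constant', 'nearest', 'reflect', 'wrap'), following the patterns the Keras docs give for the input abcd (for example 'nearest': aaaaaaaa|abcd|dddddddd). `shift_row` is a hypothetical 1-D simplification: the real augmentation applies a 2-D affine transform to whole images, whereas whole-pixel shifts here mean the existing pixels are copied without any resampling.

```python
def shift_row(row, shift, fill_mode="nearest", cval=0):
    """Hypothetical 1-D simplification: shift `row` right by `shift` whole
    pixels (negative = left) and fill positions that have no source pixel
    according to `fill_mode`. Pixels inside the image are copied unchanged."""
    n = len(row)
    out = []
    for i in range(n):
        j = i - shift  # source position for output position i
        if 0 <= j < n:
            out.append(row[j])  # inside the original image: plain copy
        elif fill_mode == "constant":
            out.append(cval)  # kkkk|abcd|kkkk
        elif fill_mode == "nearest":
            out.append(row[min(max(j, 0), n - 1)])  # aaaa|abcd|dddd
        elif fill_mode == "reflect":
            m = j % (2 * n)  # dcba|abcd|dcba, mirror includes the edge pixel
            out.append(row[m if m < n else 2 * n - 1 - m])
        elif fill_mode == "wrap":
            out.append(row[j % n])  # abcd|abcd|abcd, tile periodically
        else:
            raise ValueError(f"unknown fill_mode: {fill_mode!r}")
    return out


row = list("abcd")
for mode in ("constant", "nearest", "reflect", "wrap"):
    print(mode, "".join(shift_row(row, 3, mode, cval="k")))
# constant kkka
# nearest aaaa
# reflect cbaa
# wrap bcda
```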

