
I am using Keras 2.0.4 (TensorFlow backend) for an image classification task. I am trying to train my own network from scratch (without any pretrained parameters). As my dataset is too large to fit into memory, I use ImageDataGenerator(), flow_from_directory() and fit_generator().

Creating the ImageDataGenerator object:

from keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(preprocessing_function=my_preprocessing_function)  # only preprocessing; no augmentation; static data set

my_preprocessing_function rescales images to the range [0, 255] and centers the data by subtracting the mean (similar to the preprocessing of VGG16 or VGG19).
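
In sketch form (the exact function isn't shown here; the per-channel means below are placeholder, VGG-style values):

import numpy as np

def my_preprocessing_function(img):
    # rescale the image to the range [0, 255]
    img = img.astype('float32')
    img -= img.min()
    img /= max(img.max(), 1e-7)  # guard against division by zero
    img *= 255.0
    # center by mean subtraction (placeholder VGG-style RGB means)
    return img - np.array([123.68, 116.779, 103.939])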

Use method flow_from_directory() from the ImageDataGenerator object:

train_generator = train_datagen.flow_from_directory(
    'path/to/training/directory/with/five/subfolders',  # placeholder path
    target_size=(img_width, img_height),
    batch_size=64,
    classes=['class1', 'class2', 'class3', 'class4', 'class5'],
    shuffle=True,
    seed=1337,
    class_mode='categorical')

(The same is done in order to create a validation_generator.)
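
In sketch form (the validation path is a placeholder):

validation_generator = train_datagen.flow_from_directory(
    'path/to/validation/directory/with/five/subfolders',  # placeholder path
    target_size=(img_width, img_height),
    batch_size=64,
    classes=['class1', 'class2', 'class3', 'class4', 'class5'],
    shuffle=True,
    seed=1337,
    class_mode='categorical')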

After defining the model, I compile it with categorical crossentropy loss and the Adam optimizer, roughly as follows:
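
model.compile(optimizer='adam',  # default Adam settings assumed
              loss='categorical_crossentropy',
              metrics=['accuracy'])

Then I train the model using fit_generator():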

model.fit_generator(
    train_generator,
    steps_per_epoch=total_amount_of_train_samples/batch_size,
    epochs=400,
    validation_data=validation_generator,
    validation_steps=total_amount_of_validation_samples/batch_size)
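
Note that steps_per_epoch and validation_steps are expected to be integers; the plain divisions above yield floats in Python 3. One portable way to compute them:

import math

steps_per_epoch = int(math.ceil(total_amount_of_train_samples / float(batch_size)))
validation_steps = int(math.ceil(total_amount_of_validation_samples / float(batch_size)))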

Problem:
There is no error message, but training does not perform well. After 400 epochs, accuracy still oscillates around 20%, which is no better than randomly picking one of the five classes. In fact, the classifier always predicts 'class1'. The same holds true after only one epoch of training. Why is this the case even though the weights are initialized randomly? What is wrong? What am I missing?
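
A quick check that reproduces this on one batch (not part of the original training script):

import numpy as np

batch_x, batch_y = next(train_generator)
preds = model.predict(batch_x)
print(np.argmax(preds, axis=1))        # all 0 -> the model always picks 'class1'
print(train_generator.class_indices)   # mapping from class names to indices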

Used model:

from keras.layers import Input, Conv2D, MaxPooling2D, Flatten, Dense
from keras.models import Model

inputs = Input(shape=input_shape)  # keep a handle on the input tensor for Model()

# Block 1
x = Conv2D(16, (3, 3), activation='relu', padding='same', name='block1_conv1')(inputs)
x = Conv2D(16, (5, 5), activation='relu', padding='same', name='block1_conv2')(x)
x = MaxPooling2D((2, 2), strides=(2, 2), name='block1_pool')(x)

# Block 2
x = Conv2D(64, (3, 3), activation='relu', padding='same', name='block2_conv1')(x)
x = Conv2D(64, (5, 5), activation='relu', padding='same', name='block2_conv2')(x)
x = MaxPooling2D((2, 2), strides=(2, 2), name='block2_pool')(x)

# Block 3
x = Conv2D(16, (1, 1), activation='relu', padding='same', name='block3_conv1')(x)

# Block 4
x = Conv2D(256, (3, 3), activation='relu', padding='valid', name='block4_conv1')(x)
x = Conv2D(256, (5, 5), activation='relu', padding='valid', name='block4_conv2')(x)
x = MaxPooling2D((2, 2), strides=(2, 2), name='block4_pool')(x)

# Block 5
x = Conv2D(1024, (3, 3), activation='relu', padding='valid', name='block5_conv1')(x)
x = MaxPooling2D((2, 2), strides=(2, 2), name='block5_pool')(x)

# topping
x = Flatten(name='flatten')(x)  # flatten the feature maps before the dense layers
x = Dense(1024, activation='relu', name='fc1')(x)
x = Dense(1024, activation='relu', name='fc2')(x)
predictions = Dense(5, activation='softmax', name='predictions')(x)

model = Model(inputs=inputs, outputs=predictions)

Edit:
terminal output
terminal log

  • Can you please provide your model? Try setting `model` to a pretrained network (e.g. VGG16) and fine-tuning it with your data. If the network converges, your model is probably the issue. – Fábio Perez May 29 '17 at 19:15
  • An interesting observation during training: at the beginning of each epoch, accuracy jumps to ~25% or ~15%, then it anneals to ~20% by the end of the epoch. I have provided my model in my post now. Using a pretrained model (like VGG16) yields an accuracy of ~65%. – D.Laupheimer May 30 '17 at 06:14
  • It seems that all learning progress is eliminated after an epoch - regardless of whether I use a pretrained model or my own model. This explains why my nets aren't learning at all. My own model starts at ~20% accuracy and stays at this level. Pretrained networks start at ~65% and stay at this level. Introducing batch normalization layers between the convolutional layers (in the model given in the initial post) does not change accuracy at all. BUT: top2_categorical_accuracy equals 100% after epoch 1. Strange! – D.Laupheimer May 30 '17 at 06:35
  • I've attached a terminal log. – D.Laupheimer May 30 '17 at 07:11
  • If you use a pretrained network, as Fábio Perez suggested, you need to make sure you freeze all layers apart from the top and maybe the last conv block. If all layers are trainable, you almost immediately crush all trained features with your training, and therefore you won't have any insight into whether your data is the problem. What kind of images are you training with? – petezurich Jun 11 '17 at 05:34
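
For reference, a minimal sketch of the freezing approach petezurich describes (VGG16 base with a new five-class top; the Dense size is an arbitrary choice):

from keras.applications.vgg16 import VGG16
from keras.layers import Dense, Flatten
from keras.models import Model

base = VGG16(weights='imagenet', include_top=False,
             input_shape=(img_width, img_height, 3))
for layer in base.layers:
    layer.trainable = False            # freeze all pretrained conv layers

x = Flatten()(base.output)
x = Dense(256, activation='relu')(x)   # arbitrary size for the new top
predictions = Dense(5, activation='softmax')(x)

model = Model(inputs=base.input, outputs=predictions)
model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])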
