I want to use autoencoders on real-life photos (not just simple MNIST digits). I took the cats-and-dogs dataset and trained with it. My parameters are:
- I stick with grayscale and a scaled-down version of the images at 128x128 px, and do some preprocessing in the ImageDataGenerator for data augmentation.
- I train with a dataset of about 2000 images of cats and dogs. I could take 10000, but then training takes too long.
- I chose a convolutional network with basic downsamplers and upsamplers, toyed with the parameters, and ended up with a bottleneck of 8x8x8 = 512 values (1/32 of the original 128x128 px image; see the quick check right after this list).
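As a quick check of the compression ratio (just the arithmetic behind the numbers above, nothing from my actual script):

input_size = 128 * 128 * 1       # 16384 grayscale pixel values
bottleneck_size = 8 * 8 * 8      # 512 values in the encoded representation
print(bottleneck_size / input_size)  # 0.03125, i.e. 1/32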
Here is the python code:
from keras.preprocessing.image import ImageDataGenerator
from keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D
from keras.models import Model
from keras import metrics
from keras.callbacks import EarlyStopping
import os
root_dir = '/opt/data/pets'
epochs = 400 # epochs of training, the more the better
batch_size = 64 # number of images to be yielded from the generator per batch
seed = 4321 # constant seed for constant conditions
# keras image input type definition
img_channel = 1 # 1 for grayscale, 3 for color
# dimension of input image for network, the bigger the more CPU and RAM is used
img_x, img_y = 128, 128
input_img = Input(shape = (img_x, img_y, img_channel))
# this is the augmentation configuration we use for training
train_datagen = ImageDataGenerator(
    rescale=1./255,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True)
# this is the augmentation configuration we will use for testing
test_datagen = ImageDataGenerator(rescale=1./255)
# this is a generator that will read pictures found in
# subfolders of root_dir + '/train', and indefinitely generate
# batches of augmented image data
train_generator = train_datagen.flow_from_directory(
    root_dir + '/train',  # this is the target directory
    target_size=(img_x, img_y),  # all images will be resized
    batch_size=batch_size,
    color_mode='grayscale',
    class_mode='input',  # necessary for an autoencoder
    shuffle=False,  # important so filenames match the labels later
    seed=seed)
# this is a similar generator, for validation data
validation_generator = test_datagen.flow_from_directory(
    root_dir + '/validation',
    target_size=(img_x, img_y),
    batch_size=batch_size,
    color_mode='grayscale',
    class_mode='input',  # necessary for an autoencoder
    shuffle=False,  # important so filenames match the labels later
    seed=seed)
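# optional sanity check (can be skipped): with class_mode='input' each batch
# should yield identical inputs and targets
x_check, y_check = next(train_generator)
print(x_check.shape, y_check.shape)  # both expected to be (batch_size, 128, 128, 1)
train_generator.reset()  # rewind so later code sees batches from the start again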
# create convolutional autoencoder inspired from https://blog.keras.io/building-autoencoders-in-keras.html
x = Conv2D(32, (3, 3), activation='relu', padding='same')(input_img)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(32, (3, 3), activation='relu', padding='same')(x)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(16, (3, 3), activation='relu', padding='same')(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
encoded = MaxPooling2D((2, 2), padding='same')(x)
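# shape bookkeeping: the four pooling steps take 128 -> 64 -> 32 -> 16 -> 8,
# with 8 channels at the end, so 'encoded' holds 8x8x8 = 512 values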
x = Conv2D(8, (3, 3), activation='relu', padding='same')(encoded)
x = UpSampling2D((2, 2))(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = Conv2D(16, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((2, 2))(x)
x = Conv2D(32, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((2, 2))(x)
x = Conv2D(32, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((2, 2))(x)
decoded = Conv2D(img_channel, (3, 3), activation='sigmoid', padding='same')(x)  # example from the documentation
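# the four UpSampling2D steps take 8 -> 16 -> 32 -> 64 -> 128 again,
# so 'decoded' has the same 128x128x1 shape as the input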
autoencoder = Model(input_img, decoded)
autoencoder.summary() # show model data
autoencoder.compile(optimizer='sgd', loss='mean_squared_error', metrics=[metrics.mae, metrics.categorical_accuracy])
# do not run forever but stop if model does not get better
stopper = EarlyStopping(monitor='val_loss', min_delta=0.0001, patience=2, mode='auto', verbose=1)
# do the actual fitting
autoencoder_train = autoencoder.fit_generator(
    train_generator,
    validation_data=validation_generator,
    epochs=epochs,
    shuffle=False,
    callbacks=[stopper])
# create an encoder for debugging purposes later
encoder = Model(input_img, encoded)
# save the model parameters to a file
autoencoder.save(os.path.basename(__file__) + '_model.hdf')
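# note: the saved file can later be reloaded with keras.models.load_model(...)
# for further inspection (not done here)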
## PLOTS ####################################
import matplotlib.pyplot as plt
# Plot loss over epochs
print(autoencoder_train.history.keys())
plt.plot(autoencoder_train.history['loss'])
plt.plot(autoencoder_train.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'validation'])
plt.show()
# Plot original, encoded and predicted image
import numpy as np
images_show_start = 1
images_show_stop = 20
images_show_number = images_show_stop - images_show_start + 1
images, _ = train_generator.next()
plt.figure(figsize=(30, 5))
for i in range(images_show_start, images_show_stop):
    # original image
    ax = plt.subplot(3, images_show_number, i + 1)
    image = images[i, :, :, 0]
    image_reshaped = np.reshape(image, [1, 128, 128, 1])
    plt.imshow(image, cmap='gray')
    # label
    image_label = os.path.dirname(train_generator.filenames[i])
    plt.title(image_label)  # only OK if shuffle=False
    # encoded image
    ax = plt.subplot(3, images_show_number, i + 1 + 1*images_show_number)
    image_encoded = encoder.predict(image_reshaped)
    # adjust shape if the network parameters are adjusted
    image_encoded_reshaped = np.reshape(image_encoded, [16, 32])
    plt.imshow(image_encoded_reshaped, cmap='gray')
    # predicted image
    ax = plt.subplot(3, images_show_number, i + 1 + 2*images_show_number)
    image_pred = autoencoder.predict(image_reshaped)
    image_pred_reshaped = np.reshape(image_pred, [128, 128])
    plt.imshow(image_pred_reshaped, cmap='gray')
plt.show()
In the network configuration above you can see the layers. What do you think? Is it too deep or too simple? What adjustments could one make?
The loss decreased over the epochs, as it should.
And here we have three images in each column:
- the original (scaled-down) image,
- the encoded image and
- the predicted image.
So I wonder why the encoded images look so similar in their characteristics (apart from the fact that they are all cats), with lots of vertical lines. The encoded representation is still quite big: 8x8x8 = 512 values, which I plotted as a 16x32 image, i.e. 1/32 of the pixels of the original image. Is the quality of the decoded images sufficient for that? Can it somehow be improved? Can I even make a smaller bottleneck in the autoencoder? If I try a smaller bottleneck (sketched below), the loss gets stuck at 0.06 and the predicted images are very bad.
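For concreteness, here is roughly what I mean by a smaller bottleneck (just a sketch with one extra downsampling stage, not my exact code; the decoder would need one more UpSampling2D step to mirror it):

# sketch: one additional Conv2D/MaxPooling2D stage in the encoder,
# shrinking the encoded tensor from 8x8x8 = 512 to 4x4x8 = 128 values
x = Conv2D(32, (3, 3), activation='relu', padding='same')(input_img)
x = MaxPooling2D((2, 2), padding='same')(x)   # 64x64
x = Conv2D(32, (3, 3), activation='relu', padding='same')(x)
x = MaxPooling2D((2, 2), padding='same')(x)   # 32x32
x = Conv2D(16, (3, 3), activation='relu', padding='same')(x)
x = MaxPooling2D((2, 2), padding='same')(x)   # 16x16
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = MaxPooling2D((2, 2), padding='same')(x)   # 8x8
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
encoded = MaxPooling2D((2, 2), padding='same')(x)  # 4x4x8 = 128 values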