ResNet50 transfer learning fails

Question

I need one or more hints to get over the first pain in transfer-learning.

The following code is a stripped-down version of what I am actually trying to do, but it shows the issues even with the fake images (A: empty / B: empty + little square) I use there. In the final version, the input will be much more complex images (which justifies the complexity of the applied base model).

The problem looks simple. Input: two types of images, output: binary classification ("square present yes/no"). The modified ResNet50 model is fed with prepared training data via ImageDataGenerator. As I can create any amount of fake data, there is no data augmentation step in the code.

Anyway, when I run the code, the displayed loss (for both the Adam and the SGD optimizer) doesn't seem to improve, and the accuracy quickly approaches the ratio of the number of examples in the two image classes (i.e. B/A). (Note: during the weekend, I even tried for 500 epochs ... no change.)

For both (most likely connected) issues I haven't been able to spot the reason yet ... could you? Is it one of the hyper-parameters, is there an obvious glitch in the model setup or any other part of the implementation? Probably it's just something stupid, but after chasing it and playing around with different and more and more simplified versions, I am about to run out of ideas regarding what to try next.

import cv2
import matplotlib.pyplot as plt

import numpy as np

from tqdm import tqdm
from random import randint

from keras.layers import Dense, GlobalAveragePooling2D
from keras.optimizers import Adam
from keras.models import Model
from keras.applications import ResNet50

from keras.preprocessing.image import ImageDataGenerator


def modified_resnet_model():

    # load ResNet50 model excluding classification layers
    basemodel = ResNet50(weights='imagenet', include_top=False, input_shape=(224, 224, 3))

    # freeze model weights
    for layer in basemodel.layers:
        layer.trainable = False

    # add new classification head
    x = GlobalAveragePooling2D()(basemodel.output)
    x = Dense(128, activation='relu')(x)
    predictions = Dense(1, activation='softmax')(x)
    modresnet50model = Model(inputs=basemodel.input, outputs=predictions)

    # return the result
    return modresnet50model


def data_set_creator(numsamples, probpos, target_image_size=(224, 224)):

    dataset = {}
    image_stack = []
    immean = np.array([0.0, 0.0, 0.0])
    imstat = {}

    # first create target labels
    lbbuf = np.zeros((numsamples, 1))
    lbbuf[:int(probpos*numsamples)] = 1
    lbbuf = np.random.permutation(lbbuf)

    # second create matching "fake" images according to label stack
    for index in tqdm(range(numsamples)):

        # zero labeled images are empty
        img = np.zeros((target_image_size[0], target_image_size[1], 3)).astype(np.float32)
        sh = 10
        if lbbuf[index]:
            # all others contain a square somewhere
            xp = randint(sh, target_image_size[0]-1-sh)
            yp = randint(sh, target_image_size[1]-1-sh)
            randval = 100  # randint(1, 255)
            # print('center: ({0:d},{1:d}); value: {2:d}'.format(xp, yp, randval))
            img[yp-sh:yp+sh, xp-sh:xp+sh, :] = randval
        # else:
        #     print(' --- ')

        # normalize image and add it to the image stack
        img /= 255.0  # normalize image
        image_stack.append(img)

        # update mean vector
        immean += cv2.mean(img)[:-1]

    # assemble data set
    imstat['mean'] = immean/numsamples

    image_stack = np.array(image_stack)

    dataset['images'] = image_stack
    dataset['imstat'] = imstat
    dataset['labels'] = lbbuf

    # return the result
    return dataset


if __name__ == '__main__':

    # define some parameters
    imagesize = (224, 224)
    nsamples = 10000
    pos_prob_train = 0.3
    probposval = pos_prob_train
    valfrac = 0.1   # use 10% of the data for validation
    batchsize = 24
    epochs = 30
    stepsperepoch = 100
    validationsteps = 25

    # ================================================================================

    # create training and validation data sets
    nst = int(nsamples*(1-valfrac))
    dataset_training = data_set_creator(nst, pos_prob_train, target_image_size=imagesize)
    dataset_validation = data_set_creator(nsamples-nst, probposval, target_image_size=imagesize)

    # subtract the mean (training data!) from all the images
    for ci in range(3):
        dataset_training['images'][:, :, :, ci] -= dataset_training['imstat']['mean'][ci]
        dataset_validation['images'][:, :, :, ci] -= dataset_training['imstat']['mean'][ci]

    # get the (modified) model
    model = modified_resnet_model()
    theoptimizer = Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-8)
    model.compile(optimizer=theoptimizer, loss='binary_crossentropy', metrics=['accuracy'])
    print(model.summary())

    # setup data input generators
    train_datagen = ImageDataGenerator()
    validation_datagen = ImageDataGenerator()
    train_generator = train_datagen.flow(dataset_training['images'],
                                         dataset_training['labels'],
                                         batch_size=batchsize)
    validation_generator = validation_datagen.flow(dataset_validation['images'],
                                                   dataset_validation['labels'],
                                                   batch_size=batchsize)

    # train the (modified) model
    history = model.fit_generator(train_generator, steps_per_epoch=stepsperepoch,
                                  epochs=epochs, validation_data=validation_generator,
                                  validation_steps=validationsteps)

    #visualize the training and validation performance
    acc = history.history['acc']
    val_acc = history.history['val_acc']
    loss = history.history['loss']
    val_loss = history.history['val_loss']
    nepochs = range(1, len(acc)+1)
    plt.plot(nepochs, acc, 'bo', label='Training acc')
    plt.plot(nepochs, val_acc, 'b', label='Validation acc')
    plt.title('Training and validation accuracy')
    plt.legend()
    plt.savefig('trainval_acc.png')

    plt.figure()
    plt.plot(nepochs, loss, 'bo', label='Training loss')
    plt.plot(nepochs, val_loss, 'b', label='Validation loss')
    plt.title('Training and validation loss')
    plt.legend()
    plt.savefig('trainval_loss.png')
    plt.show()

Change the activation function of last layer to `sigmoid`. Softmax on one class always outputs 1. — today, Aug 22 '18 at 15:48
You are absolutely right! Although this obvious glitch was just resulting from (many steps of more and more) simplifying my originally multi-class problem, I wasn't able to see it anymore at the end. Thanks to your valuable hint, I got a new working code basis to start from ... very much appreciated! — Any Goe, Aug 23 '18 at 08:56
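
For anyone landing here with the same symptom, the point made in the comment above can be checked without Keras. The following is only a minimal NumPy sketch: the helper functions `softmax` and `sigmoid`, the example logits, and the 1000-sample label vector are illustrative assumptions, not part of the original code. It shows that softmax over a single unit is 1 for any input, while sigmoid actually depends on the logit, and that a predictor stuck at 1 scores an accuracy equal to the fraction of positive labels (0.3 here, matching `pos_prob_train`), assuming predictions are thresholded at 0.5.

```python
import numpy as np


def softmax(z):
    # numerically stable softmax over the last axis
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def sigmoid(z):
    # element-wise logistic function
    return 1.0 / (1.0 + np.exp(-z))


# hypothetical pre-activation values of a Dense(1) layer, shape (batch, 1)
logits = np.array([[-3.0], [0.0], [4.5]])

soft_out = softmax(logits)  # normalizing over one unit: every row becomes 1.0
sig_out = sigmoid(logits)   # varies with the logit, so the head can learn

# illustrative labels with 30% positives, as in the question's setup
labels = np.zeros((1000, 1))
labels[:300] = 1

# a model whose output is always 1 predicts "positive" for every sample
const_pred = np.ones_like(labels)
const_acc = float((const_pred == labels).mean())

print(soft_out.ravel(), sig_out.ravel(), const_acc)
```

Since the output never changes with the weights, the loss has nothing to improve on, which would be consistent with the flat loss curve described in the question. Replacing `activation='softmax'` with `activation='sigmoid'` in the final `Dense(1, ...)` layer is the change suggested in the comment.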
