
This is my code. I tried to build a VGG 11-layer network with a mix of ReLU and ELU activations and many regularizers on the kernels and activities. The result is really confusing: training is at the 10th epoch, my loss on both train and val has decreased from 2000 to 1.5, but my accuracy on both train and val has stayed flat at 50%. Can somebody explain this to me?

# VGG 11
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout, Activation
from keras.regularizers import l2
from keras.layers.advanced_activations import ELU
from keras.optimizers import Adam

model = Sequential()

model.add(Conv2D(64, (3, 3), kernel_initializer='he_normal', 
          kernel_regularizer=l2(0.0001), activity_regularizer=l2(0.0001), 
          input_shape=(1, 96, 96), activation='relu'))
model.add(Conv2D(64, (3, 3), kernel_initializer='he_normal', 
          kernel_regularizer=l2(0.0001), activity_regularizer=l2(0.0001), 
          activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(128, (3, 3), kernel_initializer='he_normal', 
          kernel_regularizer=l2(0.0001),activity_regularizer=l2(0.0001), 
          activation='relu'))
model.add(Conv2D(128, (3, 3), kernel_initializer='he_normal',     
          kernel_regularizer=l2(0.0001), activity_regularizer=l2(0.0001), 
          activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(256, (3, 3), kernel_initializer='he_normal',     
          kernel_regularizer=l2(0.0001), activity_regularizer=l2(0.0001), 
          activation='relu'))
model.add(Conv2D(256, (3, 3), kernel_initializer='he_normal',     
          kernel_regularizer=l2(0.0001), activity_regularizer=l2(0.0001), 
          activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(512, (3, 3), kernel_initializer='he_normal', 
          kernel_regularizer=l2(0.0001), activity_regularizer=l2(0.0001), 
          activation='relu'))
model.add(Conv2D(512, (3, 3), kernel_initializer='he_normal', 
          kernel_regularizer=l2(0.0001), activity_regularizer=l2(0.0001), 
          activation='relu'))
model.add(Conv2D(512, (3, 3), kernel_initializer='he_normal', 
          kernel_regularizer=l2(0.0001), activity_regularizer=l2(0.0001),     
          activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

# flatten the convolutional feature maps so they can be fed to the fully connected layers
model.add(Flatten())

model.add(Dense(2048, kernel_initializer='he_normal',
               kernel_regularizer=l2(0.0001), activity_regularizer=l2(0.01)))
model.add(ELU(alpha=1.0))
model.add(Dropout(0.5))

model.add(Dense(1024, kernel_initializer='he_normal',
               kernel_regularizer=l2(0.0001), activity_regularizer=l2(0.01)))
model.add(ELU(alpha=1.0))
model.add(Dropout(0.5))

model.add(Dense(2))
model.add(Activation('softmax'))

adammo = Adam(lr=0.0008, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0)
model.compile(loss='categorical_crossentropy', optimizer=adammo, metrics=['accuracy'])
hist = model.fit(X_train, y_train, batch_size=48, epochs=20, verbose=1, validation_data=(X_val, y_val))
Estellad
  • You are using too much regularization – enterML Aug 03 '17 at 17:02
  • Thank you Nain. Can you explain the theoretical reason why the accuracy is not increasing? I know too much regularization would certainly minimize the loss. – Estellad Aug 03 '17 at 19:08
  • @Estellad Add a comment explaining why you downvoted the answer I gave. Just because you have a theoretical preference for this network, your initialization, your ELU, your arbitrarily chosen activation functions does not mean that it is correct. A lot of this is not common. That is why I suggested an entirely different structure. – modesitt Aug 03 '17 at 19:46
  • @Estellad Furthermore, a theoretical reason for the accuracy not increasing is that the intense regularization forces the kernel weights to be too small for the network to fit the data at all. This can especially happen if input features are on different scales (which is the case with images). – modesitt Aug 03 '17 at 19:47
  • I did not downvote your answer. It must have been someone else. – Estellad Aug 03 '17 at 19:51
  • @Estellad One reason might be the overcomplicated structure. Applying too much activity regularization limits the output values of a layer in the network. In short, you have put too many constraints on the network for it to learn (see the sketch after these comments). – enterML Aug 04 '17 at 05:57
  • Thank you Nain! I will try alternatives. – Estellad Aug 04 '17 at 18:16
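
To get a feel for the scale of those constraints, here is a rough back-of-the-envelope sketch (my own, not from the thread; the typical activation magnitude is an assumed value, and the exact figure depends on Keras' batching conventions):

# Estimate the activity-regularization penalty added by the question's
# 2048-unit dense layer with activity_regularizer=l2(0.01), assuming
# activations of typical magnitude ~1.
units = 2048
l2_coeff = 0.01
typical_activation = 1.0

penalty = l2_coeff * units * typical_activation ** 2
print(penalty)  # ~20.5 from this single layer alone

# Compare with chance-level cross-entropy for 2 classes:
import math
print(-math.log(0.5))  # ~0.693

# The penalty dwarfs the classification term, so the optimizer can slash
# the reported loss simply by shrinking activations, with no improvement
# in accuracy.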

1 Answer


This is not a defect; in fact, it is entirely possible!

Categorical cross-entropy loss does not require accuracy to rise as the loss decreases. This is not a bug in Keras or Theano, but rather a network or data problem.
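
Here is a toy illustration of the point (my own numbers, not part of the original answer):

# The cross-entropy loss on a single sample falls substantially while the
# argmax prediction (and therefore the accuracy) does not change.
import numpy as np

y_true = np.array([0.0, 1.0])          # true class is index 1

pred_before = np.array([0.70, 0.30])   # confidently wrong
pred_after  = np.array([0.55, 0.45])   # less wrong, but still wrong

def cross_entropy(y, p):
    return -np.sum(y * np.log(p))

print(cross_entropy(y_true, pred_before))  # ~1.204
print(cross_entropy(y_true, pred_after))   # ~0.799

# The loss dropped by a third, yet np.argmax(pred) is 0 both times, so
# this sample is still misclassified and the accuracy is unchanged.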

This network structure is probably overcomplicated for what you might be trying to do. You should remove some of your regularization, use only ReLU, use fewer layers, use the standard Adam optimizer, a larger batch, etc. First, try one of Keras' built-in models, like VGG16.
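
For instance, loading the built-in VGG16 might look like this (a minimal sketch, assuming Keras 2's applications module, channels-last data, and grayscale images replicated to 3 channels, e.g. with np.repeat(X, 3, axis=-1), since the applications module expects RGB-shaped inputs; the 256-unit head is an assumed size, not from the answer):

from keras.applications.vgg16 import VGG16
from keras.models import Model
from keras.layers import Flatten, Dense

base = VGG16(weights=None,            # train from scratch
             include_top=False,       # drop the 1000-class ImageNet head
             input_shape=(96, 96, 3))

x = Flatten()(base.output)
x = Dense(256, activation='relu')(x)      # assumed head size, tune as needed
out = Dense(2, activation='softmax')(x)   # two-class output

model = Model(inputs=base.input, outputs=out)
model.compile(loss='categorical_crossentropy', optimizer='adam',
              metrics=['accuracy'])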

If you want to see the full implementation so you can edit it into a VGG11-like structure, it is here:

from keras.models import Sequential
from keras.layers import ZeroPadding2D, Convolution2D, MaxPooling2D, Flatten, Dense, Dropout

# Note: this uses the older Keras 1 Convolution2D(filters, rows, cols)
# signature; in Keras 2 the equivalent is Conv2D(filters, (rows, cols)).
def VGG_16(weights_path=None):
    model = Sequential()
    model.add(ZeroPadding2D((1,1),input_shape=(3,224,224)))
    model.add(Convolution2D(64, 3, 3, activation='relu'))
    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(64, 3, 3, activation='relu'))
    model.add(MaxPooling2D((2,2), strides=(2,2)))

    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(128, 3, 3, activation='relu'))
    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(128, 3, 3, activation='relu'))
    model.add(MaxPooling2D((2,2), strides=(2,2)))

    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(256, 3, 3, activation='relu'))
    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(256, 3, 3, activation='relu'))
    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(256, 3, 3, activation='relu'))
    model.add(MaxPooling2D((2,2), strides=(2,2)))

    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(512, 3, 3, activation='relu'))
    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(512, 3, 3, activation='relu'))
    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(512, 3, 3, activation='relu'))
    model.add(MaxPooling2D((2,2), strides=(2,2)))

    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(512, 3, 3, activation='relu'))
    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(512, 3, 3, activation='relu'))
    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(512, 3, 3, activation='relu'))
    model.add(MaxPooling2D((2,2), strides=(2,2)))

    model.add(Flatten())
    model.add(Dense(4096, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(4096, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(1000, activation='softmax'))

    if weights_path:
        model.load_weights(weights_path)

    return model

You can see it is much simpler. It uses only ReLU (which has become popular these days), has no regularization, a different convolution structure, etc. Modify that to your needs!
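As a starting point, a pared-down version along those lines might look like this (a minimal sketch of my own, assuming Keras 2, channels-last 96x96 grayscale input, and two classes):

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

model = Sequential()
model.add(Conv2D(16, (3, 3), activation='relu', padding='same',
                 input_shape=(96, 96, 1)))   # channels-last grayscale
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(32, (3, 3), activation='relu', padding='same'))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(64, (3, 3), activation='relu', padding='same'))
model.add(MaxPooling2D((2, 2)))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(2, activation='softmax'))

model.compile(loss='categorical_crossentropy', optimizer='adam',
              metrics=['accuracy'])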

modesitt
  • Thank you for your suggestions, Nucl3ic! I tried the standard VGG16, but with fewer filters: 64-128-256-512 -> 16-32-64-128. The performance did not improve and was the same as LeNet-5's. So I thought I should try to build my own network that is theoretically the best. I will simplify my model then. – Estellad Aug 03 '17 at 19:56
  • What are your images? – modesitt Aug 03 '17 at 19:59
  • My images are all resized to 96*96; each is a grayscale image of a single nucleus. – Estellad Aug 03 '17 at 20:02
  • Ah. That is more complicated than the average application. Depending on your dataset size, a more sophisticated network like ResNet or GoogLeNet may help. – modesitt Aug 03 '17 at 20:12
  • Yes, thank you for your response modesitt! I am trying ResNet now. – Estellad Aug 04 '17 at 18:15