Batch Normalization when CNN with only 2 ConvLayer?

Question

I wonder if it is a problem to use BatchNormalization when there are only 2 convolutional layers in a CNN. Can this have adverse effects on classification performance? Now I don't mean the training time, but really the accuracy? Is my network overloaded with unneccessary layers? I want to train the network with a small data set.

model = Sequential()
model.add(Conv2D(32, kernel_size=(3,3), input_shape=(28,28,1), padding = 'same'))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.2))
model.add(Conv2D(64, kernel_size=(3,3), padding='same'))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.2))
model.add(Flatten())
model.add(Dense(128))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(10, activation='softmax'))
model.compilke(optimizer="Adam", loss='categorical_crossentropy, metrics =['accuracy'])

Many thanks.

score 1 · Answer 1 · edited Aug 08 '21 at 18:39

Don’t Use With Dropout

Batch normalization offers some regularization effect, reducing generalization error, perhaps no longer requiring the use of dropout for regularization.

Removing Dropout from Modified BN-Inception speeds up training, without increasing overfitting.

— Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, 2015.

Further, it may not be a good idea to use batch normalization and dropout in the same network.

The reason is that the statistics used to normalize the activations of the prior layer may become noisy given the random dropping out of nodes during the dropout procedure.

Batch normalization also sometimes reduces generalization error and allows dropout to be omitted, due to the noise in the estimate of the statistics used to normalize each variable.

— Page 425, Deep Learning, 2016.

Source - machinelearningmastery.com - batch normalization

Agree, also dropout is used to reduce overfitting (it's a regularizer) but your network is already so small. — Gaurav Sharma, Aug 23 '21 at 12:09

Batch Normalization when CNN with only 2 ConvLayer?

1 Answers1