Two issues arise with convolution.
The first is that every convolution operation shrinks the image:
#!/usr/bin/python
# import necessary modules
from keras.models import Sequential
from keras.layers import Conv2D
model = Sequential()
model.add(Conv2D(1, (3,3), strides=(2, 2), input_shape=(5, 5, 1)))
model.summary()
As the example above shows, the output of a convolution is smaller than its input. An image classification network stacks many convolution layers, so after several convolution operations the original image becomes very small.
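To make the shrinking concrete, here is a minimal sketch that tracks the side length of a square image through a stack of unpadded, stride-1 convolutions. The helper names and the 28-pixel input are illustrative assumptions, not part of the Keras example above.

```python
def valid_conv_size(n, f):
    """Side length after convolving an n x n input with an f x f kernel
    (no padding, stride 1)."""
    return n - f + 1


def sizes_through_layers(n, f, num_layers):
    """Side length of the feature map after each of num_layers convolutions."""
    sizes = [n]
    for _ in range(num_layers):
        n = valid_conv_size(n, f)
        sizes.append(n)
    return sizes


# Hypothetical 28x28 input passed through five 3x3 convolutions
print(sizes_through_layers(28, 3, 5))  # [28, 26, 24, 22, 20, 18]
```

Each 3x3 layer removes two pixels per side, so the loss adds up quickly in a deep network.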
The second issue is that, as the kernel moves over the image, it covers pixels at the edges far fewer times than pixels in the middle, where successive kernel positions overlap. As a result, features at the corners and edges of the image contribute little to the output.
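The uneven coverage can be counted directly. The sketch below is a hypothetical helper, assuming stride 1 and no padding, that records how many kernel positions touch each pixel of a 5x5 image when a 3x3 kernel slides over it.

```python
def coverage_counts(n, f):
    """For an n x n image and an f x f kernel (stride 1, no padding),
    count how many kernel positions cover each pixel."""
    counts = [[0] * n for _ in range(n)]
    for top in range(n - f + 1):
        for left in range(n - f + 1):
            # every pixel inside the current window is touched once
            for r in range(top, top + f):
                for c in range(left, left + f):
                    counts[r][c] += 1
    return counts


for row in coverage_counts(5, 3):
    print(row)
# [1, 2, 3, 2, 1]
# [2, 4, 6, 4, 2]
# [3, 6, 9, 6, 3]
# [2, 4, 6, 4, 2]
# [1, 2, 3, 2, 1]
```

A corner pixel is used once, while the centre pixel is used nine times.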
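To solve both issues, a concept called padding is introduced: extra pixels (typically zeros) are added around the border of the image before the convolution. With a suitable amount of padding, the output keeps the size of the original image.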
So if an n × n matrix is convolved with an f × f filter with padding p, the size of the output image will be (n + 2p − f + 1) × (n + 2p − f + 1), where p = 1 in this case.
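The padding formula is easy to check numerically. The sketch below uses hypothetical helper names and assumes stride 1 and an odd filter size. It shows that a 3 × 3 filter with p = 1 returns an output of the same size as the input, which is what Keras does with `padding='same'`.

```python
def padded_conv_size(n, f, p):
    """Output side length for an n x n input, f x f filter and padding p
    (stride 1): n + 2p - f + 1."""
    return n + 2 * p - f + 1


def same_padding(f):
    """Padding that keeps the output the same size as the input,
    assuming an odd filter size f and stride 1."""
    return (f - 1) // 2


print(padded_conv_size(5, 3, 0))                # 3: no padding, image shrinks
print(padded_conv_size(5, 3, 1))                # 5: p = 1 preserves the size
print(padded_conv_size(5, 5, same_padding(5)))  # 5: a 5x5 filter needs p = 2
```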
Stride
Figure: left image, stride=0; middle image, stride=1; right image, stride=2.
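Stride is the number of pixels by which the kernel shifts over the input matrix at each step. For padding p, filter size f × f, input image size n × n and stride s, the output image dimension will be [⌊(n + 2p − f) / s⌋ + 1] × [⌊(n + 2p − f) / s⌋ + 1]. With s = 1 this reduces to the padding formula given earlier. In the example below, n = 5, f = 3, p = 0 and s = 2 give a 2 × 2 output.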
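As a quick check of the output-size rule with stride, here is a small sketch. The function name is hypothetical, and it uses the standard floor form of the formula, ⌊(n + 2p − f) / s⌋ + 1, applied to one side of a square image.

```python
def conv_output_size(n, f, p=0, s=1):
    """Output side length for an n x n input, f x f filter, padding p and
    stride s, using floor((n + 2p - f) / s) + 1."""
    return (n + 2 * p - f) // s + 1


print(conv_output_size(5, 3, p=0, s=2))  # 2
print(conv_output_size(5, 3, p=1, s=1))  # 5
print(conv_output_size(7, 3, p=0, s=2))  # 3
```

The first call uses the same settings as the Keras layer in this section (5 × 5 input, 3 × 3 kernel, stride 2), and the result agrees with the `(None, 2, 2, 1)` output shape reported by `model.summary()`.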
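model = Sequential()
model.add(Conv2D(1, (3,3), strides=(2, 2), input_shape=(5, 5, 1)))
model.summary()
# `weights` and `data` are not defined in this excerpt; `data` is assumed to be
# a single 5x5 single-channel image with shape (1, 5, 5, 1)
model.set_weights(weights)
yhat = model.predict(data)
# print the output feature map row by row
for r in range(yhat.shape[1]):
    print([yhat[0,r,c,0] for c in range(yhat.shape[2])])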
Model: "sequential_3"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d_3 (Conv2D) (None, 2, 2, 1) 10
=================================================================
Total params: 10
Trainable params: 10
Non-trainable params: 0
_________________________________________________________________
[12.0, 17.0]
[9.0, 14.0]
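To see what the stride does step by step, the following is a minimal pure-Python sketch of a strided convolution (computed, as in Keras, as a cross-correlation without flipping the kernel, and here without a bias term). The input image and the all-ones kernel are hypothetical and are not the `data` and `weights` used above, so the numbers differ from the Keras output. Only the 2 × 2 output shape is comparable.

```python
def strided_conv2d(image, kernel, stride=1):
    """Slide a square kernel over a square image with the given stride
    (no padding) and return the resulting feature map as nested lists."""
    n, f = len(image), len(kernel)
    out_size = (n - f) // stride + 1
    out = []
    for i in range(out_size):
        row = []
        for j in range(out_size):
            top, left = i * stride, j * stride  # window origin jumps by `stride`
            total = sum(image[top + a][left + b] * kernel[a][b]
                        for a in range(f) for b in range(f))
            row.append(total)
        out.append(row)
    return out


# Hypothetical 5x5 image holding the values 0..24, and a 3x3 all-ones kernel
image = [[r * 5 + c for c in range(5)] for r in range(5)]
kernel = [[1] * 3 for _ in range(3)]

for row in strided_conv2d(image, kernel, stride=2):
    print(row)
# [54, 72]
# [144, 162]
```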
Pooling
A pooling layer is another building block of a CNN. Its function is to gradually reduce the spatial size of the representation, which reduces the network's complexity and computational cost.
Average Pooling
from keras.layers import AveragePooling2D
model = Sequential()
model.add(Conv2D(1, (3,3), padding='same', input_shape=(5, 5, 1)))
model.add(AveragePooling2D((2,2)))
model.summary()
model.set_weights(weights)
yhat = model.predict(data)
# print the pooled feature map row by row
for r in range(yhat.shape[1]):
    print([yhat[0,r,c,0] for c in range(yhat.shape[2])])
Model: "sequential_7"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d_7 (Conv2D) (None, 5, 5, 1) 10
_________________________________________________________________
average_pooling2d_1 (Average (None, 2, 2, 1) 0
=================================================================
Total params: 10
Trainable params: 10
Non-trainable params: 0
_________________________________________________________________
[11.5, 14.25]
[9.5, 14.0]
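The averaging itself can be sketched in a few lines of plain Python. This is an illustrative helper, not the Keras implementation. It assumes non-overlapping 2 × 2 windows (stride equal to the pool size) and no padding, so the last row and column of a 5 × 5 feature map are dropped, giving the same 2 × 2 shape as in the summary above. The input values are hypothetical.

```python
def average_pool2d(feature_map, pool=2):
    """Average non-overlapping pool x pool windows of a 2D feature map.
    Rows and columns that do not fill a complete window are dropped."""
    rows, cols = len(feature_map), len(feature_map[0])
    out = []
    for top in range(0, rows - pool + 1, pool):
        row = []
        for left in range(0, cols - pool + 1, pool):
            window = [feature_map[top + a][left + b]
                      for a in range(pool) for b in range(pool)]
            row.append(sum(window) / len(window))
        out.append(row)
    return out


# Hypothetical 5x5 feature map holding the values 0..24
feature_map = [[r * 5 + c for c in range(5)] for r in range(5)]

for row in average_pool2d(feature_map):
    print(row)
# [3.0, 5.0]
# [13.0, 15.0]
```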
from keras.layers import Flatten
model = Sequential()
model.add(Conv2D(1, (3,3), padding='same', input_shape=(5, 5, 1)))
model.add(AveragePooling2D((2,2)))
model.add(Flatten())
model.summary()
Model: "sequential_8"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d_8 (Conv2D) (None, 5, 5, 1) 10
_________________________________________________________________
average_pooling2d_2 (Average (None, 2, 2, 1) 0
_________________________________________________________________
flatten_1 (Flatten) (None, 4) 0
=================================================================
Total params: 10
Trainable params: 10
Non-trainable params: 0
_______________________