Output shapes and parameters of a CNN with Keras

Question

I have difficulty understanding the output shapes and number of parameters of layers in a Keras CNN model.

Let's take this toy example:

from keras.models import Sequential
from keras.layers import Conv1D, MaxPooling1D

model = Sequential()
model.add(Conv1D(7, kernel_size=40, activation="relu", input_shape=(60, 1)))
model.add(Conv1D(10, kernel_size=16, activation="relu"))
model.add(MaxPooling1D(pool_size=3))
model.summary()

The output is:

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv1d_17 (Conv1D)           (None, 21, 7)             287       
_________________________________________________________________
conv1d_18 (Conv1D)           (None, 6, 10)             1130      
_________________________________________________________________
max_pooling1d_11 (MaxPooling (None, 2, 10)             0         
=================================================================
Total params: 1,417
Trainable params: 1,417
Non-trainable params: 0
_________________________________________________________________

For the first Conv1D layer, there are 7 filters of output size (60 - 40 + 1) = 21 each. The number of parameters is (40 + 1) * 7 = 287, to take the bias into account. So, I'm OK with it.

But on which dimension does the second Conv1D layer operate? I guess that the output length is 21 - 16 + 1 = 6, but I don't understand which operation takes the last dimension from 7 to 10. I also don't understand how the number of parameters is computed.

Finally, I don't understand the output shape of the MaxPooling1D layer, since I would expect the output size to be 6 - 3 + 1 = 4 and not 2. How is it computed?

You might find this CNN cheat sheet helpful, https://stanford.edu/~shervine/teaching/cs-230/cheatsheet-convolutional-neural-networks — Hiho, Apr 14 '20 at 19:41

Accepted Answer · 2020-04-14T21:55:03.057

... but I don't understand by which operation we can go from 7 to 10 for the last dimension.

By the same operation that took it from 1 to 7 in the first layer: each convolution filter is applied over the whole last axis (i.e. dimension) of its input and produces a single number per application window. There are 10 filters in the second convolution layer, so 10 values are generated for each window, and the last axis therefore has size 10 (the same reasoning applies to the first convolution layer).
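To make this concrete, here is a minimal pure-Python sketch of a "valid" 1D convolution over a multi-channel input. It is not how Keras implements Conv1D internally; the function name conv1d_valid and the random data are assumptions chosen for illustration, with the shapes taken from the second layer of the question.

```python
import random

def conv1d_valid(x, kernels, biases):
    """Naive 'valid' 1D convolution followed by ReLU (illustrative helper).

    x:       list of `steps` rows, each holding `in_channels` values
    kernels: list of filters, each of shape kernel_size x in_channels
    biases:  one bias per filter
    Returns a list of (steps - kernel_size + 1) rows with len(kernels) values each.
    """
    kernel_size = len(kernels[0])
    out_steps = len(x) - kernel_size + 1
    output = []
    for start in range(out_steps):
        window = x[start:start + kernel_size]  # kernel_size x in_channels
        row = []
        for kernel, bias in zip(kernels, biases):
            # Each filter covers the whole last axis of the window
            # and collapses it into a single number.
            total = bias
            for k in range(kernel_size):
                for c in range(len(window[k])):
                    total += window[k][c] * kernel[k][c]
            row.append(max(0.0, total))  # ReLU
        output.append(row)
    return output

random.seed(0)
# Output of the first layer: 21 steps, 7 channels.
x = [[random.uniform(-1, 1) for _ in range(7)] for _ in range(21)]
# Second layer: 10 filters, each of shape 16 x 7.
kernels = [[[random.uniform(-1, 1) for _ in range(7)] for _ in range(16)]
           for _ in range(10)]
biases = [0.0] * 10

y = conv1d_valid(x, kernels, biases)
print(len(y), len(y[0]))  # 6 10
```

Each of the 6 windows yields one value per filter, which is where the last dimension of 10 comes from.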

I don't understand either how the number of parameters is computed.

There are 10 filters. As mentioned above, each filter is applied over the whole last axis, so it must have a width of 7 (i.e. the size of the last axis of its input). The kernel size is 16. So we have: 10 * (16 * 7) + 10 (1 bias per filter) = 1130.
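The same arithmetic can be written as a small helper and checked against both layers of the summary. The function conv1d_param_count is a hypothetical name for this sketch, not part of Keras.

```python
def conv1d_param_count(filters, kernel_size, in_channels, use_bias=True):
    """Weights of a Conv1D layer: one (kernel_size x in_channels) kernel
    per filter, plus one bias per filter when use_bias is True."""
    weights = filters * kernel_size * in_channels
    return weights + (filters if use_bias else 0)

first = conv1d_param_count(filters=7, kernel_size=40, in_channels=1)
second = conv1d_param_count(filters=10, kernel_size=16, in_channels=7)
print(first, second, first + second)  # 287 1130 1417
```

The total matches the "Total params: 1,417" line in the summary, since the pooling layer has no parameters.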

Finally, I don't understand the output shape of the MaxPooling1D layer, since I would expect the output size to be 6 - 3 + 1 = 4 and not 2. How is it computed?

By default, the stride of a 1D pooling layer is equal to its pool_size. Therefore, applied to a sequence of length 6, a pooling layer of size 3 has only 2 non-overlapping application windows: floor((6 - 3) / 3) + 1 = 2.
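As a rough check, the lengths of all three layers follow from one formula for unpadded ("valid") windows. The helper valid_output_length below is an illustrative name, not a Keras function.

```python
def valid_output_length(input_length, window, stride=1):
    """Number of window positions without padding:
    floor((input_length - window) / stride) + 1."""
    return (input_length - window) // stride + 1

conv1 = valid_output_length(60, 40)               # first Conv1D, stride 1
conv2 = valid_output_length(conv1, 16)            # second Conv1D, stride 1
pooled = valid_output_length(conv2, 3, stride=3)  # pooling, stride = pool_size
overlapping = valid_output_length(conv2, 3, stride=1)

print(conv1, conv2, pooled, overlapping)  # 21 6 2 4
```

The value 4 expected in the question corresponds to a stride of 1, which in Keras would mean passing strides=1 to MaxPooling1D explicitly.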

Note: You may also find this relevant answer useful about how 1D-conv works.

Thank you very much for these explanations! I understand the Conv1D layer and the default behavior of the MaxPooling1D layer now. The post you made on 1D convolutions is the best I have found so far. Too bad I didn't find it before when I searched on stackoverflow... — Mark Morrisson, Apr 14 '20 at 21:32
