
I have been trying to understand convolutional neural networks, but I keep getting the output size wrong.

The formula is fairly straightforward, but I still end up confusing myself. I have learned from many sources on the Internet, such as Andrew Ng's deeplearning.ai.

So here is where I am getting confused.

OutputSize = InputSize - Filter + 1

Suppose my InputSize is 11 x 11 x 16 and I use max pooling with filter size 2. By the math, my output shape should be 5.5 x 5.5 x 16.

Will this fractional value (5.5) be rounded off, or will it be taken as 5 when you feed it?

Maxim
vidit02100
  • Where is that division coming from? And are you even thinking in the right direction? The input is filtered to produce the output; hence the input size determines the output size. – MSalters Jan 31 '18 at 10:05
  • The division is coming from pooling. – vidit02100 Jan 31 '18 at 11:09

1 Answer


The formula you described is a special case for one specific setting. The general formula is:

W2 = (W1 - F + 2*P) / S + 1
H2 = (H1 - F + 2*P) / S + 1

As you can see, it depends not only on the filter size F, but also the stride S and the padding size P (so your formula is just for P=0 and S=1).
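
As a rough sketch, the general formula can be written as a small Python helper. The function name `conv_output_size` and its argument names are made up for illustration, and the floor division reflects the assumption that a window which does not fit completely is dropped:

```python
def conv_output_size(w1, f, p=0, s=1):
    """Output width (or height) for input size w1, filter f, padding p, stride s."""
    if s < 1:
        raise ValueError("stride must be at least 1")
    # Floor division: a partial window at the border produces no output.
    return (w1 - f + 2 * p) // s + 1


# With P=0 and S=1 this reduces to InputSize - Filter + 1.
print(conv_output_size(11, 3))  # 9
```

The same helper applies to both dimensions, since W2 and H2 use the same expression.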

In the case of max pooling, S is usually 2, so that the pooling downsamples the image. With F=2 and S=2 on your 11x11x16 input, the result then depends only on the padding:

  • If P=0, the result will be 5x5x16 (i.e., the fractional value 5.5 is rounded down, not up).
  • If P=1, the result will be 6x6x16.
  • Any bigger padding doesn't make any sense, but you can calculate the output size using the formula above.
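
To see why the unpadded case gives 5 rather than 5.5, here is a minimal one-dimensional sketch in plain Python. The helper `max_pool_1d` is hypothetical and only mimics pooling without padding along a single row of 11 values; the last value has no partner to form a full window, so it contributes nothing:

```python
def max_pool_1d(values, size=2, stride=2):
    """Max-pool a list without padding; incomplete trailing windows are skipped."""
    last_start = len(values) - size  # last index where a full window still fits
    return [max(values[i:i + size]) for i in range(0, last_start + 1, stride)]


row = list(range(11))        # 11 values: 0..10
pooled = max_pool_1d(row)
print(len(pooled), pooled)   # 5 [1, 3, 5, 7, 9]
```

The value 10 at the end of the row is never covered by a window, which is where the "half" output goes.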

Here's some sample code in Keras:

from keras.layers import Input, MaxPool2D

# Symbolic 11x11x16 input; the leading batch dimension is left unspecified.
inputs = Input(shape=(11, 11, 16))
print(MaxPool2D(pool_size=(2, 2), strides=2, padding='valid')(inputs).shape)
# >>> (?, 5, 5, 16)
print(MaxPool2D(pool_size=(2, 2), strides=2, padding='same')(inputs).shape)
# >>> (?, 6, 6, 16)
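
For reference, the two shapes above can be reproduced without Keras. This is only a sketch of the usual TensorFlow/Keras padding conventions as I understand them ('valid' drops incomplete windows, 'same' yields ceil(size / stride)); the function `pool_output_size` is a hypothetical helper, not part of any library:

```python
import math


def pool_output_size(size, pool, stride, padding):
    """Spatial output size of a pooling layer under 'valid' or 'same' padding."""
    if padding == "valid":
        # No padding: only windows that fit completely are used.
        return (size - pool) // stride + 1
    if padding == "same":
        # Input is padded just enough that every stride position yields an output.
        return math.ceil(size / stride)
    raise ValueError("padding must be 'valid' or 'same'")


print(pool_output_size(11, 2, 2, "valid"))  # 5
print(pool_output_size(11, 2, 2, "same"))   # 6
```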
Maxim