
Correct me if I'm wrong: the input image's dimensions are 227x227x3, so after the first convolution layer the output dimensions should be 55x55x(3x96) = 55x55x288, not 55x55x96.

See the image below:

[Image not available]

dejanualex

1 Answer


A convolutional layer is made of a number of filters with kernel size (n x m). Each filter has dimensions (n x m x c), where c is the number of channels in the previous layer, so every filter always spans all input channels. In your example, the input is 227x227x3 and the first convolutional layer has 96 filters, each with 11x11x3 weights (kernel size 11x11). Each filter produces one new output channel of size 55x55x1, and stacking the 96 channels gives an output of 55x55x96.
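A small sketch of the shape arithmetic above, assuming stride 4 (taken from the comment on this answer) and no padding. The helper names `conv_output_shape` and `conv_weight_count` are hypothetical, written for illustration only; biases are deliberately left out of the weight count.

```python
def conv_output_shape(in_h, in_w, in_c, k_h, k_w, num_filters, stride=1, padding=0):
    """Return (out_h, out_w, out_c) for a 2D convolution.

    Every filter spans all in_c input channels, so each filter contributes
    exactly one output channel: out_c equals num_filters and does not
    depend on in_c (in_c is accepted only to make that point explicit).
    """
    out_h = (in_h + 2 * padding - k_h) // stride + 1
    out_w = (in_w + 2 * padding - k_w) // stride + 1
    return out_h, out_w, num_filters


def conv_weight_count(k_h, k_w, in_c, num_filters):
    """Number of weights, excluding biases: one k_h x k_w x in_c block per filter."""
    return k_h * k_w * in_c * num_filters


# First layer from the question: 227x227x3 input, 96 filters of 11x11x3, stride 4.
print(conv_output_shape(227, 227, 3, 11, 11, 96, stride=4))  # (55, 55, 96)
print(conv_weight_count(11, 11, 3, 96))                      # 34848
```

The 3 input channels appear only in the weight count (11 x 11 x 3 = 363 weights per filter). They have no effect on the depth of the output.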

André Panisson
  • But we know that each filter has 3 channels, the same as the input image, so each filter should create a new output channel that is 55x55x3, unless some magic operation between these 3 channels gives 55x55x1. – Mustafa akir Apr 28 '20 at 13:38
  • Each filter slides over the input as a window of size 11x11x3 with a step (stride) of 4. At each position it multiplies the window element-wise by the filter weights and sums all of the products, including across the three channels. This produces a single value that fills one of the 55x55x1 positions. Note that (227 - 11)/4 + 1 = 55, which accounts for both the filter size (11) and the stride (4). – André Panisson Apr 28 '20 at 17:26
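The "magic operation" is simply the sum over channels described in the last comment. Here is a minimal pure-Python sketch of it, using nested lists indexed `[row][col][channel]`, no padding, and no bias. The function names are hypothetical. To keep it fast, it runs on a tiny 5x5x3 input; the same arithmetic with a 227x227x3 input, 96 filters of 11x11x3, and stride 4 would give 55x55x96. Like most deep-learning libraries, it does not flip the kernel, so strictly speaking it computes a cross-correlation.

```python
def conv2d_single_filter(image, kernel, stride=1):
    """Slide one k x k x channels filter over an H x W x channels image.

    The products are summed over kernel rows, kernel columns AND channels,
    so one filter yields a single 2D map (one output channel).
    """
    h, w, channels = len(image), len(image[0]), len(image[0][0])
    k = len(kernel)
    assert len(kernel[0]) == k and len(kernel[0][0]) == channels
    out_h = (h - k) // stride + 1
    out_w = (w - k) // stride + 1
    out = []
    for r in range(out_h):
        row = []
        for col in range(out_w):
            total = 0.0
            for i in range(k):
                for j in range(k):
                    for ch in range(channels):  # channels are summed, not kept
                        total += image[r * stride + i][col * stride + j][ch] * kernel[i][j][ch]
            row.append(total)
        out.append(row)
    return out


def conv2d_layer(image, kernels, stride=1):
    """Apply several filters and stack their 2D maps into H' x W' x len(kernels)."""
    maps = [conv2d_single_filter(image, kern, stride) for kern in kernels]
    out_h, out_w = len(maps[0]), len(maps[0][0])
    return [[[m[r][col] for m in maps] for col in range(out_w)] for r in range(out_h)]


# 5x5x3 input of ones, two 3x3x3 filters, stride 2 -> (5 - 3)//2 + 1 = 2.
image = [[[1.0, 1.0, 1.0] for _ in range(5)] for _ in range(5)]
ones = [[[1.0] * 3 for _ in range(3)] for _ in range(3)]
twos = [[[2.0] * 3 for _ in range(3)] for _ in range(3)]
out = conv2d_layer(image, [ones, twos], stride=2)
print(len(out), len(out[0]), len(out[0][0]))  # 2 2 2
print(out[0][0])                               # [27.0, 54.0]
```

The output depth (2) equals the number of filters, not 3 x 2. Each entry is one sum over a 3x3x3 window: 27 ones, or 27 twos.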