Convolutional Neural Networks - Theory

Question

I am sorry for asking this stupid question, but after thinking about it for a while, I still don't get it:

According to Jordi Torres (see here), if we look at an image with 28x28 = 784 pixels, then one way to implement this is to let one neuron of a hidden layer learn about 5x5 = 25 pixels of the input layer.

However, as he explains it:

Analyzing a little bit the concrete case we have proposed, we note that, if we have an input of 28×28 pixels and a window of 5×5, this defines a space of 24×24 neurons in the first hidden layer because we can only move the window 23 neurons to the right and 23 neurons to the bottom before hitting the right (or bottom) border of the input image. We would like to point out to the reader that the assumption we have made is that the window moves forward 1 pixel away, both horizontally and vertically when a new row starts. Therefore, in each step, the new window overlaps the previous one except in this line of pixels that we have advanced.

I really don't get why we need a space of 24x24 neurons in the first hidden layer. Since I take 5x5 windows (each covering 25 of the 784 pixels), I thought we would need only 784/25 ≈ 31.4, so about 32 neurons in total. I mean, doesn't one neuron of the hidden layer learn the properties of 25 pixels? Apparently not, but I am really confused.

Because it's a sliding window. The 5x5 windows overlap. This question might be more on-topic on one of the more theoretical sites. — beaker, May 05 '20 at 18:13
Yes, you are right. Where can I find "the more theoretical sites"? — , May 06 '20 at 07:34
If you look at the pull-down menu (top-right on the full web site, top-left on mobile) there's a whole list of Stack Exchange sites including Computer Science, Cross Validated and Data Science. You'd have to look at the help center for each one to see what kinds of questions are on-topic for that particular site. — beaker, May 06 '20 at 14:35

score 0 · Accepted Answer · answered May 06 '20 at 08:21

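You're assuming non-overlapping 5x5 segments, but that's not the case. In this example, the first output is derived from rows 1-5, columns 1-5 of the input. The next one uses rows 1-5, columns 2-6, and so on up to rows 1-5, columns 24-28. Then the window moves down one row to rows 2-6, columns 1-5, and the process repeats until rows 24-28, columns 24-28. The window can therefore start at 24 different columns and 24 different rows, which gives 24x24 = 576 outputs, one neuron per window position. Moving the window one pixel at a time is referred to as a "stride" of 1.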
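The window positions described above can be checked with a few lines of Python. This is only a minimal sketch, assuming no padding around the image; the helper names `conv_output_size` and `window_origins` are made up for illustration and do not come from any library or from the linked article.

```python
def conv_output_size(input_size: int, window: int, stride: int = 1) -> int:
    """Number of window positions along one axis, assuming no padding."""
    return (input_size - window) // stride + 1


def window_origins(input_size: int, window: int, stride: int = 1):
    """Top-left (row, col) of every window position, 1-indexed as in the answer."""
    n = conv_output_size(input_size, window, stride)
    starts = [1 + i * stride for i in range(n)]
    return [(r, c) for r in starts for c in starts]


# Stride 1: overlapping windows, as in the 28x28 / 5x5 example.
side = conv_output_size(28, 5, stride=1)
origins = window_origins(28, 5, stride=1)
print(side, len(origins))                   # 24 576
print(origins[0], origins[1], origins[-1])  # (1, 1) (1, 2) (24, 24)

# Stride 5: non-overlapping windows, closer to what the question assumed.
# Only 5 windows fit per axis (25 in total); the last 3 pixels are not covered.
print(conv_output_size(28, 5, stride=5))    # 5
```

The last window starts at row 24, column 24 and so covers rows 24-28, columns 24-28, matching the final position named in the answer.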
