
In CNNs, the filters are usually set to 3x3 or 5x5 spatially. Could the sizes be comparable to the image size? One reason for small filters is to reduce the number of parameters to be learned. Apart from this, are there any other key reasons? For example, do people want to detect edges first?

Yeeye

1 Answer

You answered one part of the question yourself. Another reason is that most useful features may appear in more than one place in an image, so it makes sense to slide a single kernel over the whole image in the hope of extracting that feature wherever it occurs, using the same kernel. If you use a big kernel, the features could be interleaved and not concretely detected.
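A minimal sketch of this idea (the signal and kernel values are illustrative, not from the answer): sliding one small kernel over the input produces a strong response at every location where the feature occurs, so a single set of weights detects the feature in multiple places.

```python
# Slide a small kernel over a 1-D signal ("valid" cross-correlation).
def correlate1d(signal, kernel):
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

signal = [0, 0, 1, 1, 0, 0, 1, 1]  # contains two rising step edges
kernel = [-1, 1]                   # responds to a rising edge

# The same 2-tap kernel fires at both edge positions (indices 1 and 5).
print(correlate1d(signal, kernel))  # → [0, 1, 0, -1, 0, 1, 0]
```

A kernel as large as the signal would instead produce a single number mixing both edges together, which is the "interleaved" problem the answer describes.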

In addition to your own answer, the reduction in computational cost is a key point. Since we use the same kernel for different sets of pixels in an image, the same weights are shared across these pixel sets as we convolve over them. And because there are fewer weights than in a fully connected layer, there are fewer weights to back-propagate through.

Giang Nguyen

  • Thank you for your answer, I appreciate it. To my knowledge, the apparent scanning over the image domain comes from the fact that the Laplacian is circulant on grids. I don't understand the localization part; is there any theoretical support for this design? Or is it in fact based on human experience, as you said: if a big kernel is used, the features could be interleaved and not concretely detected. – Yeeye Mar 13 '19 at 14:24
  • I actually did not delve that deeply, so I cannot answer your question properly. Maybe there are mathematical explanations somewhere, but I think the idea is pretty intuitive and it's supported by human experience. – Giang Nguyen Mar 14 '19 at 01:11
  • Anyway, your question is quite out of the scope of this topic. You could ask another question about the localization part so other people can look at it and give you an answer; if your question here is fulfilled, I think you can close it. – Giang Nguyen Mar 14 '19 at 01:14