I am a beginner and I have understood the MNIST tutorials. Now I want to get something going on the SVHN dataset. In contrast to MNIST, it comes with 3 color channels. I am having a hard time visualizing how convolution and pooling work with the additional dimensionality of the color channels.

Does anyone have a good way to think about it, or a link for me?

I appreciate all input :)

hmmmbob

1 Answer

This is very simple. The only difference lies in the first convolution:

  • In grayscale images, the input shape is [batch_size, W, H, 1], so your first convolution (let's say 3x3) has a filter of shape [3, 3, 1, 32] if you want 32 output channels.
  • In RGB images, the input shape is [batch_size, W, H, 3], so your first convolution (still 3x3) has a filter of shape [3, 3, 3, 32].

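In both cases, the output shape (with stride 1 and 'SAME' padding) is [batch_size, W, H, 32], because each filter spans all input channels and sums over them to produce a single output channel.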
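
To make the shapes above concrete, here is a minimal pure-Python sketch of a stride-1, zero-padded convolution. It is only an illustration, not how TensorFlow implements it: the helper names (`conv2d_same`, `random_tensor`, `shape`) are made up for this example, the batch dimension is left out, the image is shrunk to 8x8 to keep it fast, and an odd kernel size is assumed.

```python
import random

def random_tensor(*dims):
    """Hypothetical helper: nested lists of random floats with the given shape."""
    if len(dims) == 1:
        return [random.uniform(-1.0, 1.0) for _ in range(dims[0])]
    return [random_tensor(*dims[1:]) for _ in range(dims[0])]

def shape(t):
    """Hypothetical helper: shape of a rectangular nested list."""
    dims = []
    while isinstance(t, list):
        dims.append(len(t))
        t = t[0]
    return tuple(dims)

def conv2d_same(image, filters):
    """Naive stride-1 convolution with zero padding (odd kernel sizes assumed).

    image:   nested list indexed [row][col][in_channel]
    filters: nested list indexed [k_row][k_col][in_channel][out_channel]
    returns: nested list indexed [row][col][out_channel]
    """
    h, w, c_in = len(image), len(image[0]), len(image[0][0])
    kh, kw = len(filters), len(filters[0])
    assert len(filters[0][0]) == c_in, "filter depth must match input channels"
    c_out = len(filters[0][0][0])
    pad_h, pad_w = kh // 2, kw // 2

    out = [[[0.0] * c_out for _ in range(w)] for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for dy in range(kh):
                for dx in range(kw):
                    yy, xx = y + dy - pad_h, x + dx - pad_w
                    if not (0 <= yy < h and 0 <= xx < w):
                        continue  # zero padding contributes nothing
                    # The sum runs over ALL input channels, so each filter
                    # yields one output channel whether c_in is 1 or 3.
                    for c in range(c_in):
                        pixel = image[yy][xx][c]
                        for k in range(c_out):
                            out[y][x][k] += pixel * filters[dy][dx][c][k]
    return out

random.seed(0)
gray = random_tensor(8, 8, 1)             # one grayscale image
rgb = random_tensor(8, 8, 3)              # one RGB image
gray_filters = random_tensor(3, 3, 1, 32)
rgb_filters = random_tensor(3, 3, 3, 32)

gray_out = conv2d_same(gray, gray_filters)
rgb_out = conv2d_same(rgb, rgb_filters)
print(shape(gray_out), shape(rgb_out))    # (8, 8, 32) (8, 8, 32)
```

Only the filter depth changes between the two cases: the first layer goes from 3*3*1*32 = 288 weights to 3*3*3*32 = 864, while every later layer sees the same 32-channel input.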

Olivier Moindrot
  • Thank you very much! Does that also mean that with 3 color channels I should choose a higher number of kernels than for a monochrome image (the idea being that with 3 colors, there are more possible patterns for the filters to detect)? – hmmmbob Jun 16 '16 at 09:26
  • I don't think you need to change your model; 32 or 64 kernels is already enough to capture 3 colors! – Olivier Moindrot Jun 16 '16 at 09:52
  • Thank you, and I am sure you are right. My question was more of a theoretical one: that more colors should, in theory, warrant more kernels than a monochrome image. – hmmmbob Jun 16 '16 at 10:00
  • Yes, the input data has 3 times more features, so you could adapt your model a bit to have more parameters to capture the higher dimensionality. – Olivier Moindrot Jun 16 '16 at 11:07