How to imagine convolution/pooling on images with 3 color channels

Question

I am a beginner and I have understood the MNIST tutorials. Now I want to get something going on the SVHN dataset. In contrast to MNIST, it comes with 3 color channels. I am having a hard time visualizing how convolution and pooling work with the additional dimensionality of the color channels.

Does anyone have a good way to think about it, or a link for me?

I appreciate all input :)

score 4 · Accepted Answer · answered Jun 16 '16 at 08:12

This is quite simple: the only difference is in the first convolution.

For greyscale images, the input shape is [batch_size, W, H, 1], so your first convolution (say 3x3) has a filter of shape [3, 3, 1, 32] if you want 32 output channels.
For RGB images, the input shape is [batch_size, W, H, 3], so your first convolution (still 3x3) has a filter of shape [3, 3, 3, 32]. Each of the 32 filters spans all 3 color channels and sums over them to produce a single output channel.

In both cases, the output shape (with stride 1 and 'SAME' padding) is [batch_size, W, H, 32], so every layer after the first is identical for the two kinds of input.
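As a rough illustration of the shapes above, here is a minimal NumPy sketch rather than the TensorFlow code from the tutorials. The helper names `conv2d_same` and `max_pool_2x2` are hypothetical, the 32x32 image size is taken from SVHN, and the pooling step is an assumption added because the question asks about it (plain 2x2 max pooling with stride 2, which acts on each channel independently). The convolution assumes an odd kernel size, stride 1 and zero padding.

```python
import numpy as np

def conv2d_same(images, filters):
    """Naive stride-1, zero-padded convolution (odd kernel sizes only).

    images:  [batch, H, W, in_channels]
    filters: [kh, kw, in_channels, out_channels]
    returns: [batch, H, W, out_channels]
    """
    batch, h, w, c_in = images.shape
    kh, kw, f_in, c_out = filters.shape
    assert c_in == f_in, "filter depth must match the number of input channels"
    ph, pw = kh // 2, kw // 2
    padded = np.pad(images, ((0, 0), (ph, ph), (pw, pw), (0, 0)), mode="constant")
    out = np.zeros((batch, h, w, c_out))
    for i in range(h):
        for j in range(w):
            patch = padded[:, i:i + kh, j:j + kw, :]  # [batch, kh, kw, c_in]
            # Sum over the spatial window AND over all input channels.
            out[:, i, j, :] = np.tensordot(patch, filters, axes=([1, 2, 3], [0, 1, 2]))
    return out

def max_pool_2x2(x):
    """2x2 max pooling, stride 2; each channel is pooled on its own."""
    b, h, w, c = x.shape
    return x.reshape(b, h // 2, 2, w // 2, 2, c).max(axis=(2, 4))

rng = np.random.default_rng(0)
grey = rng.random((2, 32, 32, 1))         # greyscale batch
rgb = rng.random((2, 32, 32, 3))          # RGB batch
grey_filters = rng.random((3, 3, 1, 32))  # [3, 3, 1, 32]
rgb_filters = rng.random((3, 3, 3, 32))   # [3, 3, 3, 32]

grey_out = conv2d_same(grey, grey_filters)
rgb_out = conv2d_same(rgb, rgb_filters)
pooled = max_pool_2x2(rgb_out)

print(grey_out.shape)  # (2, 32, 32, 32)
print(rgb_out.shape)   # (2, 32, 32, 32)
print(pooled.shape)    # (2, 16, 16, 32)
```

One way to picture the RGB case: each filter is a small 3x3x3 block that slides over the image, and its response is the sum of three per-channel 3x3 convolutions. After the first layer the color dimension is gone, and pooling only shrinks the spatial dimensions.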

Olivier Moindrot

Thank you very much! Does that also mean that when I have 3 color channels I should choose a higher number of kernels than for a monochrome image (the idea being that with 3 colors, there are more possible patterns for the filters to detect)? – hmmmbob Jun 16 '16 at 09:26
I don't think you need to change your model; 32 or 64 kernels is already enough to capture 3 colors! – Olivier Moindrot Jun 16 '16 at 09:52
Thank you, and I am sure you are right. My question was more a theoretical one: that more colors should in theory warrant more kernels than a monochrome image. – hmmmbob Jun 16 '16 at 10:00
Yes, the input data has 3 times as many features, so you could adapt your model a bit to have more parameters, to capture the higher dimensionality. – Olivier Moindrot Jun 16 '16 at 11:07
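To put rough numbers on this exchange, the sketch below counts the first-layer weights for the two filter shapes from the answer. `conv_weight_count` is a hypothetical helper, biases are ignored, and the 32x32 image size is assumed from SVHN.

```python
def conv_weight_count(kh, kw, c_in, c_out):
    """Number of weights in a conv filter of shape [kh, kw, c_in, c_out]."""
    return kh * kw * c_in * c_out

grey_weights = conv_weight_count(3, 3, 1, 32)  # filter [3, 3, 1, 32]
rgb_weights = conv_weight_count(3, 3, 3, 32)   # filter [3, 3, 3, 32]

grey_inputs = 32 * 32 * 1  # values per greyscale image
rgb_inputs = 32 * 32 * 3   # values per RGB image

print(grey_weights, rgb_weights)  # 288 864
print(grey_inputs, rgb_inputs)    # 1024 3072
```

With the same 32 kernels, the first layer already has three times as many weights for RGB input, which matches the threefold increase in input values. Whether more kernels help on top of that is something to check empirically.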
