0

I am learning convolutional neural network with Tensorflow.

I have some doubts regarding tf.nn.conv2d. One of its parameters is filter:

a filter / kernel tensor of shape [filter_height, filter_width, in_channels, out_channels]

I do not understand what is the meaning of out_channels.

Suppose input image is [1, 3, 3, 1]. So the size is 3xx and the channel is 1.
Then we have a filter [2, 2, 1, 5], which means after the filtering, we will have an image of size 2x2 ("valid" padding) with 5 channels.

Where are the 5 channels from? From my understanding, the filtering can only have 1 channel generated. Is Tensorflow using 5 different filter functions here?

derek
  • 9,358
  • 11
  • 53
  • 94
  • Possible duplicate of http://stackoverflow.com/questions/34619177/what-does-tf-nn-conv2d-do-in-tensorflow – jabalazs Feb 25 '17 at 01:15
  • Not duplicate. I am asking different question.It is a comment of that thread: Regarding this: "This still gives a 5x5 output image, but with 7 channels (size 1x5x5x7). Where each channel is produced by one of the filters in the set.", I still have difficulty understanding where the 7 channels are from? what do you mean "filters in the set"? Thanks – derek Feb 25 '17 at 01:20

1 Answers1

1

The filter argument to the tf.nn.conv2d function, as you quoted, is a 4D tensor of dimensions [filter_height, filter_width, in_channels, out_channels]. This tensor represents a stack of out_channels filters of dimension filter_height x filter_width, to be applied over an image with in_channels channels.

The parameters, filter_height, filter_width and out_channels are defined by you, whereas input_channels is dependent on your input to tf.nn.conv2d.

In other words, a filter tensor with dimensions [2, 2, 1, 5], represents 5 different 2 x 2 filters to be applied over a 1-channel input, but you could perfectly change it to [2, 2, 1, 7], or whatever else gives you better results.

To further illustrate, in the following gif you have a [3, 3, 1, 1] tensor filter convolving over a [1, 5, 5, 1] image. This means you have only 1 filter being convolved over the image.

Convolution GIF

GIF source

jabalazs
  • 1,214
  • 11
  • 17
  • You should add reference for using the gif animation. – greeness Feb 25 '17 at 02:49
  • With your GIF example, what is output of [3,3,1,2] filter? – derek Feb 25 '17 at 10:26
  • That would be a `[1, 3, 3, 2]` convolved feature, equivalent to a `3 x 3` image with `2` channels. However the GIF example is just an illustration since it doesn't include any kind of padding. If you want a consistent way of determining the output size of a convolution you should check the `(W - F + 2P) / S + 1` formula explained in [Stanford's CNN course](http://cs231n.github.io/convolutional-networks/). – jabalazs Feb 25 '17 at 11:30
  • with the current filter matrix specified in your GIF example, you can have a convolved feature containing one channel/layer as shown in your GIF example. So now let us say we change the filter to [3,3,1,2] which implies there will be 2 out_channels, i.e., the convolved feature will have two layers/channels. In this case, does that mean we will have another filter matrix different from the one specified in your GIF example? – derek Feb 25 '17 at 21:26