The research paper is available at this link:

https://arxiv.org/pdf/1606.02147.pdf

I am not able to understand the initial block of the ENet architecture.

The paper states the following on page 3:

"ENet initial block. MaxPooling is performed with non-overlapping 2 × 2 windows, and the convolution has 13 filters, which sums up to 16 feature maps after concatenation."

So the question is: how do we get 16 feature maps after concatenation?

Rochan

1 Answer

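Let's take an example. Suppose the input image has dimensions (128, 128, 3). A convolution with parameters ((3, 3), 2, 13), where 2 is the stride and 13 is the number of filters, gives an output of shape (64, 64, 13) (a basic conv operation, assuming a padding of 1 so that the spatial size is halved exactly). In the right branch we have max-pooling, which is applied to each of the 3 input channels independently and returns an output of shape (64, 64, 3). Concatenating both outputs along the channel axis gives (64, 64, 16), since 13 + 3 = 16.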
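The shape arithmetic above can be checked with a small NumPy sketch. This is only an illustration, not the authors' implementation: the helper names `conv3x3_stride2` and `max_pool_2x2`, the random weights, and the zero padding of 1 are all assumptions made here so that the convolution halves the spatial size exactly.

```python
import numpy as np

def max_pool_2x2(x):
    """Non-overlapping 2x2 max-pooling on an (H, W, C) array with even H, W.

    Each channel is pooled independently, so the channel count is unchanged.
    """
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).max(axis=(1, 3))

def conv3x3_stride2(x, weights):
    """Naive 3x3 convolution with stride 2 and zero padding of 1 (assumed).

    x: (H, W, C_in), weights: (3, 3, C_in, C_out) -> (H // 2, W // 2, C_out)
    """
    h, w, _ = x.shape
    padded = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out_h, out_w = h // 2, w // 2
    out = np.empty((out_h, out_w, weights.shape[-1]))
    for i in range(out_h):
        for j in range(out_w):
            patch = padded[2 * i:2 * i + 3, 2 * j:2 * j + 3, :]
            # Contract the 3x3xC_in patch against every filter at once.
            out[i, j] = np.tensordot(patch, weights, axes=3)
    return out

rng = np.random.default_rng(0)
image = rng.random((128, 128, 3))          # example input, as in the answer
weights = rng.standard_normal((3, 3, 3, 13))  # 13 hypothetical filters

conv_out = conv3x3_stride2(image, weights)  # left branch
pool_out = max_pool_2x2(image)              # right branch
merged = np.concatenate([conv_out, pool_out], axis=-1)

print(conv_out.shape, pool_out.shape, merged.shape)
```

The pooled branch keeps all 3 input channels because pooling never mixes channels, which is why the concatenation has 13 + 3 = 16 channels.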

Ankish Bansal
  • After the conv we get 13 feature maps of 64x64, and after max pooling on the input image we get a single image. Then how are the channels (3) added to the number of feature maps (13)? – Rochan Jan 21 '19 at 10:47
  • `13` is the number of output channels of the conv layer, which are concatenated with the `3` channels of the max-pool layer. – Ankish Bansal Jan 21 '19 at 10:50