E-net Deep learning architecture

Question

The research paper is available at this link:

https://arxiv.org/pdf/1606.02147.pdf

I am not able to understand the initial block of the ENet architecture.

Statement given in research paper on page 3:

ENet initial block. MaxPooling is performed with non-overlapping 2 × 2 windows, and the convolution has 13 filters, which sums up to 16 feature maps after concatenation.

So the question is: how do we get 16 feature maps after concatenation?

score 0 · Answer 1 · answered Jan 19 '19 at 03:38

Let's take an example. Suppose the input image has dimensions (128, 128, 3). A convolution with ((3, 3), 2, 13), where 2 is the stride and 13 is the number of filters, gives an output of (64, 64, 13) (a basic conv operation). In the right branch we have the max-pool, which operates on each input channel separately and so returns an output of (64, 64, 3). Concatenating both outputs along the channel axis gives (64, 64, 16).
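The shape arithmetic in this example can be checked with a small sketch. It only computes output shapes, not actual feature maps. The helper names (`conv2d_shape`, `maxpool2d_shape`, `concat_channels`) are hypothetical, and a padding of 1 for the convolution is an assumption, since the answer does not state the padding.

```python
def conv2d_shape(h, w, kernel, stride, padding, filters):
    """Return (height, width, channels) of a 2-D convolution output."""
    out_h = (h + 2 * padding - kernel) // stride + 1
    out_w = (w + 2 * padding - kernel) // stride + 1
    # The channel count of a conv output equals the number of filters.
    return (out_h, out_w, filters)


def maxpool2d_shape(h, w, channels, window, stride):
    """Return (height, width, channels) of a 2-D max-pool output."""
    out_h = (h - window) // stride + 1
    out_w = (w - window) // stride + 1
    # Pooling works per channel, so the channel count is unchanged.
    return (out_h, out_w, channels)


def concat_channels(a, b):
    """Concatenate two (height, width, channels) shapes along the channel axis."""
    if a[:2] != b[:2]:
        raise ValueError("spatial dimensions must match to concatenate")
    return (a[0], a[1], a[2] + b[2])


height, width, in_channels = 128, 128, 3

# Left branch: 3x3 conv, stride 2, 13 filters (padding=1 is assumed).
conv_out = conv2d_shape(height, width, kernel=3, stride=2, padding=1, filters=13)

# Right branch: non-overlapping 2x2 max pooling on the raw input.
pool_out = maxpool2d_shape(height, width, in_channels, window=2, stride=2)

initial_block_out = concat_channels(conv_out, pool_out)
print(conv_out, pool_out, initial_block_out)
```

Under these assumptions the two branches share the same 64 × 64 spatial size, which is what allows the 13 and 3 channels to be stacked into 16.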

answered Jan 19 '19 at 03:38

Ankish Bansal

After the conv we get 13 feature maps of 64x64, and after max pooling on the input image we get a single image. Then how are the channels (3) added to the number of feature maps (13)? – Rochan Jan 21 '19 at 10:47
`13` is the number of output channels of the conv layer, which are concatenated with the `3` channels of the max-pool layer. – Ankish Bansal Jan 21 '19 at 10:50
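To illustrate why the pooled "single image" still counts as 3 channels, here is a minimal pure-Python sketch of non-overlapping 2 × 2 max pooling. The function `max_pool_2x2` and the tiny 4 × 4 RGB input are hypothetical and for illustration only; real frameworks do the same thing on tensors.

```python
def max_pool_2x2(image):
    """Non-overlapping 2x2 max pooling; image is indexed as image[row][col][channel]."""
    height, width, channels = len(image), len(image[0]), len(image[0][0])
    pooled = []
    for r in range(0, height - 1, 2):
        row = []
        for c in range(0, width - 1, 2):
            # Take the maximum over the 2x2 window separately for each channel.
            row.append([
                max(image[r + dr][c + dc][ch] for dr in (0, 1) for dc in (0, 1))
                for ch in range(channels)
            ])
        pooled.append(row)
    return pooled


# Hypothetical 4x4 image with 3 channels; values depend on position and channel.
image = [[[r * 4 + c + 100 * ch for ch in range(3)] for c in range(4)]
         for r in range(4)]

pooled = max_pool_2x2(image)
pooled_shape = (len(pooled), len(pooled[0]), len(pooled[0][0]))
print(pooled_shape)
```

The height and width are halved, but each pixel still carries three values, so the pooled image contributes 3 feature maps to the concatenation.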