
In the traditional residual block, is the "addition" of layer N to the output of layer N+2 (prior to non-linearity) element-wise addition or concatenation?

The literature indicates something like this:

X1 = X
X2 = relu(conv(X1))
X3 = conv(X2)
X4 = relu(X3 + X1)  # skip connection added before the final non-linearity
rodrigo-silveira

1 Answer


It has to be element-wise; with concatenation you don't get a residual function. You also have to use the proper padding mode so the convolutions produce outputs with the same spatial dimensions as the block input.
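
As a concrete illustration, here is a minimal sketch of such a block in PyTorch (the class and variable names are my own, not from any particular codebase):

import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        # padding=1 with a 3x3 kernel keeps the spatial size unchanged,
        # so the element-wise addition below is well defined
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x):
        out = F.relu(self.conv1(x))
        out = self.conv2(out)
        return F.relu(out + x)  # element-wise add, then the non-linearity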

Dr. Snoopy
  • Thanks for that explanation. Since the convolutions must retain dimensions, does the input ever change size as it flows through the net? – rodrigo-silveira Dec 22 '17 at 14:56
  • @rodrigo-silveira It does, through some of the blocks between the residual ones. Read the ResNet paper for more information about their architecture. – Dr. Snoopy Dec 22 '17 at 20:35
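
In case it helps future readers: in the ResNet paper, downsampling happens in the first block of each stage, using a stride-2 convolution together with a 1x1 projection convolution on the shortcut so that both tensors still match in shape. A minimal sketch of that, continuing the PyTorch example above (names are illustrative, and batch normalization is omitted for brevity):

class DownsampleBlock(nn.Module):
    def __init__(self, in_channels, out_channels):
        super().__init__()
        # stride=2 halves the spatial dimensions; the channel count typically doubles
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=2, padding=1)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1)
        # 1x1 projection shortcut so the skip connection matches the new shape
        self.proj = nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=2)

    def forward(self, x):
        out = F.relu(self.conv1(x))
        out = self.conv2(out)
        return F.relu(out + self.proj(x))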