
I have a function-fitting network with 4 hidden layers.

I need to learn suitable weights for the first and third layers, while the second and fourth layers are normalization-type layers with nothing to learn, so I froze them by setting their learning rate to zero.

My question is:

Should I define a backward function for those two frozen layers?

I saw that in Caffe the pooling layer, which has no learnable parameters, still implements a backward function.
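For context, here is a minimal NumPy sketch (not Caffe's actual implementation; the toy array and window size are made up) of why a parameter-free max-pooling layer still needs a backward pass: the incoming gradient has to be routed to the input positions that produced each output, not simply copied through.

    import numpy as np

    # toy 1-D max pooling with window 2, stride 2
    x = np.array([3.0, 1.0, 0.0, 5.0])
    idx = np.array([0, 3])             # argmax position in each window
    y = x[idx]                         # forward output: [3.0, 5.0]

    # backward: send the incoming gradient only to the argmax positions
    grad_y = np.array([1.0, 1.0])      # gradient arriving from the layer above
    grad_x = np.zeros_like(x)
    grad_x[idx] = grad_y               # [1.0, 0.0, 0.0, 1.0] -- no learnable parameters involved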

Thanks in advance,

Ali Sharifi B.
  • Gradients will propagate down, so you cannot stop them there; otherwise your earlier layers won't get any information from above. Treat all units in the frozen layers as additive gates: propagate gradient * 1 through them (i.e. no change to the gradient, but allow propagation). – Keir Simmons Mar 02 '17 at 04:54
  • @KeirSimmons Many thanks for your attention. – Ali Sharifi B. Mar 02 '17 at 07:23

1 Answer


Yes, you need a backward pass; otherwise learning would stop at this layer (nothing below it would learn). Even for non-learnable layers you need to compute valid gradients.
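As a rough sketch of what that means in practice (assuming a Caffe Python layer; the class name FixedScaleLayer and the factor 123 are made up for illustration), a frozen scale layer still has to apply the chain rule in its backward pass:

    import caffe

    class FixedScaleLayer(caffe.Layer):
        """Frozen layer that multiplies its input by a constant factor."""

        def setup(self, bottom, top):
            self.scale = 123.0              # fixed, never updated

        def reshape(self, bottom, top):
            top[0].reshape(*bottom[0].data.shape)

        def forward(self, bottom, top):
            top[0].data[...] = self.scale * bottom[0].data

        def backward(self, top, propagate_down, bottom):
            if propagate_down[0]:
                # chain rule: d(scale * x)/dx = scale, so the incoming
                # gradient is scaled, not passed through unchanged
                bottom[0].diff[...] = self.scale * top[0].diff

Because the layer exposes no parameter blobs, the solver has nothing to update for it, yet the gradient it writes into bottom[0].diff is exactly what the layers below need.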

lejlot
  • Thank you very much, but by highlighting "valid gradients" do you mean that I should treat the frozen layers as if they were not frozen and write their own backward functions, or that I should handle the incoming gradient as in Keir's comment? – Ali Sharifi B. Mar 02 '17 at 07:28
  • I mean that it does not matter that the layer is frozen; as long as it processes data, it must provide a gradient. For example, if you have a layer that multiplies by 123 (and you no longer change this 123), its backward pass should also output 123 times the incoming gradient (since d(123*x)/dx = 123). Keir's comment is invalid: if you simply "pass through" the gradient, the operation is wrong (see the sketch below). – lejlot Mar 02 '17 at 18:58
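A quick numerical check of that comment (plain NumPy, with a made-up input just for illustration): the correct backward pass of the multiply-by-123 layer matches a finite-difference estimate, while a plain pass-through would report 1 instead of 123.

    import numpy as np

    scale = 123.0
    x = np.array([1.0, -2.0, 0.5])

    # suppose the loss is L = sum(scale * x), so dL/dy = 1 for every element
    grad_top = np.ones_like(x)

    # correct backward pass: dL/dx = scale * dL/dy
    grad_bottom = scale * grad_top

    # finite-difference check for the first element
    eps = 1e-6
    loss = lambda v: np.sum(scale * v)
    step = np.array([eps, 0.0, 0.0])
    numeric = (loss(x + step) - loss(x - step)) / (2 * eps)

    print(grad_bottom[0], numeric)      # both ~123.0; a pass-through would give 1.0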