I am implementing a fairly complex Function for my research; the layer uses Belief Propagation internally. I have derived the gradient with respect to W (the layer's parameter), but because the function is complex, I have not yet derived the gradient with respect to input_data (the data coming from the previous layer).
I am confused about the details of backpropagation. I have read a lot about the BP algorithm, and some notes seem to say it is enough to differentiate only with respect to W (the parameters) and use the residual to obtain the gradient. Your example, however, suggests we also need the gradient with respect to the input data (the previous layer's output). Which is correct? A typical example: how do you derive the gradient with respect to the input image in a convolutional layer?
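To make the convolution case concrete, here is a minimal NumPy sketch (not any framework's API; the names `conv1d_forward` and `conv1d_backward_input` are made up for illustration) of a 1D "valid" convolution and the gradient with respect to its input, which turns out to be a "full" correlation of the upstream gradient with the kernel. A numerical check is included, assuming the loss is simply the sum of the outputs:

```python
import numpy as np

def conv1d_forward(x, w):
    """Valid 1D convolution (cross-correlation): y[i] = sum_k x[i+k] * w[k]."""
    n, k = len(x), len(w)
    return np.array([np.dot(x[i:i + k], w) for i in range(n - k + 1)])

def conv1d_backward_input(gy, w, n):
    """Gradient of the loss w.r.t. the input x.

    Each x[j] contributes to every y[i] with i <= j <= i + k - 1, so
    gx[j] = sum_i gy[i] * w[j - i], i.e. a 'full' correlation of gy with w.
    """
    k = len(w)
    gx = np.zeros(n)
    for i in range(len(gy)):
        gx[i:i + k] += gy[i] * w
    return gx

# Numerical check of the analytic gradient (assumed loss: L = y.sum()).
x = np.random.randn(8)
w = np.random.randn(3)
gy = np.ones(len(x) - len(w) + 1)      # dL/dy for L = y.sum()
gx = conv1d_backward_input(gy, w, len(x))

eps = 1e-6
num_gx = np.array([
    (conv1d_forward(x + eps * np.eye(len(x))[j], w).sum()
     - conv1d_forward(x - eps * np.eye(len(x))[j], w).sum()) / (2 * eps)
    for j in range(len(x))
])
print(np.allclose(gx, num_gx))         # True
```

So even though the convolution's parameter is the kernel, the backward pass still produces a gradient with respect to the input image, and that is exactly what gets passed to the layer below.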
My network has two layers. Do I need to derive the gradient with respect to the input X by hand even in the last layer? (Does backward need to return gx so that backpropagation can flow the gradient to the previous layer?)
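For reference, here is a generic NumPy sketch (not your BP layer, and not any framework's exact backward signature) of a simple linear layer y = x @ W whose backward returns both gW (used to update its own parameter) and gx (passed on so the gradient can keep flowing to the previous layer):

```python
import numpy as np

class Linear:
    """Minimal fully connected layer, y = x @ W, to show what backward returns."""

    def __init__(self, n_in, n_out):
        self.W = 0.01 * np.random.randn(n_in, n_out)

    def forward(self, x):
        self.x = x                      # cache input for the backward pass
        return x @ self.W

    def backward(self, gy):
        """gy is dL/dy coming from the layer above.

        Returns:
            gx -- dL/dx, handed to the previous layer (needed for BP to continue)
            gW -- dL/dW, used to update this layer's own parameter
        """
        gW = self.x.T @ gy              # gradient w.r.t. the parameter
        gx = gy @ self.W.T              # gradient w.r.t. the input
        return gx, gW

# Two-layer chain: the gx returned by layer2.backward feeds layer1.backward.
layer1, layer2 = Linear(4, 5), Linear(5, 3)
x = np.random.randn(2, 4)
y = layer2.forward(layer1.forward(x))
gy = np.ones_like(y)                    # dL/dy for an assumed loss L = y.sum()
gx2, gW2 = layer2.backward(gy)
gx1, gW1 = layer1.backward(gx2)         # layer1 cannot get its gradients without gx2
```

In this sketch the first layer's gradients can only be computed because the second layer returned gx, which is the situation I am asking about.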