I am implementing a fairly complex Function for my research; the layer uses Belief Propagation internally. I have derived the gradient with respect to W (the layer's parameter), but because the function is complex, I have not yet derived the gradient with respect to input_data (the data coming from the previous layer).
I am confused about the details of backpropagation. I have read a lot about the BP algorithm, and some notes seem to say it is enough to differentiate only with respect to W (the parameters) and use the residual to obtain the gradient. Your example, however, suggests we also need the gradient with respect to the input data (the previous layer's output). Which is correct? A typical example: how do you derive the gradient with respect to the input image in a convolutional layer?
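To make the convolution case concrete, here is a minimal NumPy sketch (not any framework's API; the names `conv1d_forward` and `conv1d_backward_input` are made up for illustration) of a 1D "valid" convolution and the gradient with respect to its input, which turns out to be a "full" correlation of the upstream gradient with the kernel. A numerical check is included, assuming the loss is simply the sum of the outputs:

```python
import numpy as np

def conv1d_forward(x, w):
    """Valid 1D convolution (cross-correlation): y[i] = sum_k x[i+k] * w[k]."""
    n, k = len(x), len(w)
    return np.array([np.dot(x[i:i + k], w) for i in range(n - k + 1)])

def conv1d_backward_input(gy, w, n):
    """Gradient of the loss w.r.t. the input x.

    Each x[j] contributes to every y[i] with i <= j <= i + k - 1, so
    gx[j] = sum_i gy[i] * w[j - i], i.e. a 'full' correlation of gy with w.
    """
    k = len(w)
    gx = np.zeros(n)
    for i in range(len(gy)):
        gx[i:i + k] += gy[i] * w
    return gx

# Numerical check of the analytic gradient (assumed loss: L = y.sum()).
x = np.random.randn(8)
w = np.random.randn(3)
gy = np.ones(len(x) - len(w) + 1)      # dL/dy for L = y.sum()
gx = conv1d_backward_input(gy, w, len(x))

eps = 1e-6
num_gx = np.array([
    (conv1d_forward(x + eps * np.eye(len(x))[j], w).sum()
     - conv1d_forward(x - eps * np.eye(len(x))[j], w).sum()) / (2 * eps)
    for j in range(len(x))
])
print(np.allclose(gx, num_gx))         # True
```

So even though the convolution's parameter is the kernel, the backward pass still produces a gradient with respect to the input image, and that is exactly what gets passed to the layer below.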
My network has two layers. Do I need to derive the gradient with respect to the input X by hand even in the last layer? (Does backward need to return gx so that backpropagation can flow the gradient to the previous layer?)
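For reference, here is a generic NumPy sketch (not your BP layer, and not any framework's exact backward signature) of a simple linear layer y = x @ W whose backward returns both gW (used to update its own parameter) and gx (passed on so the gradient can keep flowing to the previous layer):

```python
import numpy as np

class Linear:
    """Minimal fully connected layer, y = x @ W, to show what backward returns."""

    def __init__(self, n_in, n_out):
        self.W = 0.01 * np.random.randn(n_in, n_out)

    def forward(self, x):
        self.x = x                      # cache input for the backward pass
        return x @ self.W

    def backward(self, gy):
        """gy is dL/dy coming from the layer above.

        Returns:
            gx -- dL/dx, handed to the previous layer (needed for BP to continue)
            gW -- dL/dW, used to update this layer's own parameter
        """
        gW = self.x.T @ gy              # gradient w.r.t. the parameter
        gx = gy @ self.W.T              # gradient w.r.t. the input
        return gx, gW

# Two-layer chain: the gx returned by layer2.backward feeds layer1.backward.
layer1, layer2 = Linear(4, 5), Linear(5, 3)
x = np.random.randn(2, 4)
y = layer2.forward(layer1.forward(x))
gy = np.ones_like(y)                    # dL/dy for an assumed loss L = y.sum()
gx2, gW2 = layer2.backward(gy)
gx1, gW1 = layer1.backward(gx2)         # layer1 cannot get its gradients without gx2
```

In this sketch the first layer's gradients can only be computed because the second layer returned gx, which is the situation I am asking about.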