I know these two functions are part of Torch's backward propagation, and their interfaces are as follows:

```lua
updateGradInput(input, gradOutput)
accGradParameters(input, gradOutput, scale)
```
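For context, my understanding is that `Module:backward` simply calls these two in sequence, roughly like the sketch below (paraphrased from memory, so details may differ from the actual Torch source):

```lua
-- Sketch of how I understand nn.Module:backward ties the two calls together
function Module:backward(input, gradOutput, scale)
   scale = scale or 1
   self:updateGradInput(input, gradOutput)       -- compute dC/d(input)
   self:accGradParameters(input, gradOutput, scale) -- accumulate parameter grads
   return self.gradInput
end
```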
I'm confused about what `gradInput` and `gradOutput` really mean in a layer. Assume the network's cost is `C` and consider a layer `L`. Do `gradInput` and `gradOutput` of layer `L` mean `dC/d(input_L)` and `dC/d(output_L)` respectively?
If so, how is `gradInput` computed from `gradOutput`? My current guess, for a linear layer, is sketched below.
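By the chain rule I would expect `gradInput = (d output_L / d input_L)^T * gradOutput`. For a hypothetical linear layer computing `output = W * input + b` (1-D input for simplicity; field names like `self.weight` are my assumptions, not necessarily Torch's internals), that would look something like:

```lua
-- Guess at updateGradInput for a linear layer: output = W * input + b
-- Chain rule: dC/d(input) = W^T * dC/d(output)
-- (1-D input/gradOutput assumed; self.weight is an assumed field name)
function Linear:updateGradInput(input, gradOutput)
   self.gradInput:resizeAs(input)
   -- gradInput = 0 * gradInput + 1 * W^T * gradOutput
   self.gradInput:addmv(0, 1, self.weight:t(), gradOutput)
   return self.gradInput
end
```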
Moreover, does `accGradParameters` mean accumulating `dC/d(Weight_L)` and `dC/d(bias_L)`? If so, how are these values computed? My guess for the same linear layer is sketched below.
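Again a guess for the same hypothetical linear layer: since `output = W * input + b`, I would expect `dC/dW = gradOutput (outer product) input` and `dC/db = gradOutput`, accumulated into `gradWeight`/`gradBias` and weighted by `scale` (field names are my assumptions):

```lua
-- Guess at accGradParameters for a linear layer: output = W * input + b
-- (gradWeight/gradBias are assumed accumulator fields)
function Linear:accGradParameters(input, gradOutput, scale)
   scale = scale or 1
   -- dC/dW = gradOutput (outer product) input, accumulated with scale
   self.gradWeight:addr(scale, gradOutput, input)
   -- dC/db = gradOutput, accumulated with scale
   self.gradBias:add(scale, gradOutput)
end
```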