
3 questions:

  1. What is grad_outputs in Chainer?

  2. Taking Chainer's F.transpose as an example, how should this backward code be read?

    def backward(self, inputs, grad_outputs):
        gy = grad_outputs[0]
        inv_axes = self.axes
        if self.axes:
            axes = tuple(ax % len(self.axes) for ax in self.axes)
            inv_axes = tuple(numpy.argsort(axes))
        gx = gy.transpose(inv_axes)
        return gx,

  3. Suppose I want to implement my own function, but inputs[0] and inputs[1] have different shapes. To backpropagate using the chain rule, I have to write the following code in backward:

    a, b = inputs
    gy = grad_outputs[0]
    return a * gy, b * gy

But a and b do not have the same shape, so won't a * gy and b * gy raise an error because the shapes don't match for multiplication?

machen

1 Answer


*This answer applies to Chainer v2. The Function class's internal behavior may change from Chainer v3 onward to support differentiable backpropagation.

Backpropagation proceeds from the final layer to the first layer, propagating gradients so that the gradient of each layer's parameters can be calculated.

A function's backward method receives the gradient of its output and needs to calculate and return the gradient of its input.
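
As a rough illustration of that contract, here is a minimal NumPy-only sketch (not Chainer code). It assumes a hypothetical forward pass `y = a * b` in which `b` is broadcast over the first axis of `a`; the names `forward` and `backward` are free functions made up for this example, not the Function API.

```python
import numpy as np

def forward(a, b):
    # Hypothetical forward pass: elementwise product, b is broadcast over axis 0.
    return a * b

def backward(a, b, gy):
    # gy has the shape of the forward output, here (2, 3).
    ga = gy * b                # shape (2, 3), same as a
    gb = (gy * a).sum(axis=0)  # sum over the broadcast axis, shape (3,), same as b
    return ga, gb

a = np.arange(6, dtype=np.float32).reshape(2, 3)
b = np.array([1.0, 2.0, 3.0], dtype=np.float32)
y = forward(a, b)
gy = np.ones_like(y)           # stand-in for the gradient arriving from the next layer
ga, gb = backward(a, b, gy)
print(ga.shape, gb.shape)
```

The point of the sketch is only the shapes: `gy` matches the output, and each returned gradient matches its own input, so any axis that was broadcast in the forward pass has to be summed out in the backward pass.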

  1. grad_outputs is the gradient with respect to this function's outputs, given as a tuple of arrays (numpy or cupy), one per output. That is why the code reads grad_outputs[0].
  2. I believe the basic idea is that the derivative of F.transpose is itself just a transpose, so backward returns a transpose of the output gradient gy. More rigorously, the axis order is specified in the forward computation and kept as self.axes, and it needs to be inverted in the backward computation. inv_axes is that inverse permutation (numpy.argsort of the axes, after negative axes are normalized with the modulo), and it is used to calculate the gradient of the input, written as gx.
  3. As you wrote, you can return the gradients of the inputs as a tuple, like return a * gy, b * gy. The returned gradients do not need to share a shape: each one should have the shape of its corresponding input, while gy has the shape of the function's output.
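
The inverse-permutation step from answer 2 can be checked with plain NumPy. This is a small sketch rather than Chainer's implementation; the axis tuple `(-1, 0, 1)` and the array shapes are arbitrary choices for illustration.

```python
import numpy as np

raw_axes = (-1, 0, 1)                                   # hypothetical forward axes, one negative
axes = tuple(ax % len(raw_axes) for ax in raw_axes)     # normalize negatives: (2, 0, 1)
inv_axes = tuple(int(i) for i in np.argsort(axes))      # inverse permutation: (1, 2, 0)

x = np.arange(24).reshape(2, 3, 4)
y = x.transpose(axes)                                   # forward: shape (4, 2, 3)

gy = np.ones_like(y)                                    # gradient with the output's shape
gx = gy.transpose(inv_axes)                             # backward: back to the input's shape

# Applying the inverse permutation to y recovers x exactly.
x_back = y.transpose(inv_axes)
print(axes, inv_axes, y.shape, gx.shape)
```

Because a transpose only rearranges elements, sending each gradient element back through the inverse permutation is all the backward pass has to do.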
corochann
  • About answer 3: if a and b have different shapes, how can both be multiplied by the same gy, as in a * gy and b * gy? – machen Aug 29 '17 at 08:33
  • I still don't know what the shape of gy is. – machen Aug 29 '17 at 08:33
  • I don't know what kind of formula you want to implement, but at least you can get the shapes of `a`, `b` and `gy` inside the `backward` function, and you can write your formula according to those shapes (if the formula depends on them). These are just numpy or cupy arrays, so you can use broadcasting too. – corochann Aug 30 '17 at 11:32
  • One more question: if w and x have different shapes (as in a conv layer or other layers), then dw and dx have different shapes too. How can the upper layer's dx be multiplied element-wise with the previous layer's dw, which determines the change to that previous layer's w? – machen Oct 27 '17 at 01:33