
I am a Chainer novice working through the Guides. In the Docs » Guides » Variable chapter, I found something that seems strange to me. I wrote the code below:

import numpy as np
from chainer import Variable

x = Variable(np.array([[1, 2, 3], [4, 5, 6]], dtype=np.float32))
y = 2.0 * x
# y.grad = np.zeros((2, 3), dtype=np.float32)
y.backward()
print(x.grad)
print(y.data)

This raises an error with the message:

TypeError: unsupported operand type(s) for *: 'float' and 'NoneType'

When I uncomment that line, the code becomes:

x = Variable(np.array([[1, 2, 3], [4, 5, 6]], dtype=np.float32))
y = 2.0 * x
y.grad = np.zeros((2, 3), dtype=np.float32)
y.backward()
print(x.grad)
print(y.data)

Then everything works.

So it seems that an initial grad must be assigned to the variable y. I find this strange. Shouldn't it default to ones?

I would appreciate an explanation. Thank you very much!

CKE
nwpuxhld

1 Answer


When the output y is not a scalar, Chainer does not fill in the initial gradient automatically, because backpropagation in this case computes a vector-Jacobian product, and the vector (given by the output's grad) must be supplied. ones is just one possible choice of that vector, with nothing special about it, so Chainer forces the user to specify it explicitly.
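A plain-NumPy sketch (not Chainer internals) of the vector-Jacobian product involved here: for the elementwise map y = 2 * x, the Jacobian is 2 times the identity, so backprop maps whatever vector you put in y.grad to twice that vector.

```python
import numpy as np

# For the elementwise map y = 2 * x, the Jacobian is 2 * I.
# Backprop computes the vector-Jacobian product, where the vector v is
# the gradient the user supplies on the output (y.grad in the question).
x = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.float32)
v = np.ones_like(x)   # one possible choice of vector; nothing special
x_grad = 2.0 * v      # what y.backward() would store in x.grad for this v
print(x_grad)         # every entry is 2.0
```

With v = np.zeros_like(x), as in the question's code, x.grad comes out all zeros for the same reason.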

When the output is a scalar (an array with shape ()), Chainer automatically fills the output gradient with one (see the reference). This means backpropagation computes the gradient of a scalar-valued function by default, which is the natural behavior.

Seiya Tokui
  • But when I execute the code y = x**2 + 2.0 * x, it runs even if I comment out the line assigning grad, and in this case y is not a scalar either. How can we explain that? – nwpuxhld Jun 22 '18 at 14:00