First of all, given z = torch.linspace(-1, 1, steps=5, requires_grad=True) and y = z, the function is a vector-valued function, so the derivative of y w.r.t. z is not as simple as 1 but a Jacobian matrix. Actually in your case z = [z1, z2, z3, z4, z5]^T, where the superscript T means z is a column vector. Here is what the official doc says: for a vector-valued function y = f(z), the gradient of y with respect to z is the Jacobian matrix J whose (i, j) entry is ∂y_i/∂z_j.
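If you want to actually look at that Jacobian for y = z, one way (a sketch that goes beyond the original answer, using torch.autograd.functional.jacobian) is to compute it explicitly; for this identity mapping it is just the 5x5 identity matrix:

import torch

# Sketch: compute the full Jacobian of y = z explicitly.
z = torch.linspace(-1, 1, steps=5, requires_grad=True)

def f(z):
    # clone() so y is a distinct tensor in the graph; the Jacobian is still the identity
    return z.clone()

J = torch.autograd.functional.jacobian(f, z)
print(J)  # 5x5 identity matrix: dy_i/dz_j = 1 if i == j, else 0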

Secondly, notice what the official doc says: "Now in this case y is no longer a scalar. torch.autograd could not compute the full Jacobian directly, but if we just want the vector-Jacobian product, simply pass the vector to backward as argument." In that case x.grad is not the actual gradient value (a matrix) but the vector-Jacobian product.
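As a minimal sketch of that vector-Jacobian product for your y = z (the vector v below is an arbitrary choice, not from the original question): since the Jacobian of the identity mapping is the identity matrix, z.grad ends up equal to v itself.

import torch

z = torch.linspace(-1, 1, steps=5, requires_grad=True)
y = z                                    # vector-valued output
v = torch.tensor([1., 2., 3., 4., 5.])   # arbitrary vector with the same shape as y
y.backward(v)                            # computes the vector-Jacobian product v^T J
print(z.grad)                            # tensor([1., 2., 3., 4., 5.]) because J is the identity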
EDIT:
x.grad is the actual gradient if your output y is a scalar. See the example here:
z = torch.linspace(-1, 1, steps=5, requires_grad=True)
y = torch.sum(z)
y.backward()
z.grad
This will output:
tensor([1., 1., 1., 1., 1.])
As you can see, it is the actual gradient. Notice that the only difference is that y is a scalar here, while it is a vector in your example. As the error message says, grad can be implicitly created only for scalar outputs.
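That phrase is the exact error message PyTorch raises if you call backward() on a non-scalar output without passing a gradient argument; a quick reproduction (not part of the original answer):

import torch

z = torch.linspace(-1, 1, steps=5, requires_grad=True)
y = z  # non-scalar output

try:
    y.backward()  # no gradient argument supplied
except RuntimeError as e:
    print(e)  # grad can be implicitly created only for scalar outputs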
You might wonder what happens if the gradient is not a constant but depends on the input z, as in this case:
z = torch.linspace(-1, 1, steps=5, requires_grad=True)
y = torch.sum(torch.pow(z,2))
y.backward()
z.grad
The output is:
tensor([-2., -1., 0., 1., 2.])
It is the same as
z = torch.linspace(-1, 1, steps=5, requires_grad=True)
y = torch.sum(torch.pow(z,2))
y.backward(torch.tensor(1.))
z.grad
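Passing torch.tensor(1.) is exactly what backward() does by default for a scalar output. To see that this argument really acts as the vector in the vector-Jacobian product (an extra illustration, not in the original answer), pass a different scalar and the gradient simply gets scaled:

import torch

z = torch.linspace(-1, 1, steps=5, requires_grad=True)
y = torch.sum(torch.pow(z, 2))
y.backward(torch.tensor(2.))  # "vector" of size 1 with value 2
print(z.grad)                 # tensor([-4., -2., 0., 2., 4.]), i.e. 2 * (2 * z)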
The blitz tutorial is rather brief, so it can be quite hard for beginners to understand.