
code:

import theano
import theano.tensor as T

a = T.vector()
b = T.vector()

loss = T.sum(a - b)

dy = T.grad(loss, a)
d2y = T.grad(loss, dy)  # this line raises the error below

f = theano.function([a, b], loss)
print f([.5, .5, .5], [1, 0, 1])

output:

theano.gradient.DisconnectedInputError: grad method was asked to compute
the gradient with respect to a variable that is not part of the
computational graph of the cost, or is used only by a non-differentiable
operator: Elemwise{second}.0

How is a derivative of the graph not part of the graph? Is this why scan is used to compute the Hessian?


1 Answer


Here:

d2y = T.grad(loss, dy)

you are attempting to compute the gradient of the loss with respect to dy. However, the loss depends only on the values of a and b, not on dy, hence the error. It only makes sense to compute partial derivatives of the loss with respect to variables that actually affect its value.

The easiest way to compute the Hessian in Theano is to use the theano.gradient.hessian convenience function:

d2y = theano.gradient.hessian(loss, a)
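
For example, a minimal sketch (assuming the standard theano / theano.tensor as T imports and the a, b, loss from your snippet):

import theano
import theano.tensor as T

a = T.vector()
b = T.vector()
loss = T.sum(a - b)

# Hessian of the scalar loss with respect to the vector a
d2y = theano.gradient.hessian(loss, a)

f = theano.function([a, b], d2y)
print f([.5, .5, .5], [1, 0, 1])  # evaluate on your inputs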

See the documentation here for an alternative manual method that uses a combination of theano.grad and theano.scan.
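
That manual approach (and the reason scan comes up for Hessians at all) looks roughly like the sketch below: T.grad only differentiates scalar expressions, so you loop over the elements of the first derivative with theano.scan and take the gradient of each one. Note the disconnected_inputs='ignore' flag: for this particular linear loss each element of dy does not actually depend on a, so the inner grad would otherwise raise the same DisconnectedInputError.

dy = T.grad(loss, a)

# row i of the Hessian is the gradient of dy[i] with respect to a
H, updates = theano.scan(
    lambda i, dy, a: T.grad(dy[i], a, disconnected_inputs='ignore'),
    sequences=T.arange(dy.shape[0]),
    non_sequences=[dy, a])

f2 = theano.function([a, b], H, updates=updates)
print f2([.5, .5, .5], [1, 0, 1])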

In your example the Hessian will be a 3x3 matrix of zeros, since the partial derivative of the loss w.r.t. a is independent of a (it's just a vector of ones).

  • thanks, but I don't understand why "the partial derivative of the `loss` w.r.t. `a` is independent of `a`" – user2255757 Feb 03 '16 at 00:33
  • The gradient of the loss w.r.t. `a` is just a vector of ones, so all of the second order partial derivatives w.r.t. `a` will be zero. For example, if *a* was scalar then *d/da (a - b) = 1* and *d^2/da^2 (a - b) = 0*. – ali_m Feb 03 '16 at 00:44