
I would like to compute the Hessian of a neural network's loss function in TensorFlow with respect to all of its parameters (i.e. all trainable variables). By modifying the example code from the TensorFlow documentation (https://www.tensorflow.org/api_docs/python/tf/GradientTape) I managed to compute the Hessian w.r.t. the weight matrix of the first layer (if I'm not mistaken):

with tf.GradientTape(persistent=True) as tape:
    loss = tf.reduce_mean(model(x, training=True) ** 2)
    g = tape.gradient(loss, model.trainable_variables[0])
    h = tape.jacobian(g, model.trainable_variables[0])

If I try to compute it w.r.t. model.trainable_variables instead, tape.jacobian complains that 'list' object has no attribute 'shape'. I instead tried flattening model.trainable_variables and computing the Hessian w.r.t. the flattened vector:

with tf.GradientTape(persistent=True) as tape:
    loss = tf.reduce_mean(model(x, training=True) ** 2)
    source = tf.concat([tf.reshape(v, [-1]) for v in model.trainable_variables], axis=0)
    g = tape.gradient(loss, source)
    h = tape.jacobian(g, source)

The problem now is that g is empty (None) for some reason. I noticed that source is a tf.Tensor while model.trainable_variables[0] is a tf.ResourceVariable, so I tried changing this by declaring source as

from tensorflow.python.ops import resource_variable_ops

source = resource_variable_ops.ResourceVariable(
    tf.concat([tf.reshape(v, [-1]) for v in model.trainable_variables], axis=0))

This didn't change anything though, so I'm guessing that this is not the issue. I also thought that the problem might be that source is not watched, but it is created as trainable, and even if I call tape.watch(source) explicitly, g is still None.
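
For reference, here is a minimal example that reproduces the None gradient (the names v and flat are just for illustration). My suspicion is that the loss never flows through the concatenated tensor, since that tensor is only built from the variables after the loss has already been computed:

import tensorflow as tf

v = tf.Variable([1.0, 2.0])
with tf.GradientTape() as tape:
    loss = tf.reduce_sum(v ** 2)  # loss depends on v directly
    flat = tf.reshape(v, [-1])    # built from v, but not on the path from v to loss
    tape.watch(flat)

print(tape.gradient(loss, flat))  # prints None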

Does anybody know how I can solve this?

user202542
  • Does this answer your question? [Calculating Hessian with tensorflow gradient tape](https://stackoverflow.com/questions/66020046/calculating-hessian-with-tensorflow-gradient-tape) – o-90 Oct 22 '21 at 17:23
  • Thanks for your reply. It looks very similar, but not quite the same, I think (in any case it didn't work when I tried two GradientTapes instead of the jacobian). I thought it might have to do with the fact that model.trainable_variables isn't built before entering the tape, so I tried calling the model on the tensor x with model(x) first. Then model.trainable_variables exists before the tape, but I still get the same error. – user202542 Oct 23 '21 at 10:32

1 Answer


Maybe you could use a loop over the trainable variables? I know it's a basic idea.

with tf.GradientTape(persistent=True) as tape:
    loss = tf.reduce_mean(model(x, training=True) ** 2)
    g_list, h_list = [], []
    for train_var in model.trainable_variables:
        g = tape.gradient(loss, train_var)           # gradient w.r.t. this variable
        g_list.append(g)
        h_list.append(tape.jacobian(g, train_var))   # Hessian block for this variable

Note that this only gives the block-diagonal part of the full Hessian (each entry of h_list is the second derivative of the loss w.r.t. a single variable). You could also flatten and concatenate the per-variable gradients first, then take the Jacobian of the concatenated gradient w.r.t. each variable and concatenate the resulting blocks, as in the sketch below.
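
A minimal sketch of that idea, assuming model and x are defined as in the question; the reshaping into block columns is my own reading, so double-check the shapes:

params = model.trainable_variables
with tf.GradientTape(persistent=True) as tape:
    loss = tf.reduce_mean(model(x, training=True) ** 2)
    grads = tape.gradient(loss, params)  # per-variable gradients, recorded on the tape
    flat_grad = tf.concat([tf.reshape(g, [-1]) for g in grads], axis=0)

# tape.jacobian(flat_grad, p) has shape (n, *p.shape); flatten it into one block column
n = flat_grad.shape[0]
blocks = [tf.reshape(tape.jacobian(flat_grad, p), [n, -1]) for p in params]
hessian = tf.concat(blocks, axis=1)  # full (n, n) Hessian over all parameters
del tape  # the persistent tape holds resources until released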

elbe