
I tried to calculate the Hessian matrix w.r.t. the model parameters.

However, the shapes of the tensors in the result were not symmetric.

import tensorflow as tf

x_train = tf.constant(tf.random.uniform(shape=(100, 24), minval=0, maxval=100))
y_train = tf.constant(tf.random.uniform(shape=(100, 1), minval=0, maxval=1))

x_test = tf.constant(tf.random.uniform(shape=(17, 24), minval=0, maxval=100))
y_test = tf.constant(tf.random.uniform(shape=(17, 1), minval=0, maxval=1))

f = tf.keras.models.Sequential()
f.add(tf.keras.layers.Dense(30, activation='relu', use_bias=False))
f.add(tf.keras.layers.Dense(1, activation='sigmoid', use_bias=False))
f.compile(loss='binary_crossentropy')
f.fit(x_train, y_train)

ws = f.trainable_weights

with tf.GradientTape(persistent=True) as tape2:
    tape2.watch(x_train)
    with tf.GradientTape(persistent=True) as tape1:
        tape1.watch(x_train)
        y_hat = f(x_train)
        loss = tf.keras.losses.binary_crossentropy(y_train, y_hat)
    grads = tape1.gradient(loss, ws)
    
H = [tape2.gradient(grad, ws) for grad in grads]

print([h_elem.shape for _H in H for h_elem in _H])

The output is

[TensorShape([24, 30]), TensorShape([30, 1]), TensorShape([24, 30]), TensorShape([30, 1])]

I expected the shapes

[[24, 30], [30, 1], [30, 1], [30, 1]]

given the symmetry of the Hessian matrix, but I don't understand why the 3rd tensor has shape [24, 30] in my code.
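To narrow this down, here is a minimal sketch that contrasts `tape.gradient` with `tape.jacobian` on a much smaller problem. The variables `w1`, `w2`, and `x` are hypothetical stand-ins for the two kernels and the input, and the loss is simplified to a mean, so this illustrates the shape behaviour only and is not the model above. As far as I understand, `gradient` sums a non-scalar target before differentiating, so each result takes the shape of its source, whereas `jacobian` keeps the target axes.

```python
import tensorflow as tf

# Hypothetical small stand-ins for the two kernels and the input.
w1 = tf.Variable(tf.random.normal((4, 3)))
w2 = tf.Variable(tf.random.normal((3, 1)))
x = tf.random.normal((5, 4))

with tf.GradientTape(persistent=True) as tape2:
    with tf.GradientTape() as tape1:
        y_hat = tf.sigmoid(tf.nn.relu(x @ w1) @ w2)
        loss = tf.reduce_mean(y_hat)  # simplified scalar loss (assumption)
    grads = tape1.gradient(loss, [w1, w2])

# gradient() sums each non-scalar target, so every result is shaped like its source.
summed = [tape2.gradient(g, [w1, w2]) for g in grads]
summed_shapes = [tuple(t.shape.as_list()) for row in summed for t in row]

# jacobian() keeps the target axes: block (i, j) has shape shape(w_i) + shape(w_j).
blocks = [[tape2.jacobian(g, w) for w in (w1, w2)] for g in grads]
block_shapes = [[tuple(b.shape.as_list()) for b in row] for row in blocks]

print(summed_shapes)
print(block_shapes)
```

If this reading is right, the 3rd tensor in my output is the derivative of the summed second-layer gradient w.r.t. the first layer's kernel, which would explain why it has the first kernel's shape.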

I suspect this is because the Hessian matrix has the following block layout:

[d²f(x)/dL1dL1, d²f(x)/dL1dL2, d²f(x)/dL2dL1, d²f(x)/dL2dL2]

where L1 and L2 are the parameters of the first and second layers.

In that case, the 2nd and 3rd elements would not have the same shape.

Is that correct?
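To make the block layout concrete, here is a small sketch of the shape arithmetic, assuming each layer's kernel is flattened into a vector before the Hessian is assembled. Only the two kernel shapes come from the model above; the variable names are illustrative.

```python
from math import prod

# Kernel shapes of the two Dense layers in the question (no biases).
layer_shapes = [(24, 30), (30, 1)]
sizes = [prod(s) for s in layer_shapes]  # parameters per layer

# Block (i, j) of the flattened Hessian has sizes[i] rows and sizes[j] columns.
block_shapes = [[(rows, cols) for cols in sizes] for rows in sizes]
total = sum(sizes)  # the full Hessian is total x total

for row in block_shapes:
    print(row)
print((total, total))
```

Under this assumption the full matrix is square and symmetric, but the two off-diagonal blocks are transposes of each other rather than tensors of the same shape.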

costaparas
MyPrunus

0 Answers