I tried to calculate the Hessian matrix of the loss w.r.t. the model parameters.
However, the shapes of the resulting tensors do not follow the symmetric pattern I expected.
import tensorflow as tf
x_train = tf.constant(tf.random.uniform(shape=(100, 24), minval=0, maxval=100))
y_train = tf.constant(tf.random.uniform(shape=(100, 1), minval=0, maxval=1))
x_test = tf.constant(tf.random.uniform(shape=(17, 24), minval=0, maxval=100))
y_test = tf.constant(tf.random.uniform(shape=(17, 1), minval=0, maxval=1))
f = tf.keras.models.Sequential()
f.add(tf.keras.layers.Dense(30, activation='relu', use_bias=False))
f.add(tf.keras.layers.Dense(1, activation='sigmoid', use_bias=False))
f.compile(loss='binary_crossentropy')
f.fit(x_train, y_train)
ws = f.trainable_weights
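# Nested tapes: tape1 records the forward pass, tape2 records the gradient computation.
with tf.GradientTape(persistent=True) as tape2:
    tape2.watch(x_train)
    with tf.GradientTape(persistent=True) as tape1:
        tape1.watch(x_train)
        y_hat = f(x_train)
        loss = tf.keras.losses.binary_crossentropy(y_train, y_hat)
    # First-order gradients, computed inside tape2 so they can be differentiated again.
    grads = tape1.gradient(loss, ws)
# Differentiate each first-order gradient w.r.t. every weight tensor.
H = [tape2.gradient(grad, ws) for grad in grads]
print([h_elem.shape for _H in H for h_elem in _H])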
The output is
[TensorShape([24, 30]), TensorShape([30, 1]), TensorShape([24, 30]), TensorShape([30, 1])]
I expected the shapes to be
[[24, 30], [30, 1], [30, 1], [30, 1]]
given the symmetry of the Hessian matrix. However, I don't understand why the 3rd tensor in my output has shape [24, 30].
I suspect this is because the Hessian has the following block structure:
[d²f/dL1dL1, d²f/dL1dL2, d²f/dL2dL1, d²f/dL2dL2]
where L1 and L2 are the parameters of the first and second layers.
In that case, the 2nd and 3rd elements would not have the same shape.
Is that correct?
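
To make the question concrete, here is a minimal sketch of the shape bookkeeping in plain Python, with no TensorFlow calls. It rests on two assumptions as I understand them: `tape.gradient(target, sources)` returns one tensor per source with that source's shape, and a full Hessian block for a pair of weight tensors carries the dimensions of both. The helper names `flat_gradient_shapes` and `hessian_block_shapes` are hypothetical and only for illustration.

```python
# Shapes of the two weight tensors in the model above (no biases).
param_shapes = [(24, 30), (30, 1)]


def flat_gradient_shapes(param_shapes):
    """Shapes produced by calling tape2.gradient(grad, ws) once per grad.

    Assumption: each call returns one tensor per source, shaped like the
    source, regardless of the shape of the target being differentiated.
    """
    return [shape for _ in param_shapes for shape in param_shapes]


def hessian_block_shapes(param_shapes):
    """Shapes of the full second-derivative blocks d2f / dWi dWj.

    Assumption: block (i, j) is indexed by every entry of Wi and every
    entry of Wj, so its shape is shape(Wi) followed by shape(Wj).
    """
    return [[si + sj for sj in param_shapes] for si in param_shapes]


print(flat_gradient_shapes(param_shapes))
for row in hessian_block_shapes(param_shapes):
    print(row)
```

Under these assumptions the first function reproduces the four shapes I observed, while the second gives blocks whose (1, 2) and (2, 1) entries have transposed dimension order rather than identical shapes.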