I have a model with a complex loss that is computed per class of the model output.
As you can see below, I compute the loss for each class with a custom loss function and assign the result into a tf.Variable, since tensors are immutable in TensorFlow:
def calc_loss(y_true, y_pred):
    num_classes = 10
    # Variable used as a mutable buffer for the per-class losses
    pos_loss_class = tf.Variable(tf.zeros((1, num_classes), dtype=tf.dtypes.float32))
    for idx in range(num_classes):
        pos_loss = SOME_LOSS_FUNC(y_true[:, idx], y_pred[:, idx])
        pos_loss_class[:, idx].assign(pos_loss)
    return tf.reduce_mean(pos_loss_class)
My code is simple:
with tf.GradientTape() as tape:
    output = model(input, training=True)
    loss = calc_loss(targets, output)
grads = tape.gradient(loss, model.trainable_weights)
However, I receive None gradients for all of the model's variables. From my understanding, this happens because the tape cannot take gradients through a stateful object, as described here: https://www.tensorflow.org/guide/autodiff#4_took_gradients_through_a_stateful_object
Any suggestions?
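The only workaround I can think of is to avoid the variable entirely and collect the per-class losses in a plain Python list, stacking them into a tensor at the end. A sketch (untested on my full model, still using the SOME_LOSS_FUNC placeholder from above):

def calc_loss(y_true, y_pred):
    num_classes = 10
    per_class_losses = []
    for idx in range(num_classes):
        # Plain tensors stay on the tape, so gradients can flow through them
        per_class_losses.append(SOME_LOSS_FUNC(y_true[:, idx], y_pred[:, idx]))
    # tf.stack and tf.reduce_mean are both differentiable ops
    return tf.reduce_mean(tf.stack(per_class_losses))

Is this the recommended pattern, or is there a way to keep the variable?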
Here is a reproducible toy example that demonstrates the issue:
import tensorflow as tf

y_true = tf.Variable(tf.random.normal((1, 2)), name='targets')
layer = tf.keras.layers.Dense(2, activation='relu')
x = tf.constant([[1., 2., 3.]])

with tf.GradientTape() as tape:
    y_pred = layer(x)
    # Mutable buffer for the per-index losses
    loss_class = tf.Variable(tf.zeros((1, 2), dtype=tf.float32))
    for idx in range(2):
        loss = tf.abs(y_true[:, idx] - y_pred[:, idx])
        loss_class[:, idx].assign(loss)
    final_loss = tf.reduce_mean(loss_class)

grads = tape.gradient(final_loss, layer.trainable_weights)  # -> [None, None]
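Applying the same list-and-stack idea to the toy example does produce non-None gradients, which makes me suspect the assign is what breaks the tape:

with tf.GradientTape() as tape:
    y_pred = layer(x)
    # Collect per-index losses as plain tensors instead of assigning into a Variable
    losses = [tf.abs(y_true[:, idx] - y_pred[:, idx]) for idx in range(2)]
    final_loss = tf.reduce_mean(tf.stack(losses))

grads = tape.gradient(final_loss, layer.trainable_weights)  # gradients are no longer None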