
I have a model with a complex loss, computed per class of the model output.

As you can see below, I compute the loss with a custom loss function and assign each per-class value into a variable, since tensors are immutable in TensorFlow.

def calc_loss(y_true, y_pred):
    num_classes = 10
    pos_loss_class = tf.Variable(tf.zeros((1, num_classes), dtype=tf.dtypes.float32))
    for idx in range(num_classes):
        pos_loss = SOME_LOSS_FUNC(y_true[:, idx], y_pred[:, idx])
        pos_loss_class[:, idx].assign(pos_loss)
    return tf.reduce_mean(pos_loss_class)

My training step is simple:

with tf.GradientTape() as tape:
    output = model(input, training=True)
    loss = calc_loss(targets, output)
grads = tape.gradient(loss, model.trainable_weights)

However, I receive None for all of the model's variables. From my understanding, this is caused by the stateful nature of the variable blocking the gradient, as described here: https://www.tensorflow.org/guide/autodiff#4_took_gradients_through_a_stateful_object

Any suggestions?

Here is reproducible code, a toy example that demonstrates the issue:

import tensorflow as tf

y_true = tf.Variable(tf.random.normal((1, 2)), name='targets')
layer = tf.keras.layers.Dense(2, activation='relu')
x = tf.constant([[1., 2., 3.]])
with tf.GradientTape() as tape:
    y_pred = layer(x)
    loss_class = tf.Variable(tf.zeros((1,2)), dtype=tf.float32)
    for idx in range(2):
        loss = tf.abs(y_true[:, idx] - y_pred[:, idx])
        loss_class[:, idx].assign(loss)
    final_loss = tf.reduce_mean(loss_class)
grads = tape.gradient(final_loss, layer.trainable_weights)
Spivakoa

1 Answer


My second guess is that the assign method blocks the gradient, as explained in the TensorFlow page you linked. Instead, try using a plain list:

def calc_loss(y_true, y_pred):
    num_classes = 10
    pos_loss_class = []
    for idx in range(num_classes):
        pos_loss = SOME_LOSS_FUNC(y_true[:, idx], y_pred[:, idx])
        pos_loss_class.append(pos_loss)
    return tf.reduce_mean(pos_loss_class)
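Applied to the toy example from your question (a sketch, assuming TensorFlow 2.x eager execution), accumulating the per-class losses in a plain list keeps them in the tape's computation graph, so the gradients are no longer None:

```python
import tensorflow as tf

tf.random.set_seed(0)
y_true = tf.constant([[0.5, -0.3]])
layer = tf.keras.layers.Dense(2, activation='relu')
x = tf.constant([[1., 2., 3.]])

with tf.GradientTape() as tape:
    y_pred = layer(x)
    # Collect per-class losses in a Python list instead of assigning
    # into a tf.Variable; tf.reduce_mean accepts a list of tensors.
    loss_class = []
    for idx in range(2):
        loss_class.append(tf.abs(y_true[:, idx] - y_pred[:, idx]))
    final_loss = tf.reduce_mean(loss_class)

grads = tape.gradient(final_loss, layer.trainable_weights)
print([g is not None for g in grads])  # both gradients are now tensors, not None
```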
Alberto Sinigaglia
  • All variables of the model are automatically watched, as long as they are under the tape scope and you don't set watch_accessed_variables to False. I checked, and the variables I create in the loss function are watched too. – Spivakoa Jul 26 '22 at 21:54
  • @Spivakoa yes, I know they are watched; however, you posted a very small piece of code, and the chance that the problem lies outside it is not small... this one was just a guess – Alberto Sinigaglia Jul 26 '22 at 22:33
  • @Spivakoa try this second guess... but until you post a minimal reproducible example, guesses are the only thing I can give you – Alberto Sinigaglia Jul 26 '22 at 22:38