I have a model with a complex loss that is computed per class of the model output.
As you can see below, I compute the loss for each class with a custom loss function and assign the result into a tf.Variable, since tensors are immutable in TensorFlow:
def calc_loss(y_true, y_pred):
    num_classes = 10
    # Variable used as a mutable buffer for the per-class losses
    pos_loss_class = tf.Variable(tf.zeros((1, num_classes), dtype=tf.dtypes.float32))
    for idx in range(num_classes):
        pos_loss = SOME_LOSS_FUNC(y_true[:, idx], y_pred[:, idx])
        pos_loss_class[:, idx].assign(pos_loss)
    return tf.reduce_mean(pos_loss_class)
My code is simple:
with tf.GradientTape() as tape:
    output = model(input, training=True)
    loss = calc_loss(targets, output)
grads = tape.gradient(loss, model.trainable_weights)
However, I receive None gradients for all of the model's variables. From my understanding, this happens because the tape cannot take gradients through a stateful object, as described here: https://www.tensorflow.org/guide/autodiff#4_took_gradients_through_a_stateful_object
Any suggestions?
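The only workaround I can think of is to avoid the variable entirely and collect the per-class losses in a plain Python list, stacking them into a tensor at the end. A sketch (untested on my full model, still using the SOME_LOSS_FUNC placeholder from above):

def calc_loss(y_true, y_pred):
    num_classes = 10
    per_class_losses = []
    for idx in range(num_classes):
        # Plain tensors stay on the tape, so gradients can flow through them
        per_class_losses.append(SOME_LOSS_FUNC(y_true[:, idx], y_pred[:, idx]))
    # tf.stack and tf.reduce_mean are both differentiable ops
    return tf.reduce_mean(tf.stack(per_class_losses))

Is this the recommended pattern, or is there a way to keep the variable?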
Here is a reproducible toy example that demonstrates the issue:
import tensorflow as tf

y_true = tf.Variable(tf.random.normal((1, 2)), name='targets')
layer = tf.keras.layers.Dense(2, activation='relu')
x = tf.constant([[1., 2., 3.]])

with tf.GradientTape() as tape:
    y_pred = layer(x)
    # Mutable buffer for the per-index losses
    loss_class = tf.Variable(tf.zeros((1, 2), dtype=tf.float32))
    for idx in range(2):
        loss = tf.abs(y_true[:, idx] - y_pred[:, idx])
        loss_class[:, idx].assign(loss)
    final_loss = tf.reduce_mean(loss_class)

grads = tape.gradient(final_loss, layer.trainable_weights)  # -> [None, None]
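Applying the same list-and-stack idea to the toy example does produce non-None gradients, which makes me suspect the assign is what breaks the tape:

with tf.GradientTape() as tape:
    y_pred = layer(x)
    # Collect per-index losses as plain tensors instead of assigning into a Variable
    losses = [tf.abs(y_true[:, idx] - y_pred[:, idx]) for idx in range(2)]
    final_loss = tf.reduce_mean(tf.stack(losses))

grads = tape.gradient(final_loss, layer.trainable_weights)  # gradients are no longer None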