
I am interested in computing the gradient of a loss calculated from the result of a matrix multiplication in TensorFlow with eager execution. I can compute it if the product is computed as a tensor, but not if it's assign()ed in place to a variable. Here is the greatly reduced code:

import tensorflow as tf
import numpy as np
tf.enable_eager_execution()

multipliers_net = tf.get_variable("multipliers", shape=(1, 3, 3, 1),
                                  initializer=tf.random_normal_initializer())
activations_net = tf.Variable(tf.ones_like(multipliers_net))
output_indices = [(0, 1, 2, 0)]

def step():
    global activations_net

    #### PROBLEMATIC ####
    activations_net.assign(multipliers_net * activations_net)
    #### NO PROBLEM ####
    # activations_net = multipliers_net * activations_net

    return tf.gather_nd(activations_net, output_indices)


def train(targets):
    for y in targets:
        with tf.GradientTape() as tape:
            out = step()
            print("OUT", out)
            loss = tf.reduce_mean(tf.square(y - out))
            print("LOSS", loss)
        de_dm = tape.gradient(loss, multipliers_net)
        print("GRADIENT", de_dm, sep="\n")
        multipliers_net.assign(LEARNING_RATE * de_dm)


LEARNING_RATE = 0.1  # assumed value; the constant is used above but was never defined
targets = [[2], [3], [4], [5]]

train(targets)

As it stands, this code prints the correct OUT and LOSS values, but the GRADIENT is printed as None. However, if the line below "PROBLEMATIC" is commented out and the "NO PROBLEM" line is uncommented, the gradient is computed just fine. I infer this is because in the second case activations_net becomes a Tensor, but I don't know why that suddenly makes the gradient computable when it wasn't before.

I'm pretty sure that I should keep activations_net and multipliers_net as Variables, because in the larger scheme of things both are updated dynamically, and as I understand it such things are best kept as Variables rather than constantly reassigned Tensors.
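One workaround that keeps both objects as Variables (a sketch of mine, not from the original post; step_keep_variables is a hypothetical helper): compute the product as a plain tensor inside the tape, take the gradient, and only then assign() the result back to the variable.

def step_keep_variables():
    # compute the new activations as a tensor, so the tape records the multiply
    new_activations = multipliers_net * activations_net
    out = tf.gather_nd(new_activations, output_indices)
    return out, new_activations

with tf.GradientTape() as tape:
    out, new_activations = step_keep_variables()
    loss = tf.reduce_mean(tf.square(2.0 - out))
de_dm = tape.gradient(loss, multipliers_net)  # a real gradient, not None
activations_net.assign(new_activations)       # safe now: outside the tape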

Ivan Vegner

1 Answer


I'll try to explain to the best of my knowledge. The problem occurs in this line:

de_dm = tape.gradient(loss, multipliers_net)

If you print tape.watched_variables() in both the "PROBLEMATIC" and "NO PROBLEM" cases, you'll see that in the first case the tape 'watches' the same multipliers_net variable twice. You can try tape.reset() and tape.watch(), but they will have no effect as long as you pass the assign op into the tape. If you try multipliers_net.assign(any_variable) inside tf.GradientTape(), you'll find that it won't work. But if you assign something that produces a tensor, e.g. tf.ones_like(), it will work.
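A minimal reproduction of that observation (my sketch, reusing the question's variables; behaviour observed under TF 1.x eager execution):

with tf.GradientTape() as tape:
    # the multiply feeds a variable-update op instead of the loss graph
    activations_net.assign(multipliers_net * activations_net)
    out = tf.gather_nd(activations_net, output_indices)
    loss = tf.reduce_mean(tf.square(2.0 - out))
print(tape.watched_variables())              # inspect what the tape recorded
print(tape.gradient(loss, multipliers_net))  # prints None: assign() breaks
                                             # the path from variable to loss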

multipliers_net.assign(LEARNING_RATE * de_dm)

This works for the same reason: assign() seems to accept only eager tensors, and LEARNING_RATE * de_dm is one. Hope this helps.
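As a follow-up usage note (my addition, not part of the answer): because de_dm is an eager tensor in the working configuration, the usual SGD update also goes through assign(). Note that the question's train() overwrites the weights with the scaled gradient, whereas a conventional step subtracts it:

# hypothetical corrected update; assumes de_dm is a valid gradient tensor
multipliers_net.assign(multipliers_net - LEARNING_RATE * de_dm)
# equivalently, using the in-place variant:
# multipliers_net.assign_sub(LEARNING_RATE * de_dm)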

Sharky
  • It seems that the `assign` op has no gradient, no matter what you're assigning... – Ivan Vegner Mar 16 '19 at 03:14
  • Yes. I'm actually a little confused by its behaviour, especially eager.Variable.assign, which should work but doesn't. Same goes for the read_value parameter. – Sharky Mar 16 '19 at 06:09
  • But I think as long as you do `multipliers_net.assign` you get a ResourceVariable, which is what you want. – Sharky Mar 16 '19 at 06:17
  • @IvanVegner I think the answer to your question is here: https://github.com/tensorflow/tensorflow/issues/17735 – Cupitor Mar 06 '22 at 20:51