I am interested in computing the gradient of a loss that is calculated from the element-wise product of two matrices in TensorFlow with Eager Execution. I can do so if the product is computed as a tensor, but not if it's assign()ed in place to a variable. Here is the greatly reduced code:
import tensorflow as tf
import numpy as np

tf.enable_eager_execution()

LEARNING_RATE = 0.001  # placeholder value; the actual constant was elided in my reduction

multipliers_net = tf.get_variable("multipliers", shape=(1, 3, 3, 1),
                                  initializer=tf.random_normal_initializer())
activations_net = tf.Variable(tf.ones_like(multipliers_net))
output_indices = [(0, 1, 2, 0)]
def step():
    global activations_net
    #### PROBLEMATIC ####
    activations_net.assign(multipliers_net * activations_net)
    #### NO PROBLEM ####
    # activations_net = multipliers_net * activations_net
    return tf.gather_nd(activations_net, output_indices)
def train(targets):
    for y in targets:
        with tf.GradientTape() as tape:
            out = step()
            print("OUT", out)
            loss = tf.reduce_mean(tf.square(y - out))
            print("LOSS", loss)
        de_dm = tape.gradient(loss, multipliers_net)
        print("GRADIENT", de_dm, sep="\n")
        multipliers_net.assign_sub(LEARNING_RATE * de_dm)  # gradient descent step
targets = [[2], [3], [4], [5]]
train(targets)
As it stands, this code prints the correct OUT and LOSS values, but GRADIENT is printed as None. However, if the line below "PROBLEMATIC" is commented out and the "NO PROBLEM" line is uncommented, the gradient is computed just fine. I infer this is because in the second case activations_net becomes a Tensor, but I don't know why that suddenly makes the gradient computable, whereas before it was not.
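To illustrate the contrast without the rest of my network code, here is an even smaller sketch (my own reduction; I'm assuming the behaviour is not specific to gather_nd):

v = tf.Variable(2.0)
w = tf.Variable(3.0)

with tf.GradientTape() as tape:
    v.assign(v * w)            # in-place update; the tape sees no differentiable path to w
    loss = tf.square(v)
print(tape.gradient(loss, w))  # prints None

with tf.GradientTape() as tape:
    t = v * w                  # product kept as a plain tensor; the multiply is taped
    loss = tf.square(t)
print(tape.gradient(loss, w))  # prints an actual gradient tensor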
I'm pretty sure that I should keep activations_net and multipliers_net as Variables, because in the larger scheme of things both are updated dynamically, and as I understand it such state is best kept in Variables rather than in constantly reassigned Tensors.
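One pattern I can imagine that would reconcile the two, assuming assign() itself is simply not differentiable, is to compute the product as a tensor for the tape and only then write it back into the variable. A sketch of what I mean (not necessarily the intended idiom):

def step():
    # Hypothetical rewrite: let the tape record the multiply on a plain tensor,
    # then persist the result into the variable afterwards.
    new_activations = multipliers_net * activations_net  # differentiable tensor
    out = tf.gather_nd(new_activations, output_indices)  # gradient flows through this
    activations_net.assign(new_activations)              # state update only, not differentiated
    return out

Is something like this the expected way to handle it, or am I misunderstanding how GradientTape treats assign()?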