If an assign operation is applied to a weight variable after that variable has been used in the forward pass of a network, does TensorFlow's backpropagation take the assign operation into account when computing the gradient for that weight? For example, if I have
import tensorflow as tf

weights = tf.Variable(...)
bias = tf.Variable(...)
output = tf.tanh(tf.matmul(weights, input) + bias)  # forward pass reads `weights`
weight_assign_op = weights.assign(weights + 1.0)
# control_dependencies expects an iterable of ops
with tf.control_dependencies([weight_assign_op]):
    output2 = tf.identity(output)
the output is calculated, and then a change is made to the weights. If the output is then used to calculate a loss and gradients to update the variables, will the gradients be created taking into account the change to weights? That is, will the gradients for weights be the correct gradients for old_weights + 1.0, or will they still be the gradients for old_weights, which, when applied to the new weights, won't necessarily be "correct" gradients for gradient descent?
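One way I imagine this could be checked empirically is with a small script along the following lines (a sketch only, assuming TF 1.x graph-mode APIs; the concrete shapes, constants, and the analytic_grad helper are illustrative and not part of the original snippet). It compares TensorFlow's gradient against the hand-computed gradient (1 - tanh(Wx + b)^2) * x^T evaluated at both the original weights and the original weights + 1.0:

import numpy as np
import tensorflow as tf  # TF 1.x; under TF 2.x, use tf.compat.v1 and disable eager execution

x = tf.constant([[2.0], [3.0]])                  # input, shape (2, 1)
weights = tf.Variable([[0.1, 0.2]])              # shape (1, 2)
bias = tf.Variable([0.05])

output = tf.tanh(tf.matmul(weights, x) + bias)   # forward pass reads `weights`
weight_assign_op = weights.assign(weights + 1.0)
with tf.control_dependencies([weight_assign_op]):
    output2 = tf.identity(output)

loss = tf.reduce_sum(output2)
grad_w = tf.gradients(loss, weights)[0]

def analytic_grad(w):
    # d/dW sum(tanh(W x + b)) = (1 - tanh(W x + b)^2) * x^T
    pre = w.dot(np.array([[2.0], [3.0]])) + 0.05
    return (1.0 - np.tanh(pre) ** 2) * np.array([[2.0, 3.0]])

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    w_initial = sess.run(weights)
    grad_val = sess.run(grad_w)                  # may or may not trigger the assign
    print("gradient from TF:      ", grad_val)
    print("analytic at initial W: ", analytic_grad(w_initial))
    print("analytic at W + 1.0:   ", analytic_grad(w_initial + 1.0))
    print("weights after the run: ", sess.run(weights))

Whichever analytic value the TensorFlow gradient matches would show which weight value backpropagation actually used, but I'd like to understand what behavior is guaranteed rather than rely on one run.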