
When I use the assign method of tf.Variable to change the value of a variable, it breaks the gradient computed by tf.GradientTape; e.g., see the toy example below:

(NOTE: I am interested in TensorFlow 2 only.)

x = tf.Variable([[2.0,3.0,4.0], [1.,10.,100.]])
patch = tf.Variable([[0., 1.], [2., 3.]])
with tf.GradientTape() as g:
    g.watch(patch)
    x[:2,:2].assign(patch)
    y = tf.tensordot(x, tf.transpose(x), axes=1)
    o = tf.reduce_mean(y)
do_dpatch = g.gradient(o, patch)

This gives me None for do_dpatch.

Note that if I do the following it works perfectly fine:

x = tf.Variable([[2.0,3.0,4.0], [1.,10.,100.]])
patch = tf.Variable([[0., 1.], [2., 3.]])
with tf.GradientTape() as g:
    g.watch(patch)
    x[:2,:2].assign(patch)
    y = tf.tensordot(x, tf.transpose(x), axes=1)
    o = tf.reduce_mean(y)
do_dx = g.gradient(o, x)

and gives me:

>>> do_dx
<tf.Tensor: id=106, shape=(2, 3), dtype=float32, numpy=
array([[ 1.,  2., 52.],
       [ 1.,  2., 52.]], dtype=float32)>
Meysam Sadeghi

2 Answers


This behavior does make sense. Let's take your first example:

x = tf.Variable([[2.0,3.0,4.0], [1.,10.,100.]])
patch = tf.Variable([[1., 1.], [1., 1.]])
with tf.GradientTape() as g:
    g.watch(patch)
    x[:2,:2].assign(patch)
    y = tf.tensordot(x, tf.transpose(x), axes=1)
dy_dpatch = g.gradient(y, patch)

You are computing dy/d(patch). But y depends only on x, not on patch. Yes, you do assign values to x from patch, but that operation does not record a reference to the patch variable on the tape; it just copies the values.

In short, you are asking for a gradient with respect to something the output does not depend on, so you will get None.
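
To see the disconnect directly, here is a small check (a sketch of mine, not part of the original question; it uses a persistent tape so both gradients can be queried from the same recording):

import tensorflow as tf

x = tf.Variable([[2.0, 3.0, 4.0], [1., 10., 100.]])
patch = tf.Variable([[0., 1.], [2., 3.]])
with tf.GradientTape(persistent=True) as g:
    g.watch(patch)
    x[:2, :2].assign(patch)  # copies values into x; records nothing on the tape
    y = tf.tensordot(x, tf.transpose(x), axes=1)
    o = tf.reduce_mean(y)
print(g.gradient(o, patch))  # None: there is no tape path from patch to o
print(g.gradient(o, x))      # a real tensor: o was computed by reading x
del g  # release the persistent tape's resources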

Let's look at the second example and why it works.

x = tf.Variable([[2.0,3.0,4.0], [1.,10.,100.]])
with tf.GradientTape() as g:
    g.watch(x)
    x[:2,:2].assign([[1., 1.], [1., 1.]])
    y = tf.tensordot(x, tf.transpose(x), axes=1)
dy_dx = g.gradient(y, x)

This example is perfectly fine: y depends on x, and you are computing dy/dx, so you get actual gradients here.
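
As a quick sanity check of the numbers the question reports (my sketch, not part of the original answer): for o = mean(x @ transpose(x)) with x of shape (2, 3), do/dx[a, b] works out to half the sum of column b, the same for every row:

import tensorflow as tf

# After the assign in the question, x is [[0, 1, 4], [2, 3, 100]].
# o = mean(x @ x^T) gives do/dx[a, b] = 0.5 * sum_j x[j, b]:
# column sums [2, 4, 104] -> do/dx = [[1, 2, 52], [1, 2, 52]].
x = tf.constant([[0., 1., 4.], [2., 3., 100.]])
manual = 0.5 * tf.tile(tf.reduce_sum(x, axis=0, keepdims=True), [2, 1])
print(manual)  # matches the do_dx tensor shown in the question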

thushv89

As explained HERE (see the quote below from alextp), tf.assign does not support gradients.

"There is no plan to add a gradient to tf.assign because it's not possible in general to connect the uses of the assigned variable with the graph which assigned it."

So, the above problem can be resolved by the following code:

x = tf.Variable([[0.0, 0.0, 4.0], [0., 0., 100.]])
patch = tf.Variable([[0., 1.], [2., 3.]])
with tf.GradientTape() as g:
    g.watch(patch)
    # Pad patch with one zero column on the right so it matches x's (2, 3) shape.
    padding = tf.constant([[0, 0], [0, 1]])
    padded_patch = tf.pad(patch, padding, mode='CONSTANT', constant_values=0)
    # x is zero where the patch goes, so the addition inserts the patch values.
    revised_x = x + padded_patch
    y = tf.tensordot(revised_x, tf.transpose(revised_x), axes=1)
    o = tf.reduce_mean(y)
do_dpatch = g.gradient(o, patch)

which results in

do_dpatch

<tf.Tensor: id=65, shape=(2, 2), dtype=float32, numpy=
array([[1., 2.],
       [1., 2.]], dtype=float32)>
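
An alternative sketch (my addition, not from this answer): instead of zeroing out part of x and adding a padded patch, you can build the patched tensor functionally with tf.tensor_scatter_nd_update, which is differentiable with respect to its updates, so the tape can reach patch even when x keeps its original values:

import tensorflow as tf

x = tf.Variable([[2.0, 3.0, 4.0], [1., 10., 100.]])
patch = tf.Variable([[0., 1.], [2., 3.]])  # trainable, so watched automatically
# Coordinates of the top-left 2x2 block of x.
indices = tf.constant([[0, 0], [0, 1], [1, 0], [1, 1]])
with tf.GradientTape() as g:
    updates = tf.reshape(patch, [-1])  # flatten patch to line up with indices
    # Functional update: returns a new tensor and keeps the graph connected.
    revised_x = tf.tensor_scatter_nd_update(tf.convert_to_tensor(x), indices, updates)
    y = tf.tensordot(revised_x, tf.transpose(revised_x), axes=1)
    o = tf.reduce_mean(y)
do_dpatch = g.gradient(o, patch)  # [[1., 2.], [1., 2.]], same as above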
Meysam Sadeghi