
I tried the following code:

from d2l import tensorflow as d2l
import tensorflow as tf

@tf.function
def corr2d(X, k, Y):  #@save
    """Compute 2D cross-correlation."""
    with tf.GradientTape() as tape:
        for i in range(Y.shape[0]):
            for j in range(Y.shape[1]):
                Y[i, j].assign(tf.reduce_sum(tf.multiply(X[i: i + h, j: j + w], k)))
    print('Gradients = ', tape.gradient(Y, k)) # show the gradient
    print('Watched Variables = ', tape.watched_variables()) # show the watched variables

print(tf.__version__)
Xin = tf.constant([[0.0, 1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0, 8.0]])
kernel = tf.Variable([[0.0, 1.0], [2.0, 3.0]])
h, w = kernel.shape
Y_hat = tf.Variable(tf.zeros((Xin.shape[0] - h + 1, Xin.shape[1] - w + 1))) # prepare the output tensor
corr2d(Xin, kernel, Y_hat)
print(Y_hat)

I got the following results:

2.4.1
Gradients =  None
Watched Variables =  (<tf.Variable 'Variable:0' shape=(2, 2) dtype=float32>, <tf.Variable 'Variable:0' shape=(2, 2) dtype=float32>)
<tf.Variable 'Variable:0' shape=(2, 2) dtype=float32, numpy=
array([[19., 25.],
       [37., 43.]], dtype=float32)>

Can anyone explain why the returned gradient is None even though the source variable kernel is included in the list of watched variables?

1 Answer


I'm not sure I fully understood what you were trying to do, but you were passing your output variable Y as the target for the gradient. Y is only filled in with assign calls, which the tape does not record, so there is no traced computation connecting Y back to k.

It is easier to think in terms of a cost function and variables. Say your cost function is y = x ** 2; then it is possible to calculate the gradient of y with respect to x.
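For instance, a minimal standalone sketch of that idea (not from the original post):

import tensorflow as tf

x = tf.Variable(3.0)
with tf.GradientTape() as tape:
    y = x ** 2                      # a cost computed from the watched variable
print(tape.gradient(y, x).numpy())  # 6.0, i.e. dy/dx = 2 * x

Here y is computed from x inside the tape, so the tape can trace a path from the target back to the variable.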

Basically, you did not have a function to calculate any gradient with respect to k.

I have made a small change; look at the variable cost.

import tensorflow as tf

def corr2d(X, k, Y):  #@save
    """Compute 2D cross-correlation."""
    with tf.GradientTape() as tape:
        cost = 0
        for i in range(Y.shape[0]):
            for j in range(Y.shape[1]):
                Y[i, j].assign(tf.reduce_sum(tf.multiply(X[i: i + h, j: j + w], k)))
                cost = cost + tf.reduce_sum(tf.multiply(X[i: i + h, j: j + w], k))
    print('\nGradients = ', tape.gradient(cost, k)) # show the gradient
    print('Watched Variables = ', tape.watched_variables()) # show the watched variables

print(tf.__version__)
Xin = tf.constant([[0.0, 1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0, 8.0]])
kernel = tf.Variable([[0.0, 1.0], [2.0, 3.0]])
h, w = kernel.shape
Y_hat = tf.Variable(tf.zeros((Xin.shape[0] - h + 1, Xin.shape[1] - w + 1))) # prepare the output tensor
corr2d(Xin, kernel, Y_hat)
print(Y_hat)

And now, you will get

Gradients =  tf.Tensor(
[[ 8. 12.]
 [20. 24.]], shape=(2, 2), dtype=float32)
Watched Variables =  (<tf.Variable 'Variable:0' shape=(2, 2) dtype=float32, numpy=
array([[0., 1.],
       [2., 3.]], dtype=float32)>, <tf.Variable 'Variable:0' shape=(2, 2) dtype=float32, numpy=
array([[19., 25.],
       [37., 43.]], dtype=float32)>)
<tf.Variable 'Variable:0' shape=(2, 2) dtype=float32, numpy=
array([[19., 25.],
       [37., 43.]], dtype=float32)>
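As a sanity check (not in the original answer, assuming the Xin defined above): the gradient of the summed cost with respect to each kernel entry k[a, b] is just the sum of the input values that entry multiplies across the four 2x2 windows:

for a in range(2):
    for b in range(2):
        print(a, b, tf.reduce_sum(Xin[a:a + 2, b:b + 2]).numpy())
# 0 0 8.0, 0 1 12.0, 1 0 20.0, 1 1 24.0 -- matching tape.gradient(cost, k) above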
CrazyBrazilian
  • Actually, I need `Y` to compute an MSE loss after returning from `corr2d()`. However, I get `None` when computing the gradient of the MSE w.r.t. the trainable variable k, so I tried the posted code to check whether the backpropagated gradient is cut off somewhere inside `corr2d()`. It seems that the indexing operation on the tensors stops the propagation. I'm still trying other ways to implement `corr2d()` without using that indexing operation. – C Chiang Feb 17 '21 at 16:41
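Following up on that comment: one way to keep the gradient intact is to build the output from differentiable ops (slicing, multiply, reduce_sum, stack) rather than assigning into a pre-allocated Variable, since assignments are not recorded by the tape. A rough sketch (corr2d_diff is a hypothetical name, not from the thread):

def corr2d_diff(X, k):
    """Differentiable 2D cross-correlation built only from tracked ops."""
    h, w = k.shape
    rows = []
    for i in range(X.shape[0] - h + 1):
        row = [tf.reduce_sum(X[i:i + h, j:j + w] * k)
               for j in range(X.shape[1] - w + 1)]
        rows.append(tf.stack(row))
    return tf.stack(rows)

with tf.GradientTape() as tape:
    Y = corr2d_diff(Xin, kernel)
    mse = tf.reduce_mean((Y - tf.ones_like(Y)) ** 2)  # placeholder target, just to get a scalar loss
print(tape.gradient(mse, kernel))  # no longer None

Because Y is now the result of recorded operations on kernel, the gradient of any downstream loss flows back to it.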