Problem computing partial derivatives with GradientTape() in TensorFlow2

Question

i have problems in the computation of gradients using automatic differentiation in TensorFlow. Basically i want to create a neural network which has just one output-value f and get an input of two values (x,t). The network should act like a mathematical function, so in this case f(x,t) where x and t are the input-variables and i want to compute partial derivatives, for example df_dx, d2f/dx2 or df_dt. I need those partial derivatives later for a specific loss-function. Here is my simplified code:

import numpy as np
import tensorflow as tf 
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras import Model


class MyModel(Model):
    def __init__(self):
        super(MyModel, self).__init__()
        self.flatten = Flatten(input_shape=(2, 1))
        self.d1 = Dense(28)
        self.f = Dense(1)

    def call(self, y):
        y = self.flatten(y)
        y = self.d1(y)
        y = self.f(y)
        return y

if __name__ == "__main__":

    #inp contains the input-variables (x,t)
    inp = np.random.rand(1,2,1)
    inp_tf = tf.convert_to_tensor(inp, np.float32)   

    #Create a Model
    model = MyModel()

    #Here comes the important part:
    x = inp_tf[0][0]
    t = inp_tf[0][1]

    with tf.GradientTape(persistent=True) as tape:
        tape.watch(inp_tf[0][0])
        tape.watch(inp_tf)
        f = model(inp_tf)

    df_dx = tape.gradient(f, inp_tf[0][0])  #Derivative df_dx
    grad_f = tape.gradient(f, inp_tf)

    tf.print(f)         #--> [[-0.0968768075]]
    tf.print(df_dx)     #--> None
    tf.print(grad_f)    #--> [[[0.284864038]
                        #      [-0.243642956]]]

What i expected was that i get df_dx = [0.284864038] (the first component of grad_f), but it results in None. My questions are:

Is it possible to get partial derivatives of f to only one input-variable?
If yes: What i have to change in my code that the computation df_dx doesn't result None?

What i think could do is to modify the architecture of the class MyModel that i use two different Inputlayer (one for x and one for t) so that i can call the model like f = model(x,t) but that seems unnatural for me and i think there should be an easier way.

Another point is that i don't get an Error when i change the input_shape of the Flattenlayer for example to self.flatten = Flatten(input_shape=(5,1) but my inputvector has shape(1,2,1), so i expect to get an error but that's not the case, why? I'm grateful for your help :)

I use the following configurations:

Visual Studio Code with Python-Extension as IDE
Python-Version: 3.7.6
TensorFlow-Version: 2.1.0
Keras-Version: 2.2.4-tf

score 3 · Accepted Answer · answered Apr 24 '20 at 14:45

Each time you do inp_tf[0][0] or inp_tf[0][1] you are creating a new tensor, but that new tensor is not used as input to your model, inp_tf is. Even if inp_tf[0][0] if part of inp_tf, from the point of view of TensorFlow there is no computation graph between your newly created inp_tf[0][0] and f, hence there is no gradient. You have to compute the gradient with respect to inp_tf and then take the parts of the gradient that you want from there.

In addition to that, as shown in the documentation of tf.GradientTape, you can use nested tapes to compute second order derivatives. And, if you use the jacobian, you can avoid using persistent=True, which is better for performance. Here is how it could work in your example (I changed the layer activation functions to sigmoid, as the default linear activation would not have a second order derivative).

import numpy as np
import tensorflow as tf 
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras import Model

class MyModel(Model):
    def __init__(self):
        super(MyModel, self).__init__()
        self.flatten = Flatten(input_shape=(2, 1))
        self.d1 = Dense(28, activation='sigmoid')
        self.f = Dense(1, activation='sigmoid')

    def call(self, y):
        y = self.flatten(y)
        y = self.d1(y)
        y = self.f(y)
        return y

np.random.seed(0)
inp = np.random.rand(1, 2, 1)
inp_tf = tf.convert_to_tensor(inp, np.float32)
model = MyModel()
with tf.GradientTape() as tape:
    tape.watch(inp_tf)
    with tf.GradientTape() as tape2:
        tape2.watch(inp_tf)
        f = model(inp_tf)
    grad_f = tape2.gradient(f, inp_tf)
    df_dx = grad_f[0, 0]
    df_dt = grad_f[0, 1]
j = tape.jacobian(grad_f, inp_tf)
d2f_dx2 = j[0, 0, :, 0, 0]
d2f_dyx = j[0, 0, :, 0, 1]
d2f_dy2 = j[0, 1, :, 0, 1]
d2f_dxy = j[0, 1, :, 0, 0]

tf.print(df_dx)
# [0.0104712956]
tf.print(df_dt)
# [-0.00301733566]
tf.print(d2f_dx2)
# [[-0.000243180315]]
tf.print(d2f_dyx)
# [[-0.000740956515]]
tf.print(d2f_dy2)
# [[1.49392872e-05]]
tf.print(d2f_dxy)
# [[-0.000740956573]]

Problem computing partial derivatives with GradientTape() in TensorFlow2

1 Answers1