How to obtain second derivatives of a loss function with respect to the parameters of a neural network using GradientTape in TensorFlow eager mode

Question

I am creating a basic autoencoder for the MNIST dataset using TensorFlow eager mode. I would like to observe the second-order partial derivatives of my loss function with respect to the network's parameters as it trains. Currently, calling tape.gradient() on the output of in_tape.gradient() returns None, where in_tape is a GradientTape nested inside an outer GradientTape called tape. My code is included below.

I first tried calling tape.gradient() directly on the output of in_tape.gradient(), which returned None. My next approach was to iterate over the output of in_tape.gradient() and apply tape.gradient() to each gradient entry individually (with respect to my model variables); this also returned None every time.

Every tape.gradient() call returns a single None value rather than a list of None values. I believe a list containing None would indicate that an individual partial derivative is None, which would be expected in some cases.

I am currently only trying to get the second derivatives for the first set of weights (from the input layer to the hidden layer); I will extend this to all weights once I have it working.

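import numpy as np
import tensorflow as tf
import tensorflow.contrib.eager as tfe
from tensorflow import keras

tf.enable_eager_execution()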

mnist = keras.datasets.mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
train_images = train_images.reshape((train_images.shape[0], train_images.shape[1]*train_images.shape[2])).astype(np.float32)/255
test_images = test_images.reshape((test_images.shape[0], test_images.shape[1]*test_images.shape[2])).astype(np.float32)/255

num_epochs = 200
batch_size = 100
learning_rate = 0.0003

class MNISTModel(tf.keras.Model):
    def __init__(self, device='/gpu:0'):
        super(MNISTModel, self).__init__()
        self.device = device
        self.initializer = tf.initializers.random_uniform(0.0, 0.5)
        self.hidden = tf.keras.layers.Dense(200, use_bias=False, kernel_initializer=tf.initializers.random_uniform(0.0, 0.5), name="Hidden")
        self.out = tf.keras.layers.Dense(train_images.shape[1], use_bias=False, kernel_initializer=tf.initializers.random_uniform(0.0, 0.5), name="Output")
        self.hidden.build(train_images.shape[1])
        self.out.build(200)

    def call(self, x):
        return self.out(self.hidden(x))

def loss_func(model, x, y_):
    return tf.reduce_mean(tf.losses.mean_squared_error(labels=y_, predictions=model(x)))
    #return tf.reduce_mean((y_ - model(x))**4)

model = MNISTModel()
optimizer = tf.train.GradientDescentOptimizer(learning_rate)

for epochs in range(num_epochs):
    print("Started epoch ", epochs)
    print("Num batches is: ", train_images.shape[0]/batch_size)
    for i in range(0,1): #(int(train_images.shape[0]/batch_size)):
        with tfe.GradientTape(persistent=True) as tape:
            tape.watch(model.variables)
            with tfe.GradientTape() as in_tape:
                in_tape.watch(model.variables)
                loss = loss_func(model,train_images[0:batch_size],train_images[0:batch_size])
        grads = tape.gradient(loss, model.variables)
        IH_partial_grads = np.array([]) 
        for i in range(len(grads[0])):
            collector = np.array([])
            for j in range(len(grads[0][i])):
                collector = np.append(collector, tape.gradient(grads[0][i][j], model.variables[0]))
            IH_partial_grads = np.append(IH_partial_grads, collector)
        optimizer.apply_gradients(zip(grads, model.variables), global_step=tf.train.get_or_create_global_step())
    print("Epoch test loss: ", loss_func(model, test_images, test_images))

My ultimate goal is to form the Hessian matrix of the loss function with respect to all parameters of my network.

Thanks for any and all help!
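
For reference, here is a minimal sketch of the nested-tape pattern described above. It is not a fix verified against the posted script: it assumes the TensorFlow 2.x API (eager by default, with `tf.GradientTape.jacobian` available), and `x` and `w` are hypothetical toy stand-ins for a batch of inputs and the first Dense kernel. In this sketch the first derivative is taken with `in_tape` while the outer `tape` is still recording. The posted loop, by contrast, never calls `in_tape.gradient()` and calls `tape.gradient(loss, ...)` only after leaving the outer `with` block, which is one likely reason the second call returns None.

```python
import numpy as np
import tensorflow as tf

# Hypothetical toy problem: 5 samples, 3 inputs, 2 hidden units (not the MNIST sizes).
x = tf.constant(np.random.rand(5, 3), dtype=tf.float32)
w = tf.Variable(tf.random.uniform((3, 2), 0.0, 0.5))

with tf.GradientTape() as tape:            # outer tape records the gradient computation
    with tf.GradientTape() as in_tape:     # inner tape records the forward pass
        loss = tf.reduce_mean(tf.square(tf.matmul(x, w)))
    # First derivative, computed inside the outer tape's context so it is recorded.
    grad = in_tape.gradient(loss, w)       # shape (3, 2)

# Derivative of every gradient entry with respect to every weight.
hessian = tape.jacobian(grad, w)           # shape (3, 2, 3, 2)
hessian_matrix = tf.reshape(hessian, (6, 6))
print(hessian_matrix.numpy())
```

Trainable `tf.Variable` objects are watched automatically, so the explicit `watch` calls are omitted here. Using `jacobian` on the whole gradient tensor avoids looping over individual entries in Python.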

I can't reproduce your error. Could you please specify how exactly you're getting `None`? — Sharky, Mar 25 '19 at 20:40
Hi @Sharky, thanks for having a look! I'm not sure what else to say, at the end of running the above code `IH_partial_grads` is `None`. So when you run the above code do you obtain an `IH_partial_grads` matrix which has dimensions 145600x145600? — Devon Jarvis, Mar 26 '19 at 21:30
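
For scale, the following is a rough size estimate for a dense Hessian. It is only a sketch: the figures are computed from the layer shapes in the posted model (28 x 28 = 784 inputs, 200 hidden units, no biases) rather than taken from the comment above, and the helper `hessian_gib` is a hypothetical name introduced for illustration. It assumes float32 storage and no exploitation of symmetry.

```python
# Layer shapes taken from the posted model: 784 -> 200 -> 784, no biases.
n_inputs, n_hidden = 28 * 28, 200
first_kernel_params = n_inputs * n_hidden   # input-to-hidden weights
all_params = 2 * first_kernel_params        # plus hidden-to-output weights
bytes_per_float32 = 4


def hessian_gib(n_params):
    """Memory in GiB for a dense n_params x n_params float32 matrix."""
    return n_params ** 2 * bytes_per_float32 / 2 ** 30


print("first kernel:", first_kernel_params, "params,", round(hessian_gib(first_kernel_params), 1), "GiB")
print("all weights: ", all_params, "params,", round(hessian_gib(all_params), 1), "GiB")
```

Under these assumptions the first-layer block alone needs on the order of 90 GiB, so materialising the full matrix at MNIST scale is unlikely to fit in memory; a smaller model or Hessian-vector products may be more practical.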
