
When using GradientTape, you can calculate the gradients like this:

with tf.GradientTape() as tape:
    out = model(x, training=True)
    out = tf.reshape(out, (num_img, 1, 10))  # Resizing
    loss = tf.keras.losses.categorical_crossentropy(y, out)
    gradient = tape.gradient(loss, model.trainable_variables)

However, in the case of the CIFAR-10 inputs, this returns the gradients of the input images. Is there a way to access the gradients at an intermediate step, such that they have been through "some" training?

  • If you can access the intermediate variables from the model, e.g. by just calling the first layer instead of the whole model, you can compute the gradients there. `GradientTape` can in general be applied to most TensorFlow calculations; you just need to access the variables. – PythonF Feb 09 '21 at 14:34
  • just train the model and run these lines? – Nicolas Gervais Feb 09 '21 at 14:37
  • @FabianZills how could I calculate the loss without the proper output labels? Just assume the output should be the same labels as with the original code? They won't have the same size if so, right? –  Feb 09 '21 at 14:59
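To illustrate the first comment above, a hypothetical sketch of computing gradients at just one layer might look like the following; the layer index and the placeholder loss are assumptions, not taken from the question's model:

# Hypothetical sketch of the first comment: call only one layer inside the
# tape and compute gradients there. The layer index and the dummy loss are
# assumptions, not part of the question.
import tensorflow as tf

def first_layer_gradients(model, x):
    layer = model.layers[0]  # assumed to be the first trainable layer
    with tf.GradientTape() as tape:
        hidden = layer(x)                          # forward pass through one layer only
        loss = tf.reduce_mean(tf.square(hidden))   # placeholder loss on the intermediate output
    return tape.gradient(loss, layer.trainable_variables)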

1 Answer


EDIT: Thanks to your comment, I got a better understanding of your problem. The following code is far from ideal and does not take batch training etc. into consideration, but it might give you a good starting point. I wrote a custom training step which basically substitutes for the `model.fit` method. There might be better ways to do this, but it should give you a quick comparison of gradients.

def custom_training(model, data):
    x, y = data
    # Training step: forward pass, loss and gradients at the current weights
    with tf.GradientTape() as tape:
        y_pred = model(x, training=True)  # Forward pass
        loss = tf.keras.losses.mse(y, y_pred)  # Compute the loss value

    trainable_vars = model.trainable_variables
    gradients = tape.gradient(loss, trainable_vars)  # gradients before the update
    tf.keras.optimizers.Adam().apply_gradients(zip(gradients, trainable_vars))

    # Compute the gradients again at the updated weights, without applying them
    with tf.GradientTape() as tape:
        y_pred = model(x, training=False)  # Forward pass
        loss = tf.keras.losses.mse(y, y_pred)  # Compute the loss value
    trainable_vars = model.trainable_variables
    gradients_plus = tape.gradient(loss, trainable_vars)  # gradients after the update

    return gradients, gradients_plus

Let us assume a very simple model:

import tensorflow as tf

# Toy data: 1000 samples with 32 features and a single scalar target
train_data = tf.random.normal((1000, 32))
train_features = tf.random.normal((1000, 1))

inputs = tf.keras.layers.Input(shape=(32,))
hidden_1 = tf.keras.layers.Dense(32)(inputs)
hidden_2 = tf.keras.layers.Dense(32)(hidden_1)
outputs = tf.keras.layers.Dense(1)(hidden_2)

model = tf.keras.Model(inputs, outputs)
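With this toy model in place, a quick usage sketch of the custom_training function above might look like the following (a single full-batch call, purely for illustration):

# Illustrative only: run one custom training step on the toy model above
# and compare the norms of the gradients before and after the update.
grads_before, grads_after = custom_training(model, (train_data, train_features))

for var, g_before, g_after in zip(model.trainable_variables, grads_before, grads_after):
    print(var.name, float(tf.norm(g_before)), float(tf.norm(g_after)))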

Now, if you want to compute the gradients of all layers with respect to the inputs, you can use the following:

inputs = train_data
with tf.GradientTape(persistent=True) as tape:
    tape.watch(inputs)  # watch the input tensor so gradients w.r.t. it can be taken
    out_intermediate = []
    cargo = model.layers[0](inputs)
    for layer in model.layers[1:]:
        cargo = layer(cargo)
        out_intermediate.append(cargo)

for x in out_intermediate:
    print(tape.gradient(x, inputs))

If you want to compute a custom loss, I recommend the guide Customize what happens in Model.fit.
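For completeness, here is a minimal, hedged sketch of that approach, overriding train_step in a Model subclass as the guide describes; the loss, optimizer, and metric handling are simplified assumptions:

# Simplified sketch of overriding train_step (TF 2.x style); details such as
# regularization losses and extra metrics are omitted for brevity.
class CustomModel(tf.keras.Model):
    def train_step(self, data):
        x, y = data
        with tf.GradientTape() as tape:
            y_pred = self(x, training=True)        # Forward pass
            loss = self.compiled_loss(y, y_pred)   # Loss configured in compile()
        gradients = tape.gradient(loss, self.trainable_variables)
        self.optimizer.apply_gradients(zip(gradients, self.trainable_variables))
        self.compiled_metrics.update_state(y, y_pred)
        return {m.name: m.result() for m in self.metrics}

# Fresh functional graph for the subclassed model (illustrative names)
new_inputs = tf.keras.layers.Input(shape=(32,))
new_hidden = tf.keras.layers.Dense(32)(new_inputs)
new_outputs = tf.keras.layers.Dense(1)(new_hidden)

custom_model = CustomModel(new_inputs, new_outputs)
custom_model.compile(optimizer="adam", loss="mse")
custom_model.fit(train_data, train_features, epochs=1, verbose=0)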

PythonF
  • It appears that using cargo with model.trainable_variables returns None values, whereas the original code doesn't. It isn't possible to compare the gradients across different optimizers when None is returned. Is there some other way to get the gradient after running through a stage of the optimizer? –  Feb 09 '21 at 18:43
  • You may have to add `tape.watch(model.trainable_variables)` or something similar to the tape. Basically, you have to make sure that everything you want to compute gradients on is being run inside the GradientTape. But I am not entirely sure if I understand your issue properly. – PythonF Feb 09 '21 at 21:12
  • I want to compare gradients, after at least one update, between different optimizers. Basically I want Xt+1 gradients, where my code returned Xt gradients. –  Feb 09 '21 at 21:20
  • Thank you for the additional information. To confirm what I am thinking: the first function call returns the Xt gradients and the second the Xt+1 gradients? I believe so; I was just confused by the comment above the second call. –  Feb 10 '21 at 13:30
  • Also, would the model not revert to the "original" standing before the second gradient tape call? –  Feb 10 '21 at 15:41
  • Yes, the model is in the state Xt after one iteration, not in the state Xt+1. But you get the gradients of Xt and Xt+1. And yes, it returns the gradients [Xt, Xt+1]. – PythonF Feb 11 '21 at 09:19
  • Is it correct to assume that tape.gradient computes the gradients for each layer of the model? I noticed that the shape of the gradients changes as you loop through and look at them, and I believe this may be due to the layers. –  Feb 12 '21 at 15:42
  • If you compute the gradient with respect to `trainable_variables` it computes the gradients of all layers, if you want to call it that way. – PythonF Feb 14 '21 at 15:42
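To make the last point concrete, here is a small illustrative sketch (reusing the toy model and data from the answer) showing that one gradient tensor is returned per trainable variable, i.e. the kernel and bias of each Dense layer, which is why the shapes differ:

# Illustrative only: tape.gradient returns one tensor per trainable variable,
# so the shapes follow the kernels and biases of the individual layers.
with tf.GradientTape() as tape:
    y_pred = model(train_data, training=True)
    loss = tf.keras.losses.mse(train_features, y_pred)

grads = tape.gradient(loss, model.trainable_variables)
for var, grad in zip(model.trainable_variables, grads):
    print(var.name, var.shape, grad.shape)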