I am trying to train a GAN. Somehow the gradient tape returns None for the generator's weight even though it returns gradients for the discriminator. This leads to

ValueError: No gradients provided for any variable: ['carrier_freq:0']

when the optimizer applies the gradients to the weights (the generator has just a single weight, so there should be a single gradient). I can't find the reason for this, as the computation should be almost the same as for the discriminator.
This is the code for the train step where the gradients of the generator return [None].
import tensorflow as tf
from tensorflow import keras

generator = make_generator()
discriminator = make_discriminator()

g_loss_fn = keras.losses.BinaryCrossentropy(from_logits=True)
g_optimizer = keras.optimizers.Adam(learning_rate=0.04)
d_loss_fn = keras.losses.BinaryCrossentropy(from_logits=True)
d_optimizer = keras.optimizers.Adam(learning_rate=0.03)


def train_step(train_set):
    # modulate or don't modulate sample
    for batch in train_set:
        # get a random DEMAND noise sample to mix with speech
        noise_indices = tf.random.uniform([batch_size], minval=0, maxval=len(demand_dataset), dtype=tf.int32)
        # labels of 0 representing legit samples
        legit_labels = tf.zeros(batch_size, dtype=tf.uint8)
        # labels of 1 representing adversarial samples
        adversarial_labels = tf.ones(batch_size, dtype=tf.uint8)
        # concat legit and adversarial labels
        concat_labels = tf.concat((legit_labels, adversarial_labels), axis=0)

        # calculate gradients
        with tf.GradientTape(persistent=True) as tape:
            legit_predictions = discriminator(legit_path(batch, noise_indices))
            adversarial_predictions = discriminator(adversarial_path(batch, noise_indices))
            # concat legit and adversarial predictions to match the double batch of concat_labels
            d_predictions = tf.concat((legit_predictions, adversarial_predictions), axis=0)
            d_loss = d_loss_fn(concat_labels, d_predictions)
            g_loss = g_loss_fn(legit_labels, adversarial_predictions)

        print('Discriminator loss: ' + str(d_loss))
        print('Generator loss: ' + str(g_loss))

        d_grads = tape.gradient(d_loss, discriminator.trainable_weights)
        g_grads = tape.gradient(g_loss, generator.trainable_weights)
        print(g_grads)

        d_optimizer.apply_gradients(zip(d_grads, discriminator.trainable_weights))
        g_optimizer.apply_gradients(zip(g_grads, generator.trainable_weights))

        discriminator_loss(d_loss)
        generator_loss(g_loss)

    return d_loss, g_loss
Here is some information about what happens there:
The discriminator's goal is to distinguish between legit and adversarial samples, so it receives the batch twice: once preprocessed in a way that produces legit data (legit_path), and once preprocessed in a way that produces adversarial data (adversarial_path), i.e. the data is additionally passed through the generator and modified there.
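For context, the two paths do roughly the following (heavily simplified; demand_noise_tensor is only a placeholder here for however the DEMAND samples are stored, and the real preprocessing is more involved):

def legit_path(batch, noise_indices):
    # mix each speech sample with a randomly chosen DEMAND noise sample
    noise = tf.gather(demand_noise_tensor, noise_indices)
    return batch + noise

def adversarial_path(batch, noise_indices):
    # same mixing, but the sample is additionally modified by the generator
    return generator(legit_path(batch, noise_indices))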
The generator only has a single weight right now (carrier_freq) and consists of addition and multiplication operations wrapped in Lambda layers.
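It is built roughly like this (simplified sketch; the sample rate, signal length and the sine modulation below are only illustrative, but the important parts are the single carrier_freq variable and the Lambda layers; in my real code carrier_freq does show up in generator.trainable_weights, as the error message indicates):

import numpy as np
import tensorflow as tf
from tensorflow import keras

sample_rate = 16000      # illustrative values
signal_length = 16000
t = tf.range(signal_length, dtype=tf.float32) / sample_rate

# the generator's single trainable weight
carrier_freq = tf.Variable(4000.0, trainable=True, name='carrier_freq')

def make_generator():
    inputs = keras.Input(shape=(signal_length,))
    # multiply the input with a carrier whose frequency is the trainable variable
    modulated = keras.layers.Lambda(lambda s: s * tf.sin(2.0 * np.pi * carrier_freq * t))(inputs)
    # add the modulated signal back onto the original input
    outputs = keras.layers.Lambda(lambda xs: xs[0] + xs[1])([inputs, modulated])
    model = keras.Model(inputs, outputs)
    model.carrier_freq = carrier_freq  # attach the variable so the model tracks it as a weight
    return model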
The losses are calculated as BinaryCrossentropy between labels and the discriminator's predictions. The discriminator loss uses the true labels, i.e. whether or not each sample was modified. The generator loss is calculated similarly, but it only considers the predictions for the modified samples together with the labels that represent legit samples. So it basically measures how well the adversarial samples fool the discriminator into classifying them as legit.
Now on to the problem:
Both loss calculations seem to work, as they each return a value. The gradient calculation also works for the discriminator, but the gradients of the generator come back as [None]. It should work almost the same way as the calculation of the discriminator gradients; the only difference is that the generator loss uses a subset of the data that goes into the discriminator loss. Another difference is that the generator only has a single weight and consists of Lambda layers doing multiplication and addition, whereas the discriminator is a Dense net and has more than one weight.
Does anyone have an idea what the root of the problem could be?