I am trying to train a GAN. Somehow the gradient tape returns None for the generator's weight even though it returns gradients for the discriminator. This leads to

ValueError: No gradients provided for any variable: ['carrier_freq:0']

when the optimizer applies the gradients to the weights (the generator has just a single weight, so there should be a single gradient). I can't find the reason for this, as the computation should be almost the same as for the discriminator.
This is the code for the train step where the gradients of the generator return [None].
import tensorflow as tf
from tensorflow import keras

generator = make_generator()
discriminator = make_discriminator()

g_loss_fn = keras.losses.BinaryCrossentropy(from_logits=True)
g_optimizer = keras.optimizers.Adam(learning_rate=0.04)
d_loss_fn = keras.losses.BinaryCrossentropy(from_logits=True)
d_optimizer = keras.optimizers.Adam(learning_rate=0.03)


def train_step(train_set):
    # modulate or don't modulate sample
    for batch in train_set:
        # get a random DEMAND noise sample to mix with speech
        noise_indices = tf.random.uniform([batch_size], minval=0, maxval=len(demand_dataset), dtype=tf.int32)
        # labels of 0 representing legit samples
        legit_labels = tf.zeros(batch_size, dtype=tf.uint8)
        # labels of 1 representing adversarial samples
        adversarial_labels = tf.ones(batch_size, dtype=tf.uint8)
        # concat legit and adversarial labels
        concat_labels = tf.concat((legit_labels, adversarial_labels), axis=0)

        # calculate gradients
        with tf.GradientTape(persistent=True) as tape:
            legit_predictions = discriminator(legit_path(batch, noise_indices))
            adversarial_predictions = discriminator(adversarial_path(batch, noise_indices))
            # concat legit and adversarial predictions to match the double batch of concat_labels
            d_predictions = tf.concat((legit_predictions, adversarial_predictions), axis=0)
            d_loss = d_loss_fn(concat_labels, d_predictions)
            g_loss = g_loss_fn(legit_labels, adversarial_predictions)

        print('Discriminator loss: ' + str(d_loss))
        print('Generator loss: ' + str(g_loss))

        d_grads = tape.gradient(d_loss, discriminator.trainable_weights)
        g_grads = tape.gradient(g_loss, generator.trainable_weights)
        print(g_grads)

        d_optimizer.apply_gradients(zip(d_grads, discriminator.trainable_weights))
        g_optimizer.apply_gradients(zip(g_grads, generator.trainable_weights))

        discriminator_loss(d_loss)
        generator_loss(g_loss)

    return d_loss, g_loss
Here is some information about what happens there:
The discriminator's goal is to distinguish between legit and adversarial samples, so it receives the batch twice: once preprocessed in a way that produces legit data (legit_path), and once preprocessed in a way that produces adversarial data (adversarial_path), i.e. the data is additionally passed through the generator and modified there.
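For context, the two paths do roughly the following (heavily simplified; demand_noise_tensor is only a placeholder here for however the DEMAND samples are stored, and the real preprocessing is more involved):

def legit_path(batch, noise_indices):
    # mix each speech sample with a randomly chosen DEMAND noise sample
    noise = tf.gather(demand_noise_tensor, noise_indices)
    return batch + noise

def adversarial_path(batch, noise_indices):
    # same mixing, but the sample is additionally modified by the generator
    return generator(legit_path(batch, noise_indices))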
The generator only has a single weight right now (carrier_freq) and consists of addition and multiplication operations wrapped in Lambda layers.
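It is built roughly like this (simplified sketch; the sample rate, signal length and the sine modulation below are only illustrative, but the important parts are the single carrier_freq variable and the Lambda layers; in my real code carrier_freq does show up in generator.trainable_weights, as the error message indicates):

import numpy as np
import tensorflow as tf
from tensorflow import keras

sample_rate = 16000      # illustrative values
signal_length = 16000
t = tf.range(signal_length, dtype=tf.float32) / sample_rate

# the generator's single trainable weight
carrier_freq = tf.Variable(4000.0, trainable=True, name='carrier_freq')

def make_generator():
    inputs = keras.Input(shape=(signal_length,))
    # multiply the input with a carrier whose frequency is the trainable variable
    modulated = keras.layers.Lambda(lambda s: s * tf.sin(2.0 * np.pi * carrier_freq * t))(inputs)
    # add the modulated signal back onto the original input
    outputs = keras.layers.Lambda(lambda xs: xs[0] + xs[1])([inputs, modulated])
    model = keras.Model(inputs, outputs)
    model.carrier_freq = carrier_freq  # attach the variable so the model tracks it as a weight
    return model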
The losses are calculated as BinaryCrossentropy between labels and the discriminator's predictions. The discriminator loss uses the true labels, i.e. whether or not each sample was modified. The generator loss is calculated similarly, but it only considers the predictions for the modified samples together with the labels that represent legit samples. So it basically measures how well the adversarial samples fool the discriminator into classifying them as legit.
Now on to the problem:
Both loss calculations seem to work, as they each return a value. The gradient calculation also works for the discriminator, but the gradients of the generator come back as [None]. It should work almost the same way as the calculation of the discriminator gradients; the only difference is that the generator loss uses a subset of the data that goes into the discriminator loss. Another difference is that the generator only has a single weight and consists of Lambda layers doing multiplication and addition, whereas the discriminator is a Dense net and has more than one weight.
Does anyone have an idea what the root of the problem could be?