I've been working on a GAN for image synthesis and keep hitting the same problem no matter how much I change my architecture, optimizers, or losses, so I've gone back to the MNIST dataset to see whether the problem persists there.
From what I've seen online, GANs on small greyscale images can converge after just a few epochs, but in my case the images below show what I'm actually getting:
[Images: generator output after 1 epoch, and 14/15 epochs in]
Using the same training loop and basic setup (with different generators/discriminators) on RGB cat images, I get even worse results after hundreds of epochs: most of the outputs are streaks of white, R/G/B, C/M/Y, or black, no matter the input (see below).
All in all, I think the training loop itself must be to blame, but I can't figure out where the issue is. Below is the code I'm currently using (for the MNIST data):
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Input, Flatten, Dense, LeakyReLU, Dropout, Reshape
from tensorflow.keras.optimizers import Adam

# MNIST digits with a channel axis (the original loading code isn't shown here;
# this assumes pixel values scaled to [0, 1])
(images, _), _ = tf.keras.datasets.mnist.load_data()
images = images.reshape(-1, 28, 28, 1).astype('float32') / 255.0

dropout_rate = 0

# Discriminator: flatten the image and score it with a single output unit
discriminator = tf.keras.Sequential([
    Input(shape=(28, 28, 1)),
    Flatten(),
    Dense(1024),
    LeakyReLU(),
    Dropout(dropout_rate),
    Dense(256),
    LeakyReLU(),
    Dropout(dropout_rate),
    Dense(64),
    LeakyReLU(),
    Dropout(dropout_rate),
    Dense(1)  # no activation, so the output is an unbounded score
])
noise_shape = [100]

# Generator: map a 100-d noise vector to a 28x28x1 image
generator = tf.keras.Sequential([
    Input(shape=noise_shape),
    Dense(128),
    LeakyReLU(),
    Dense(256),
    LeakyReLU(),
    Dense(1024),
    LeakyReLU(),
    Dense(784),  # no activation here either, so pixel values are unbounded
    Reshape((28, 28, 1))
])
# Stack the generator and (frozen) discriminator into the combined GAN model
discriminator.trainable = False
gan_input = Input(shape=noise_shape)
generator_output = generator(gan_input)
discriminator_output = discriminator(generator_output)
gan = tf.keras.Model(inputs=gan_input, outputs=discriminator_output)

discriminator.compile(optimizer=Adam(4e-4), loss='mse', metrics=['accuracy'])
gan.compile(optimizer=Adam(1e-4), loss='mse', metrics=['accuracy'])
discriminator_history = []
gan_history = []

# Fixed noise vector so the same generation can be tracked across epochs
test_noise = np.random.uniform(size=[1] + noise_shape)
generations = [generator.predict(test_noise, verbose=0)[0, :, :, :]]

epochs = 50
batch_size = 64
for epoch in range(epochs):
    print('epoch:', epoch + 1)
    images = tf.random.shuffle(images)
    batches = images.shape[0] // batch_size
    for batch in range(batches):
        # Generate a batch of fake images
        noise = np.random.uniform(size=[batch_size] + noise_shape)
        generated_images = generator.predict(noise, verbose=0)
        image_batch = images[batch * batch_size : (batch + 1) * batch_size, :, :, :]
        x = np.concatenate([image_batch, generated_images])
        # Labels: 0.99 for real (one-sided smoothing), -1 for fake
        y_discriminator = np.concatenate([np.ones(batch_size) * 0.99, np.ones(batch_size) * -1])
        # Train discriminator on the mixed real/fake batch
        discriminator.trainable = True
        discriminator_hist = discriminator.train_on_batch(x, y_discriminator, return_dict=True)
        discriminator.trainable = False
        discriminator_history.append(discriminator_hist)
        # Train generator (through the frozen discriminator) to get fakes scored as real
        y_generator = np.ones(batch_size)
        gan_hist = gan.train_on_batch(noise, y_generator, return_dict=True)
        gan_history.append(gan_hist)
    generations.append(generator.predict(test_noise, verbose=0)[0, :, :, :])
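For reference, my understanding of the textbook label scheme is real = 1 and fake = 0, trained with binary cross-entropy on the raw model outputs rather than MSE on 0.99/-1 targets. A minimal sketch of that scheme, with variable names that are my own:

import numpy as np
import tensorflow as tf

batch_size = 64
real_labels = np.ones(batch_size)     # real images -> 1
fake_labels = np.zeros(batch_size)    # generated images -> 0
y_discriminator = np.concatenate([real_labels, fake_labels])
y_generator = np.ones(batch_size)     # generator is trained to get fakes scored as real

# from_logits=True since the final Dense(1) layer has no activation
bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)

Binary cross-entropy is one of the losses I've already tried (see the list below), so I don't think the loss choice alone explains the failure.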
I've tried a few standard remedies for mode collapse, and nothing seems to work. These include:
Different loss functions (MSE, binary cross-entropy, Wasserstein loss; see the sketch after this list)
Optimizer settings (learning rates from 1e-3 down to 1e-7, different rates for the generator and discriminator, and clipnorm from 10 down to 1e-2)
Normalization within the networks, and different activations (tanh, sigmoid, ReLU, LeakyReLU)
Kernel initialisation with HeNormal
Convolutional layers instead of dense layers
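For completeness, this is roughly how I implemented the Wasserstein variant from that list (a minimal sketch; wasserstein_loss is my own helper name):

import tensorflow as tf

def wasserstein_loss(y_true, y_pred):
    # with real images labelled -1 and fakes labelled +1, minimising this
    # raises the critic's score on real images and lowers it on fakes
    return tf.reduce_mean(y_true * y_pred)

# used in place of 'mse', together with a clipnorm on the optimizer, e.g.:
# discriminator.compile(optimizer=Adam(4e-4, clipnorm=0.01), loss=wasserstein_loss)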
Any help would be much appreciated.