I've been working on a GAN for image synthesis and keep hitting the same problem no matter how I change my architecture, optimizers, or losses, so I've switched to the MNIST dataset to see whether the problem persists.

From what I've seen online, GANs on small greyscale images can converge after just a few epochs, but in my case the images below show what I'm getting:

[Image: after 1 epoch] [Image: 14 / 15 epochs in]

Using the same training loop and basic setup (with different generators/discriminators) on RGB cat images, I get even worse results after hundreds of epochs: most of the outputs are streaks of white, R/G/B, C/M/Y, or black, no matter the input (see below).

[Image: for RGB image generation]

All in all, I think my training loop itself must be to blame, but I can't figure out where the issue is. Below is the code I'm currently using (for the MNIST data):

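import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Input, Flatten, Dense, LeakyReLU, Dropout, Reshape
from tensorflow.keras.optimizers import Adam

# `images` (used in the training loop) is the MNIST training set with shape
# (N, 28, 28, 1); its loading and scaling are not shown here.

dropout_rate = 0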
descriminator = tf.keras.Sequential([
  Input(shape=(28, 28, 1)),
  Flatten(),

  Dense(1024),
  LeakyReLU(),
  Dropout(dropout_rate),

  Dense(256),
  LeakyReLU(),
  Dropout(dropout_rate),
  
  Dense(64),
  LeakyReLU(),
  Dropout(dropout_rate),

  Dense(1)
])

noise_shape = [100]
generator = tf.keras.Sequential([
  Input(shape=noise_shape),

  Dense(128),
  LeakyReLU(),

  Dense(256),
  LeakyReLU(),

  Dense(1024),
  LeakyReLU(),

  Dense(784),
  Reshape((28, 28, 1))
])

descriminator.trainable = False
gan_input = Input(shape=noise_shape)
generator_output = generator(gan_input)
descriminator_output = descriminator(generator_output)
gan = tf.keras.Model(inputs=gan_input, outputs=descriminator_output)

descriminator.compile(optimizer=Adam(4e-4), loss='mse', metrics=['accuracy'])
gan.compile(optimizer=Adam(1e-4), loss='mse', metrics=['accuracy'])

descriminator_history = []
gan_history = []

test_noise = np.random.uniform(size=[1] + noise_shape)
generations = [generator.predict(test_noise, verbose=0)[0, :, :, :]]

epochs = 50
batch_size = 64

for epoch in range(epochs):
  print('epoch:', epoch + 1)

  images = tf.random.shuffle(images)

  batches = images.shape[0] // batch_size
  for batch in range(batches):
    noise = np.random.uniform(size=[batch_size] + noise_shape)
    generated_images = generator.predict(noise, verbose=0)
    image_batch = images[batch * batch_size : (batch + 1) * batch_size, :, :, :]
    x = np.concatenate([image_batch, generated_images])
    
    y_descriminator = np.concatenate([np.ones(batch_size) * 0.99, np.ones(batch_size) * -1])

    # train descriminator
    descriminator.trainable = True
    descriminator_hist = descriminator.train_on_batch(x, y_descriminator, return_dict=True)
    descriminator.trainable = False
    descriminator_history.append(descriminator_hist)

    y_generator = np.ones(batch_size)
    
    # train generator
    gan_hist = gan.train_on_batch(noise, y_generator, return_dict=True)
    gan_history.append(gan_hist)

  generations.append(generator.predict(test_noise, verbose=0)[0, :, :, :])
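
The loop above stores one metrics dict per batch in `descriminator_history` and `gan_history`. As a rough sketch of how those lists could be condensed into per-epoch averages, to see whether either loss collapses or diverges, something like the following might help. The helper name `epoch_means` and the stand-in data are illustrative and not part of the code above; the helper makes no assumption about which metric keys Keras reports.

```python
from collections import defaultdict


def epoch_means(history, batches_per_epoch):
    """Average every metric over consecutive groups of `batches_per_epoch` dicts."""
    means = []
    for start in range(0, len(history), batches_per_epoch):
        chunk = history[start:start + batches_per_epoch]
        totals = defaultdict(float)
        for entry in chunk:
            for name, value in entry.items():
                totals[name] += float(value)
        # Divide by the actual chunk length so a short final chunk is handled
        means.append({name: total / len(chunk) for name, total in totals.items()})
    return means


# Stand-in data shaped like the dicts returned by train_on_batch(..., return_dict=True)
fake_history = [
    {"loss": 1.0, "accuracy": 0.5},
    {"loss": 0.5, "accuracy": 0.5},
    {"loss": 0.25, "accuracy": 1.0},
    {"loss": 0.25, "accuracy": 0.0},
]
summary = epoch_means(fake_history, batches_per_epoch=2)
for epoch_index, metrics in enumerate(summary, start=1):
    print("epoch", epoch_index, metrics)
```

With the loop above, this would be called as `epoch_means(descriminator_history, batches)` and `epoch_means(gan_history, batches)` once training has finished.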

I've tried a few remedies for mode collapse, but nothing seems to work. These include:

Different loss functions (mse, binary crossentropy, wasserstein loss)

Optimizer settings (learning rates from 1e-3 to 1e-7, different between generator and discriminator, and clipnorm from 10 to 1e-2)

Normalization within the NNs, and different activations (tanh, sigmoid, relu, leakyrelu)

Kernel initialisation with HeNormal

Convolutional layers instead of dense
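
For concreteness, the three discriminator losses named in the list above are commonly written as below. This is a NumPy sketch under the usual textbook conventions, not the exact code from those experiments; the function names and default targets are assumptions chosen for illustration. Passing `real_target=0.99, fake_target=-1.0` mirrors the labels used in the posted loop.

```python
import numpy as np


def lsgan_d_loss(d_real, d_fake, real_target=1.0, fake_target=0.0):
    # Least-squares (MSE) loss; the 0.5 factor matches taking 'mse' over a
    # concatenated batch with equal numbers of real and fake samples.
    real_term = np.mean((d_real - real_target) ** 2)
    fake_term = np.mean((d_fake - fake_target) ** 2)
    return 0.5 * (real_term + fake_term)


def bce_d_loss(logits_real, logits_fake):
    # Binary cross-entropy on raw logits (real -> 1, fake -> 0).
    # logaddexp(0, x) is a numerically stable log(1 + exp(x)).
    real_term = np.mean(np.logaddexp(0.0, -logits_real))
    fake_term = np.mean(np.logaddexp(0.0, logits_fake))
    return real_term + fake_term


def wasserstein_critic_loss(c_real, c_fake):
    # The critic minimises mean(fake scores) - mean(real scores).
    # Note: this form also needs a Lipschitz constraint on the critic
    # (e.g. weight clipping or a gradient penalty), which is not shown here.
    return np.mean(c_fake) - np.mean(c_real)


# Hypothetical discriminator outputs for a batch of four real and four fake images
real_scores = np.array([0.9, 0.8, 1.1, 0.7])
fake_scores = np.array([-0.8, -1.2, -0.5, -0.9])

print("least squares:", lsgan_d_loss(real_scores, fake_scores, 0.99, -1.0))
print("cross-entropy:", bce_d_loss(real_scores, fake_scores))
print("wasserstein:  ", wasserstein_critic_loss(real_scores, fake_scores))
```

One design point worth noting: the cross-entropy and Wasserstein forms both assume the final `Dense(1)` layer has no activation, which is the case for the discriminator shown earlier.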

Any help would be much appreciated.
