
Recently I have learned about Generative Adversarial Networks.

I am confused about how the Generator learns during training. Here is an implementation of a GAN:

`# train generator
z = Variable(xp.random.uniform(-1, 1, (batchsize, nz), dtype=np.float32))
x = gen(z)
yl = dis(x)
L_gen = F.softmax_cross_entropy(yl, Variable(xp.zeros(batchsize, dtype=np.int32)))
L_dis = F.softmax_cross_entropy(yl, Variable(xp.ones(batchsize, dtype=np.int32)))

# train discriminator

x2 = Variable(cuda.to_gpu(x2))
yl2 = dis(x2)
L_dis += F.softmax_cross_entropy(yl2, Variable(xp.zeros(batchsize, dtype=np.int32)))

#print "forward done"

o_gen.zero_grads()
L_gen.backward()
o_gen.update()

o_dis.zero_grads()
L_dis.backward()
o_dis.update()`

So it computes a loss for the Generator, as described in the paper. However, it calls the Generator backward function based on the Discriminator output. The discriminator output is just a number (not an array).

But we know that, in general, to train a network we compute a loss at the last layer (a loss between the last layer's output and the target output) and then compute the gradients. For example, if the output is 64*64, we compare it with a 64*64 image, compute the loss, and backpropagate.

However, in the GAN code I have seen, the loss for the Generator is computed from the Discriminator's output (which is just a number), and then backpropagation is called for the Generator. The Generator's last layer is, for example, 64*64 pixels, but the Discriminator's output is 1*1 (which is different from the usual networks). So I do not understand how this causes the Generator to learn.

I thought that if we attached the two networks (the Generator followed by the Discriminator), called backpropagation through both, and updated only the Generator's parameters, it would make sense and should work. But what I see in the code looks totally different.

So my question is: how is this possible?

Thanks

Kadaj13
  • Your question is not very clear, but check whether this is helpful. The discriminator is a normal classifier, which takes an image as input and classifies whether it is fake or not. The real data comes from the training set and the fake data comes from the Generator, so the discriminator is trained on these two inputs. The generator, in turn, has to fool the discriminator, so the output of the generator is fed to the discriminator and the generator is trained by setting the target output of the discriminator to non-fake. In this step only the generator is trained. – Vijay Mariappan Jun 27 '17 at 20:59
  • Thank you. Sorry for my bad question. My question is just about the code. I understand the algorithm clearly. My question is: to train the Generator, we have to backpropagate the loss from the Discriminator to the Generator without updating the Discriminator's parameters. However, in the code, I just see that they use the Discriminator's output (the loss) and, without backpropagating through the Discriminator, send it to the Generator. What is my mistake here? – Kadaj13 Jun 30 '17 at 07:11
  • I am not completely sure about this, but I see your point. It makes sense to me if the back propagation does go through the discriminator (because we need to expand the size), however, the weight update is only applied to the generator portion of the network – Adam Jul 20 '17 at 02:24

1 Answer


You say 'However, it calls the Generator backward function based on the Discriminator output. The discriminator output is just a number (not an array)'. However, a loss is always a scalar value. When we compute the mean squared error between two images, the result is also a scalar, and backpropagation still produces a gradient for every pixel.
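As a rough illustration of that point, here is a minimal pure-Python sketch. The 4-pixel "image", the toy discriminator D(x) = sigmoid(w . x + b), and all numeric values are made-up assumptions, not part of the code in the question. It shows that both an MSE loss and a loss computed from a single discriminator output are scalars, yet the chain rule gives one gradient entry per pixel of the generated image.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def mse_and_grad(image, target):
    """Scalar MSE loss and its gradient with respect to every pixel."""
    n = len(image)
    loss = sum((p - q) ** 2 for p, q in zip(image, target)) / n
    grad = [2.0 * (p - q) / n for p, q in zip(image, target)]
    return loss, grad


def generator_loss_and_grad(image, w, b):
    """Toy discriminator D(x) = sigmoid(w . x + b); generator loss = -log D(x)."""
    d = sigmoid(sum(wi * xi for wi, xi in zip(w, image)) + b)
    loss = -math.log(d)
    # Chain rule: dL/dx_i = dL/d(logit) * d(logit)/dx_i = -(1 - d) * w_i
    grad = [-(1.0 - d) * wi for wi in w]
    return loss, grad


# Hypothetical values: a flattened 4-pixel stand-in for a 64*64 generator output.
fake_image = [0.1, -0.3, 0.5, 0.2]
target = [0.0, 0.0, 1.0, 0.0]
w, b = [0.4, -0.2, 0.7, 0.1], 0.05  # made-up discriminator weights

mse_loss, mse_grad = mse_and_grad(fake_image, target)
gan_loss, gan_grad = generator_loss_and_grad(fake_image, w, b)

print("MSE loss:", mse_loss, "gradient entries:", len(mse_grad))
print("GAN loss:", gan_loss, "gradient entries:", len(gan_grad))
```

In both cases the loss is one number and the gradient has as many entries as the image has pixels. In a real GAN the per-pixel gradient is then propagated further back into the Generator's weights.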

L_adversarial = E[log(D(x))] + E[log(1 − D(G(z)))]

x is a sample from the real data distribution

z is a sample from the latent distribution, which the Generator transforms into G(z)

Coming back to your actual question: the Discriminator network has a sigmoid activation in its last layer, which means its output lies in the range [0,1]. The Discriminator tries to maximize L_adversarial by maximizing both terms that are added in it. The maximum value of the first term is 0 and occurs when D(x) is 1; the maximum value of the second term is also 0 and occurs when 1-D(G(z)) is 1, which means D(G(z)) is 0. So the Discriminator performs a binary classification by maximizing this function: it tries to output 1 when it is fed x (real data) and 0 when it is fed G(z) (generated fake data). The Generator, on the other hand, tries to minimize it. In other words, it tries to fool the Discriminator by generating fake samples that are similar to real samples. Over time, both the Generator and the Discriminator get better and better. This is the intuition behind GANs.
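To make the numbers concrete, here is a small sketch using only Python's `math` module. The function names and the sample probabilities are illustrative assumptions. It evaluates the objective for a single real/fake pair and compares the two common ways of writing the Generator's loss: minimizing log(1 − D(G(z))), and the form obtained from a binary cross-entropy with target 1, which is −log(D(G(z))).

```python
import math


def discriminator_objective(d_real, d_fake):
    """log D(x) + log(1 - D(G(z))) for one real and one fake sample."""
    return math.log(d_real) + math.log(1.0 - d_fake)


def generator_loss_minimax(d_fake):
    """Generator term of L_adversarial: log(1 - D(G(z)))."""
    return math.log(1.0 - d_fake)


def generator_loss_bce_ones(d_fake):
    """Binary cross-entropy against a target of 1: -log D(G(z))."""
    return -math.log(d_fake)


# A discriminator that separates real from fake well: objective close to 0.
sharp = discriminator_objective(d_real=0.99, d_fake=0.01)
# A discriminator that is fooled (outputs 0.5 everywhere): objective is lower.
fooled = discriminator_objective(d_real=0.5, d_fake=0.5)
print("sharp discriminator :", sharp)
print("fooled discriminator:", fooled)

# Both generator losses shrink as the discriminator's output on fakes rises.
for d_fake in (0.1, 0.5, 0.9):
    print(d_fake, generator_loss_minimax(d_fake), generator_loss_bce_ones(d_fake))
```

Both Generator losses decrease as D(G(z)) moves toward 1, so either one pushes the Generator toward fooling the Discriminator; they differ in how strong the gradient is when D(G(z)) is close to 0.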

The code below is in PyTorch:

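import torch
import torch.nn as nn

# bce_loss = -y*log(y_hat) - (1-y)*log(1-y_hat)  [similar to L_adversarial]
bce_loss = nn.BCELoss()

Discriminator = ...  # some network
Generator = ...  # some network

optimizer_generator = ...  # some optimizer for the generator network
optimizer_discriminator = ...  # some optimizer for the discriminator network

z = ...  # samples from some latent distribution, transformed by the generator
real = ...  # samples from the real data distribution

######################
# Update Discriminator
######################
optimizer_discriminator.zero_grad()  # clear gradients left over from earlier steps
fake = Generator(z)
# detach(): the generator is not updated here, so no gradient needs to flow into it
fake_prediction = Discriminator(fake.detach())
real_prediction = Discriminator(real)
# targets are built with the same shape as the predictions, as BCELoss expects
discriminator_loss = (bce_loss(fake_prediction, torch.zeros_like(fake_prediction))
                      + bce_loss(real_prediction, torch.ones_like(real_prediction)))
discriminator_loss.backward()
optimizer_discriminator.step()

##################
# Update Generator
##################
optimizer_generator.zero_grad()
fake = Generator(z)
fake_prediction = Discriminator(fake)
generator_loss = bce_loss(fake_prediction, torch.ones_like(fake_prediction))
# backward() propagates the scalar loss through the Discriminator into the Generator
generator_loss.backward()
# only the Generator's parameters are registered in this optimizer, so only they change
optimizer_generator.step()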
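To check this behaviour directly, the following is a minimal runnable sketch of a single Generator update. The tiny linear networks, the sizes, and the SGD optimizer are assumptions chosen only to keep the example short; they are not the networks from the question. It verifies that the loss is a scalar, that a gradient arrives at every element of the Generator's output, and that only the Generator's parameters change.

```python
import torch
from torch import nn

torch.manual_seed(0)
batch_size, nz, n_pixels = 8, 4, 16  # toy sizes (assumptions)

# Hypothetical stand-ins for real networks.
generator = nn.Sequential(nn.Linear(nz, n_pixels), nn.Tanh())
discriminator = nn.Sequential(nn.Linear(n_pixels, 1), nn.Sigmoid())
optimizer_generator = torch.optim.SGD(generator.parameters(), lr=0.1)
bce_loss = nn.BCELoss()

# Snapshot all parameters so we can see which ones the update changes.
gen_before = [p.detach().clone() for p in generator.parameters()]
dis_before = [p.detach().clone() for p in discriminator.parameters()]

z = torch.randn(batch_size, nz)
fake = generator(z)
fake.retain_grad()  # keep the gradient at the generator's output for inspection
fake_prediction = discriminator(fake)  # shape (batch_size, 1)
generator_loss = bce_loss(fake_prediction, torch.ones(batch_size, 1))

optimizer_generator.zero_grad()
discriminator.zero_grad()
generator_loss.backward()   # flows through the discriminator into the generator
optimizer_generator.step()  # updates generator parameters only

gen_changed = any(not torch.equal(a, b)
                  for a, b in zip(gen_before, generator.parameters()))
dis_changed = any(not torch.equal(a, b)
                  for a, b in zip(dis_before, discriminator.parameters()))

print("loss is a scalar:", generator_loss.dim() == 0)
print("gradient shape at generator output:", tuple(fake.grad.shape))
print("generator changed:", gen_changed, "| discriminator changed:", dis_changed)
```

The Discriminator's parameters do receive gradients during `backward()`, but because they are not registered in `optimizer_generator`, they are left untouched. This is the "attach the two networks, backpropagate, update only the Generator" scheme described in the question.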