
I have a generative adversarial network in which the discriminator is minimized with the MSE and the generator should be maximized, because the two are opponents pursuing opposite goals.

from keras.models import Sequential
from keras.layers import Dense

generator = Sequential()
generator.add(Dense(units=50, activation='sigmoid', input_shape=(15,)))
generator.add(Dense(units=1, activation='sigmoid'))
generator.compile(loss='mse', optimizer='adam')

generator.train_on_batch(x_data, y_data)

What do I have to adapt to get a generator model that profits from a high MSE value?

Emma
  • Why do you want that? This is an ill-posed problem. Maximizing the MSE means you need to make your prediction go to the boundaries of the underlying data type. But if you really want to do that, supplying a negative learning rate for the optimizer should probably do the job. Or use the inverse of MSE as a loss function. – a_guest Dec 12 '19 at 11:56
  • I have a generative adversarial network, where the discriminator gets minimized with the MSE and the generator should get maximized. Because both are opponents who pursue the opposite goal. – Emma Dec 12 '19 at 12:03
  • Ok your question was quite misleading. Please update it to be clear. – Geeocode Dec 12 '19 at 12:11
  • @Geeocode I did, thank you. Do you think the solution from Mano with the negative sign is correct? – Emma Dec 12 '19 at 12:16
  • See my update in minutes – Geeocode Dec 12 '19 at 12:29
  • @Geeocode I will, thank you – Emma Dec 12 '19 at 12:30
  • Tested the two answers, Emma? – Geeocode Dec 12 '19 at 13:03
  • @Geeocode Yes, now I tested both. Both work, but after some research, I almost think that Mano's solution is cleaner. What do you think? – Emma Dec 12 '19 at 13:52
  • My first thought was the negating solution, as I wrote in my answer's last section, but after some research and a real-life test I came to the opinion that it will not work as we expect, because the updates will blow up as the original loss grows (the negated one decreases), and your original intention was the opposite. See the discussion: https://discuss.pytorch.org/t/what-happens-when-loss-are-negative/47883/3 – Geeocode Dec 12 '19 at 14:27
  • @Emma Usually with GANs you want to maximize/minimize probabilities hence using something like binary cross entropy loss. See for example [this tutorial](https://pytorch.org/tutorials/beginner/dcgan_faces_tutorial.html#loss-functions-and-optimizers). But if MSE and its inverse worked for you as well, that's interesting to read. – a_guest Dec 12 '19 at 14:39
  • @Emma see my update2 below. – Geeocode Dec 13 '19 at 13:15
  • @Emma My pleasure! – Geeocode Dec 14 '19 at 10:07
  • @Emma so what's the purpose of seeking another opinion? Did you get bad/sub-optimal results from the approaches below? Also, can you be more specific on what you're looking for (e.g. another loss function, any practical issues you're facing, etc.)? – thushv89 Dec 16 '19 at 10:45
  • @thushv89 I would like a new opinion to the two options for loss functions that Geeocode provided. Which one would you use? And why? – Emma Dec 16 '19 at 14:41
  • @Emma The problem is that SO restricts and sometimes closes opinion-based answers; that is why I gave you resulting FACTS about the two approaches. Though I'll try to get some additional info yet. – Geeocode Dec 19 '19 at 11:03
  • It seems like you should review the concepts a bit more. Check out the following: https://pathmind.com/wiki/generative-adversarial-network-gan – lbragile Dec 22 '19 at 06:00
  • See my update below – Geeocode Dec 22 '19 at 21:11
  • @Emma See my Additional details section below. – Geeocode Dec 23 '19 at 12:43

2 Answers


UPDATE:

The original MSE implementation looks as follows:

from keras import backend as K

def mean_squared_error(y_true, y_pred):
    if not K.is_tensor(y_pred):
        y_pred = K.constant(y_pred)
    y_true = K.cast(y_true, y_pred.dtype)
    return K.mean(K.square(y_pred - y_true), axis=-1)

I think the correct maximizer loss function is:

def mean_squared_error_max(y_true, y_pred):
    if not K.is_tensor(y_pred):
        y_pred = K.constant(y_pred)
    y_true = K.cast(y_true, y_pred.dtype)
    # note: this diverges if y_pred equals y_true exactly
    return K.mean(K.square(1 / (y_pred - y_true)), axis=-1)

This way we always get a positive loss value, as with the MSE function, but with the reversed effect.
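The reversed effect can be checked numerically with a plain-Python sketch of the two formulas (no Keras needed; the helper names `mse` and `mse_max` are mine):

```python
def mse(y_true, y_pred):
    # ordinary mean squared error
    return sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mse_max(y_true, y_pred):
    # "maximizer" variant: mean of the squared inverse error terms
    return sum((1.0 / (p - t)) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [0.0, 0.0]
close = [0.1, 0.1]   # predictions near the targets
far   = [0.9, 0.9]   # predictions far from the targets

# ordinary MSE rewards closeness ...
assert mse(y_true, close) < mse(y_true, far)
# ... while the inverse variant rewards distance, so minimizing it
# pushes the predictions away from the targets
assert mse_max(y_true, close) > mse_max(y_true, far)
```

So an off-the-shelf optimizer can keep minimizing, and the maximization is encoded entirely in the loss.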

UPDATE 2: Initially I wrote that the intuitive first thought, simply negating the loss, would NOT give the result we expect because of the base concept of optimization methods (you can read an interesting discussion here). After I checked both methods head to head on a particular learning task (note: I didn't do an all-out test), both methods achieved loss maximization, though the -loss approach converged a bit faster. I am not sure whether it always gives the best solution, or any solution, because of the possible issue described here. If someone has other experience, please let me know.

So if somebody wants to give -loss a try too:

def mean_squared_error(y_true, y_pred):
    if not K.is_tensor(y_pred):
        y_pred = K.constant(y_pred)
    y_true = K.cast(y_true, y_pred.dtype)
    return -K.mean(K.square(y_pred - y_true), axis=-1)


Additional details:

OP wrote:

I have a generative adversarial network in which the discriminator is minimized with the MSE and the generator should be maximized, because the two are opponents pursuing opposite goals.

From the link provided by lbragile:

Meanwhile, the generator is creating new, synthetic images that it passes to the discriminator. It does so in the hopes that they, too, will be deemed authentic, even though they are fake. The goal of the generator is to generate passable hand-written digits: to lie without being caught. The goal of the discriminator is to identify images coming from the generator as fake.


So this is an ill-posed problem:

In a GAN our final goal is to train our two counterparties, the discriminator and the generator, to perform as well as possible against each other. It means that the two base learning algorithms have different tasks, but the loss function with which they can achieve the optimal solution is the same, i.e. binary_crossentropy, so the models' task is to minimize this loss.

A discriminator model's compile method:

self.discriminator.compile(loss='binary_crossentropy', optimizer=optimizer)

A generator model's compile method:

self.generator.compile(loss='binary_crossentropy', optimizer=optimizer)

It is like two runners whose goal is to minimize their time to the finish line, even though they are competitors in the race.

So the "opposite goal" doesn't mean opposite task i.e. minimizing the loss (i.e. minimizing the time in the runner example).

I hope it helps.

Geeocode

The question is not very clear to me. I suppose you want to maximize instead of minimize, while using the MSE criterion.

You can implement your own custom loss function that computes the -MSE, flipping the sign of the loss and thus the direction of gradient descent.

from keras import backend as K

def negative_mse(y, yhat):
    return -K.mean(K.sum(K.square(y - yhat)))

model.compile(loss=negative_mse, optimizer='adam')

Another option is simply to supply a negative learning step, but I'm not sure that Keras allows you to do this. Worth a try.
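The negative-learning-rate idea can be checked on a toy 1-D problem without any framework (plain Python; the names are mine): gradient descent with a negative step size performs gradient ascent on the loss.

```python
def loss(w):
    # simple convex loss with its minimum at w = 3
    return (w - 3.0) ** 2

def grad(w):
    return 2.0 * (w - 3.0)

w_desc = 0.0
w_asc = 0.0
for _ in range(50):
    w_desc -= 0.1 * grad(w_desc)    # ordinary descent: loss shrinks
    w_asc -= -0.1 * grad(w_asc)     # negative step size: loss grows

assert loss(w_desc) < loss(0.0)     # descent approached the minimum
assert loss(w_asc) > loss(0.0)      # "descent" with negative lr diverged
```

Note that, like the negated loss, this maximization has no stopping point, so the parameter runs away from the minimum without bound.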

Mano