
I'm trying to implement a neural network in which I want to optimize two cost functions. Could you please share your thoughts on an approach that does the following:

for i in iterations
    ...
    minimize loss_1  // updates the weight matrix W
    minimize loss_2  // updates the same weight matrix W (starting from the W produced by minimizing loss_1)
end

So, I perform one iteration of backpropagation with cost function 1 and then one with cost function 2.

Thank you.

    Once you've minimized loss 2, you're no longer at the minimum for loss 1. – Arya McCarthy Feb 18 '18 at 16:20
    The example ```min(f), then min(g)``` with ```f(x)=x``` and ```g(x)=-x``` should be quite self-explanatory. Of course things are much more complex in non-convex local-optimization (initial points). – sascha Feb 18 '18 at 16:20
  • But when I do that, my algorithm converges (it minimizes cost1 and cost2). Of course it's not the best solution for f1 or f2 individually, but I think it is a good solution for the two functions at the same time. How can I explain that? – yassine nasser Feb 18 '18 at 16:34
  • Define convergence. Of course local-convergence will occur (under mild conditions; one time for f, then for g). But if your solution is a **joint**-optimum, you got a very very special problem (it does not transfer to even simple examples). You basically throw away everything the first opt does, except for initial-points. Now of course these points can already bound you to some very small neighborhood (not much change to loss1; not much flexibility to change loss2). This all depends on specifics. – sascha Feb 18 '18 at 16:37

1 Answer


It is common to have two objectives you want your model to optimize, and the typical solution is to combine them into a single cost function by taking a weighted sum:

C = a*C1 + b*C2

where a and b are chosen so that neither term dominates.

This way you can compute a single gradient and use that to update your weights on each training step.
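As a minimal sketch of this idea, here is a toy gradient-descent loop in numpy. The setup (a linear model, two mean-squared-error targets `y1` and `y2`, and the weights `a` and `b`) is entirely illustrative, not taken from the question:

```python
import numpy as np

# Toy setup: fit weights w so that X @ w matches two slightly different
# targets y1 and y2. Both losses are mean squared error.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
w_true = np.array([1.0, -2.0, 0.5])
y1 = X @ w_true
y2 = X @ w_true + 0.1  # a shifted second target, so the two losses conflict a little

a, b = 1.0, 1.0        # weighting coefficients; tune so neither term dominates
lr = 0.05
w = np.zeros(3)

for _ in range(500):
    # Gradients of the two MSE losses with respect to w
    g1 = 2 * X.T @ (X @ w - y1) / len(X)
    g2 = 2 * X.T @ (X @ w - y2) / len(X)
    # One combined update: grad(a*C1 + b*C2) = a*grad(C1) + b*grad(C2)
    w -= lr * (a * g1 + b * g2)

loss1 = np.mean((X @ w - y1) ** 2)
loss2 = np.mean((X @ w - y2) ** 2)
```

Because the targets conflict, neither loss reaches zero; the combined objective settles on a compromise between them, which is exactly the behavior you want from a weighted sum.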

EDIT: If you are not getting good results then either 1. you are not weighting the components correctly and one is dominating the gradients, or 2. the functions simply aren't compatible and there is no good way to minimize both at the same time. @sascha gave a trivial example of this in the comments: C1 = x and C2 = -x.

EDIT 2: If you are updating your parameters with respect to the gradient of C1 and then with respect to the gradient of C2, then this is very similar to what I suggested, since the gradient of the sum is the sum of the gradients.

However, if you are recomputing the gradient after the first step, then this could lead to unstable solutions, because the second step could be "undoing" the work of the first step (or vice versa) at every iteration, especially if the cost functions are not compatible. With my method you are more likely to arrive at a minimum. But both methods are very similar, and if you are running into problems it is likely due to one of the reasons I mentioned.
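To make the "undoing" concrete, here is sascha's example from the comments, C1(x) = x and C2(x) = -x, worked through in plain Python (the learning rate and iteration count are arbitrary):

```python
# Alternating scheme: one gradient step on C1, then one on C2.
# dC1/dx = 1 and dC2/dx = -1, so the second step exactly cancels the first.
lr = 0.1
x_alt = 0.0
for _ in range(10):
    x_alt -= lr * 1.0     # step on C1's gradient: C1 decreases
    x_alt -= lr * (-1.0)  # step on C2's gradient: undoes the previous step
# Each loss "improves" for half a step, but the net update per iteration is zero.

# Combined scheme: the gradient of C1 + C2 is identically 0, so x never moves,
# which makes explicit that there is no joint minimum to move toward.
x_sum = 0.0
for _ in range(10):
    x_sum -= lr * (1.0 + (-1.0))
```

In this extreme case the two schemes end up at the same point; the difference shows up when the gradients only partially conflict, where alternating steps can oscillate while the combined gradient follows a single descent direction.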

Imran
  • But when I do that, my algorithm converges (it minimizes cost1 and cost2). Of course it's not the best solution for f1 or f2 individually, but I think it is a good solution for the two functions at the same time. How can I explain that? – yassine nasser Feb 18 '18 at 16:34
  • Sorry, I don't understand. What do you mean by "the right solution for the function f1 and f2"? Can you update your question with a concrete example? – Imran Feb 18 '18 at 16:37
  • There are a couple of possibilities: 1. You might not be weighting them correctly and the magnitude of one is dominating the gradients or 2. The two functions are not compatible and there is no good way to minimize both at the same time. – Imran Feb 18 '18 at 16:46
  • No sir, I'm talking about the minimization approach I posted in the question: when I print the loss value of each cost function, I see that the losses of cost functions 1 and 2 both decrease over the iterations. – yassine nasser Feb 18 '18 at 17:05
  • Can you update your question to explain more about how you are minimizing them? It's really hard to understand what you are doing. Are you performing one iteration of backpropagation with one cost function and then one with the other? – Imran Feb 18 '18 at 17:09
  • Yes, I perform one iteration of backpropagation with cost function 1 and then one with cost function 2. – yassine nasser Feb 18 '18 at 18:42