I am optimizing over two loss functions which take very different values. To give an example:
loss1 = 1534
loss2 = 0.723
and I want to minimize loss1 + loss2. Would rescaling one loss so that it takes values closer to the other be a good idea? I tried the naive approach of multiplying loss2 by a fixed factor of 1000 inside the overall sum, but the problem is that as loss1 decreases (to around 600, then 500), the rescaled loss2 term becomes too large relative to it and starts to dominate.
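To make the failure mode concrete, here is a small numerical sketch of that static-weight approach, using hypothetical loss values in the same ranges as above (the weight `w = 1000` and the "late in training" numbers are illustrative assumptions, not measurements):

```python
# Static rescaling: pick a fixed weight w so that w * loss2 starts
# on the same scale as loss1.
w = 1000.0

# Early in training (values like the ones quoted above).
loss1_start, loss2_start = 1534.0, 0.723
total_start = loss1_start + w * loss2_start  # 1534 + 723: terms comparable

# Later in training loss1 has dropped a lot, but loss2 has barely moved,
# so the fixed weight now makes the loss2 term dominate the sum.
loss1_late, loss2_late = 550.0, 0.7
total_late = loss1_late + w * loss2_late     # 550 + 700: loss2 dominates
```

Any single constant weight can only balance the two terms at one point in training; once the losses shrink at different rates, the balance is lost again.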
My idea is to find a way to keep both loss terms in roughly the same range throughout the entire optimization process. What is the best way to do this?
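One way this "same range throughout training" idea could be expressed is to divide each loss by a running estimate of its own recent magnitude, e.g. an exponential moving average. The sketch below is a minimal, framework-free illustration of that idea (the class name and the decay value are my own choices, not an established API); in an autograd framework the EMA would have to be treated as a constant (detached) so gradients only flow through the current loss value:

```python
class EMALossNormalizer:
    """Rescale a loss to roughly 1 by dividing it by an exponential
    moving average (EMA) of its own recent values.

    Each loss term gets its own normalizer, so both terms stay in a
    comparable range even as their raw magnitudes drift apart.
    """

    def __init__(self, decay=0.99):
        self.decay = decay   # how slowly the running scale adapts
        self.ema = None      # running estimate of the loss magnitude

    def __call__(self, loss_value):
        if self.ema is None:
            # First step: initialize the scale with the first observation.
            self.ema = loss_value
        else:
            self.ema = self.decay * self.ema + (1 - self.decay) * loss_value
        # In an autograd framework, self.ema must be detached here so the
        # normalization factor is a constant w.r.t. the gradient.
        return loss_value / self.ema
```

Usage would look like `total = norm1(loss1) + norm2(loss2)` with one normalizer per loss, so both terms hover near 1 regardless of their raw scales. The trade-off is that dividing by the loss's own scale also changes the effective gradient magnitudes, which is exactly the kind of design question I'd like input on.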