Is there a way to flip the effect of the cross-entropy loss?

I have a language model, and I want to train it so that it does not generate a specific text. Thus I have two losses: one that I want to reduce (loss1) and another that I want to increase (loss2):

loss1 = outputs['loss1']
loss2 = 1 - outputs['loss2']
loss = loss1 + loss2

My question is: is it correct to subtract loss2 from 1, so that the optimizer increases loss2 instead of decreasing it?
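
For reference, here is a minimal self-contained PyTorch sketch of the idea; the toy linear model and random labels are placeholders, not the asker's actual setup. Note that minimizing `1 - loss2` and minimizing `-loss2` produce identical gradients, since the constant 1 drops out, so `loss1 + (1 - loss2)` and `loss1 - loss2` update the model identically:

```python
import torch
import torch.nn as nn

# Toy stand-in for the language model (hypothetical setup):
# a linear classifier trained with cross-entropy.
model = nn.Linear(16, 4)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
ce = nn.CrossEntropyLoss()

x = torch.randn(8, 16)               # one batch of inputs
y_keep = torch.randint(0, 4, (8,))   # labels the model should predict
y_avoid = torch.randint(0, 4, (8,))  # labels the model should avoid

logits = model(x)
loss1 = ce(logits, y_keep)   # to be decreased
loss2 = ce(logits, y_avoid)  # to be increased

# The constant 1 contributes nothing to the gradient, so subtracting
# loss2 directly is equivalent to adding (1 - loss2).
loss = loss1 - loss2

optimizer.zero_grad()
loss.backward()
optimizer.step()
```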

Minions
  • It should work, even directly with `loss1 - loss2`. But you may want to check whether a weighting factor is required for one of the losses, or use separate optimizers in alternate iterations for the two losses (a sketch of both options follows these comments). Maximizing CE loss is usually done in adversarial attacks, where it works well. – akshayk07 Apr 27 '22 at 11:20
  • Thanks, @akshayk07! what do you mean by "check if any weighting factor is required"? – Minions Apr 27 '22 at 17:36
  • `loss1 - loss2` means both losses get equal weight, but you may want to use `loss1 - w*loss2`, where `w` is the weight (a hyperparameter). The right value depends on your use case, evaluation method, and perhaps the quality of your labels. For example, if your true labels are high quality, then equal weight, or even a higher weight for the true-label loss, may be better. But if your true labels are noisy, then maybe the negative labels (text you don't want to generate) could be given a higher weight. – akshayk07 Apr 27 '22 at 17:56
  • @akshayk07 you mean: `...could be given a LOWER weight`, right? – Minions Apr 27 '22 at 18:25
  • When the true labels are noisy, the negative labels may still be good quality; that's why I suggested a higher weight for the negative labels. But I am not very experienced in the NLP domain, so I'm not sure how these experiments pan out in practice. You may discount my opinion here and go with your experience. – akshayk07 Apr 28 '22 at 13:48
  • Ok, I see what you meant now. Thanks! – Minions Apr 28 '22 at 13:53
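
As a concrete illustration of the two suggestions above, here is a short PyTorch sketch of both the weighted objective `loss1 - w*loss2` and the alternate-iteration scheme with separate optimizers. The toy model, the random labels, and the value of `w` are placeholder assumptions, not anything prescribed in the thread:

```python
import torch
import torch.nn as nn

model = nn.Linear(16, 4)
ce = nn.CrossEntropyLoss()

x = torch.randn(8, 16)
y_true = torch.randint(0, 4, (8,))  # true labels (text to encourage)
y_neg = torch.randint(0, 4, (8,))   # negative labels (text to avoid)
w = 0.5  # hypothetical weight; tune as a hyperparameter

# Option A: a single weighted objective.
opt = torch.optim.SGD(model.parameters(), lr=0.1)
logits = model(x)
loss = ce(logits, y_true) - w * ce(logits, y_neg)
opt.zero_grad()
loss.backward()
opt.step()

# Option B: separate optimizers stepped in alternate iterations,
# which allows different learning rates for the two objectives.
opt_true = torch.optim.SGD(model.parameters(), lr=0.1)
opt_neg = torch.optim.SGD(model.parameters(), lr=0.05)
for step in range(100):
    logits = model(x)
    if step % 2 == 0:
        loss = ce(logits, y_true)   # descend: encourage the true labels
        opt_true.zero_grad()
        loss.backward()
        opt_true.step()
    else:
        loss = -ce(logits, y_neg)   # ascend: discourage the negative labels
        opt_neg.zero_grad()
        loss.backward()
        opt_neg.step()
```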

0 Answers