I misread PyTorch's NLLLoss()
and accidentally passed my model's probabilities to the loss function instead of my model's log probabilities, which is what the function expects. However, when I train a model under this misused loss function, the model (a) learns faster, (b) learns more stably, (c) reaches a lower loss, and (d) performs better at the classification task.
I don't have a minimal working example, but I'm curious if anyone else has experienced this or knows why it happens. Any hypotheses?
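For context, the pattern of the mistake is roughly this (a sketch of the setup, not my actual training code; the shapes and class count are made up):

```python
import torch
import torch.nn.functional as F

logits = torch.randn(4, 10)              # batch of 4, 10 classes (arbitrary)
targets = torch.randint(0, 10, (4,))

# Correct usage: NLLLoss expects log probabilities
log_probs = F.log_softmax(logits, dim=1)
correct_loss = F.nll_loss(log_probs, targets)  # = -mean(log p_target)

# My mistake: passing probabilities instead
probs = F.softmax(logits, dim=1)
misused_loss = F.nll_loss(probs, targets)      # = -mean(p_target)
```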
One hypothesis I have is that the gradient of the misused loss is more stable. With probabilities as input, NLLLoss reduces to -p_target, whose derivative with respect to the output probability is a constant -1, whereas the correct -log(p_target) has derivative -1/p_target, which blows up when the model assigns a small probability to the true class.
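A toy check of that derivative claim (just autograd on a few hand-picked probability values, not from my model):

```python
import torch

p = torch.tensor([0.01, 0.5, 0.99], requires_grad=True)

# Correct loss: -log(p); gradient w.r.t. p is -1/p
(-p.log()).sum().backward()
print(p.grad)   # tensor([-100.0000, -2.0000, -1.0101]) -- explodes as p -> 0

p.grad = None

# Misused loss: -p; gradient w.r.t. p is a constant -1
(-p).sum().backward()
print(p.grad)   # tensor([-1., -1., -1.])
```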