
I'm working on a regression problem in PyTorch. My target values can be expressed either in the range 0 to 100 or in the range 0 to 1 (they represent a percentage, or that percentage divided by 100).

The data is imbalanced: I have much more data with low targets.

I've noticed that when I run the model with targets in the range 0-100, it doesn't learn: the validation loss doesn't improve, and the loss on the largest 25% of targets is very large, much larger than the standard deviation within that group.

However, when I run the model with targets in the range 0-1, it does learn and I get good results.

If anyone can explain why this happens, and whether using the range 0-1 is "cheating", that would be great.

Also: should I scale the targets, whether I use the larger or the smaller range?

Some additional info: I'm fine-tuning BERT for a specific task, and I use MSELoss. A simplified sketch of the setup is below for context.
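
Roughly, the training loop looks like this (class and variable names are illustrative, not my actual code), including the line where I divide the targets by 100 when I use the 0-1 range:

```python
import torch
import torch.nn as nn
from transformers import BertModel

class BertRegressor(nn.Module):
    """BERT encoder with a single linear regression head (illustrative sketch)."""
    def __init__(self, pretrained_name="bert-base-uncased"):
        super().__init__()
        self.bert = BertModel.from_pretrained(pretrained_name)
        self.head = nn.Linear(self.bert.config.hidden_size, 1)

    def forward(self, input_ids, attention_mask):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        return self.head(out.pooler_output).squeeze(-1)

model = BertRegressor()
criterion = nn.MSELoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

def train_step(batch):
    # batch["target"] holds percentages in [0, 100];
    # dividing by 100 gives the 0-1 version of the targets.
    target = batch["target"].float() / 100.0
    pred = model(batch["input_ids"], batch["attention_mask"])
    loss = criterion(pred, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```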

Thanks!

1 Answer


I think your observation relates to batch normalization. There is a paper on the subject, and numerous Medium/Towards Data Science posts, which I will not list here. The idea is that if your model and loss function contain no non-linearities, the scale of the data doesn't matter. But MSE is non-linear (it is quadratic in the error), which makes training sensitive to the scaling of both the target and the input data. You can experiment with inserting batch normalization layers into your model, after dense or convolutional layers. In my experience it often improves accuracy.
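
A rough sketch of what such a head could look like on top of the BERT encoder (the hidden size 768 and the intermediate width 256 are placeholder assumptions, not values from your setup):

```python
import torch.nn as nn

# Regression head with a BatchNorm1d layer inserted after the dense layer.
head = nn.Sequential(
    nn.Linear(768, 256),   # 768 = BERT-base hidden size (assumed)
    nn.BatchNorm1d(256),   # normalizes activations across the batch
    nn.ReLU(),
    nn.Linear(256, 1),
)
```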

Aramakus