25

Would you please guide me on how to interpret the following results?

1) loss < validation_loss
2) loss > validation_loss

It seems that the training loss should always be less than the validation loss, but both of these cases happen when training a model.

prosti
  • What have you tried to achieve your wanted results? What has your research concerning your problem shown? Can you provide code of your tries? [How do I ask a good question](https://stackoverflow.com/help/how-to-ask), [How much research effort is expected](https://meta.stackoverflow.com/questions/261592/how-much-research-effort-is-expected-of-stack-overflow-users) and [How to create a Minimal, Complete, and Verifiable example](https://stackoverflow.com/help/mcve) might be helpful to improve your question. – Geshode Jan 12 '18 at 12:15
  • [C2W1L02](https://www.youtube.com/watch?v=SjQyLhQIXSM) and [Diagnosing Bias vs Variance](https://www.youtube.com/watch?v=ewogYw5oCAI) might help you too. – Hossein Kashiani Jan 12 '18 at 13:17

6 Answers

35

Really a fundamental question in machine learning.

If validation loss >> training loss you can call it overfitting.
If validation loss  > training loss you can call it some overfitting.
If validation loss  < training loss you can call it some underfitting.
If validation loss << training loss you can call it underfitting.

Your aim is to make the validation loss as low as possible. Some overfitting is nearly always a good thing. All that matters in the end is whether the validation loss is as low as you can get it.

This often occurs when the training loss is quite a bit lower than the validation loss.

Also check how to prevent overfitting.

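A quick way to see which of these regimes you are in is to plot both losses per epoch. Below is a minimal, self-contained Keras sketch; the synthetic data and tiny model are placeholders, not from the answer:

```python
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt

# Synthetic data and a tiny model, purely for illustration; swap in your own.
x = np.random.rand(1000, 20).astype("float32")
y = (x.sum(axis=1) > 10).astype("float32")

model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# validation_split holds out 20% of the data; history records both losses per epoch.
history = model.fit(x, y, validation_split=0.2, epochs=30, verbose=0)

plt.plot(history.history["loss"], label="training loss")
plt.plot(history.history["val_loss"], label="validation loss")
plt.xlabel("epoch")
plt.ylabel("loss")
plt.legend()
plt.show()
```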

prosti
20

In machine learning and deep learning there are basically three cases:

1) Underfitting

This is the only case where loss > validation_loss, but only slightly. If loss is far higher than validation_loss, please post your code and data so that we can have a look.

2) Overfitting

loss << validation_loss

This means that your model fits the training data very nicely but not the validation data at all; in other words, it is not generalizing correctly to unseen data.

3) Perfect fitting

loss == validation_loss

If both values end up being roughly the same and are also converging (plot the loss over time), then chances are very high that you are doing it right.
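As a rough, purely illustrative way to check which of these three cases applies, you could compare the last recorded losses from a Keras History object. The 10% tolerance below is an arbitrary assumption, not a standard threshold:

```python
def diagnose_fit(history, tol=0.10):
    """Rough heuristic over the object returned by model.fit().
    tol is an arbitrary relative tolerance for calling the losses 'roughly equal'."""
    train_loss = history.history["loss"][-1]
    val_loss = history.history["val_loss"][-1]

    if train_loss > val_loss * (1 + tol):
        return "1) underfitting-ish: loss > validation_loss"
    if val_loss > train_loss * (1 + tol):
        return "2) overfitting-ish: loss < validation_loss"
    return "3) roughly perfect fit: loss == validation_loss (approximately)"

# Usage sketch: print(diagnose_fit(history)) after model.fit(..., validation_split=0.2)
```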

Romeo Kienzler
  • Your numbering is opposite to the OP's question. Also, are you sure that "loss > validation_loss" can be seen as underfitting? – pietz Jan 12 '18 at 12:28
5

1) Your model performs better on the training data than on the unknown validation data. A bit of overfitting is normal, but higher amounts need to be regulated with techniques like dropout to ensure generalization.

2) Your model performs better on the validation data. This can happen when you use augmentation on the training data, making it harder to predict in comparison to the unmodified validation samples. It can also happen when your training loss is calculated as a moving average over 1 epoch, whereas the validation loss is calculated after the learning phase of the same epoch.
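One way to address the second point and make the two numbers directly comparable is to re-evaluate the training set once the epoch has finished. A possible sketch with a custom Keras callback (not from the original answer; it assumes the model is compiled without extra metrics, so evaluate() returns a single scalar loss):

```python
import tensorflow as tf

class EpochEndTrainLoss(tf.keras.callbacks.Callback):
    """Re-evaluates the training data when an epoch finishes, so the reported
    number is measured at the same point in time as val_loss (instead of being
    a running average over the whole epoch)."""

    def __init__(self, x_train, y_train):
        super().__init__()
        self.x_train = x_train
        self.y_train = y_train

    def on_epoch_end(self, epoch, logs=None):
        # Assumes no extra metrics were compiled, so evaluate() returns one scalar.
        loss = self.model.evaluate(self.x_train, self.y_train, verbose=0)
        print(f"epoch {epoch + 1}: end-of-epoch training loss = {loss:.4f}")

# Usage sketch (x_train, y_train, x_val, y_val are your own arrays):
# model.fit(x_train, y_train, validation_data=(x_val, y_val), epochs=10,
#           callbacks=[EpochEndTrainLoss(x_train, y_train)])
```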

pietz
  • For the second case, where loss < validation loss, I understand the first reason (training data augmentation), but I need more clarification on the second reason (the moving average). Is there a way to avoid this? – Mohammed Awney Mar 22 '19 at 02:39
    During training, frameworks like Keras will output the current training loss to the console. The loss is calculated as a moving average over all processed batches, meaning that in the early training stage when loss drops quickly the first batch of an epoch will have a much higher loss than the last. When the epoch is finished, the shown training loss will NOT represent the training loss at the end of the epoch but the average training loss from start to end of the epoch. Therefore, it's oftentimes higher than the validation loss, which is calculated at the end of the epoch entirely. – pietz Mar 22 '19 at 08:44
1

Aurélien Geron made a good Twitter thread about this phenomenon. Summary:

  • Regularization is typically only applied during training, not validation and testing. For example, if you're using dropout, the model has fewer features available to it during training (see the short dropout sketch after this list).
  • Training loss is measured after each batch, while the validation loss is measured after each epoch, so on average the training loss is measured ½ an epoch earlier. This means that the validation loss has the benefit of extra gradient updates.
  • The val set can be easier than the training set. For example, data augmentations often distort or occlude parts of the image. This can also happen if you get unlucky during sampling (the val set has too many easy classes or too many easy examples), or if your val set is too small. Or the train set leaked into the val set.
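To make the first point concrete: a dropout layer is only active while training and is a pass-through at validation/inference time. A tiny sketch (the 0.5 rate and the all-ones input are arbitrary):

```python
import numpy as np
import tensorflow as tf

x = np.ones((1, 10), dtype="float32")   # arbitrary all-ones input
drop = tf.keras.layers.Dropout(0.5)

print(drop(x, training=True))   # roughly half the units zeroed, the rest scaled by 1/(1 - 0.5)
print(drop(x, training=False))  # identical to the input: dropout is inactive at validation time
```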
crypdick
0

If your validation loss is less than your training loss, you have not split the training data correctly. It indicates that the distribution of the training and validation sets is different; ideally, it should be the same. Moreover, regarding a good fit: in the ideal case, the training and validation losses both drop and stabilize at certain points, indicating an optimal fit, i.e. a model that neither overfits nor underfits.
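If you suspect the split, a stratified split keeps the label distribution similar in both sets. A minimal scikit-learn sketch (the synthetic data is a placeholder for your own features and labels):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic data purely for illustration; replace with your own features and labels.
X = np.random.rand(1000, 20)
y = np.random.randint(0, 2, size=1000)

# stratify=y keeps the class proportions (roughly) identical in both splits,
# so the training and validation distributions match.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

print(y_train.mean(), y_val.mean())  # class balance should be nearly the same in both sets
```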

0

I answered your question with the help of this essay.

This can happen for several reasons:

1: If you use regularization, the regularization term adds a penalty on the weights to the loss. Consequently, the training loss is much higher than the validation loss (a tiny numeric sketch after point 3 illustrates the effect). Nevertheless, the gap between training and validation loss shrinks after a few iterations. Notice that a lower loss does not necessarily mean higher accuracy.

2: The training loss is calculated after each batch iteration during the epoch, but the validation loss is calculated at the end of each epoch. This can make the validation loss lower than the training loss. However, after many iterations, the validation loss exceeds the training loss. Here too, a lower loss does not necessarily mean higher accuracy.

3: Another reason is noise: having noise in a dataset is inevitable, and sometimes the training set contains more outliers than the validation set. Accordingly, the model can predict the validation labels more easily. In this case, the model has both a lower loss and a higher accuracy on the validation set.
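To make point 1 concrete, here is a tiny numeric sketch with made-up numbers showing how an L2 penalty inflates the reported training objective:

```python
import numpy as np

# Made-up numbers, just to show how an L2 penalty inflates the training objective.
data_loss = 0.40                      # e.g. cross-entropy on the training batch
weights = np.array([0.5, -1.2, 0.8])  # a few model weights
lam = 0.01                            # regularization strength

l2_penalty = lam * np.sum(weights ** 2)
training_objective = data_loss + l2_penalty

print(f"data loss           : {data_loss:.4f}")
print(f"L2 penalty          : {l2_penalty:.4f}")
print(f"reported train loss : {training_objective:.4f}")  # higher than the plain data loss
```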

A more extensive explanation of your question can be found here.