Can anyone explain what overfitting and noise are in ML? Can overfitting happen in parametric classifiers? And how can you tell whether a model is overfitting the data?
-
Sometimes I wonder how you all find Stack Overflow before Google. Overfitting is a very basic concept in ML, and a simple Google search would have brought a plethora of results. That being said, check out the answers. – techtabu Jul 12 '17 at 16:35
-
SO is for programming Q&A; to learn more about ML, take this [course](https://www.coursera.org/learn/machine-learning), it's a good intro. – marbel Jul 13 '17 at 23:38
2 Answers
In ML, overfitting means a model performs well on the training data but doesn't generalize to new data. This happens when the model is too complex relative to the amount and noisiness of the training data. So, how do you know you've overfit? After you build your model, you test it against your training set and get glorious results. But when you test against your test set, or in real life, the accuracy of your predictions is much lower. Then it's time to take corrective measures. You can:
- Simplify the model by reducing the number of attributes in the training data
- Gather more training data
- Reduce noise in the training data
Yes, overfitting can happen in any parametric model.
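To make the train-vs-test gap concrete, here is a small sketch using scikit-learn and synthetic data (the dataset and model choices are just illustrative): an unconstrained decision tree memorizes the noisy training set, so its training accuracy is near-perfect while its held-out accuracy lags, whereas a depth-limited tree shows a smaller gap.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# flip_y=0.2 injects label noise, so a perfect training fit must be memorization
X, y = make_classification(n_samples=500, n_features=20, flip_y=0.2,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

deep = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)      # no depth limit
shallow = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)

print("deep tree    train/test:", deep.score(X_train, y_train),
      deep.score(X_test, y_test))
print("shallow tree train/test:", shallow.score(X_train, y_train),
      shallow.score(X_test, y_test))
```

A large drop from training score to test score is the overfitting signal; the simpler (shallower) model trades a little training accuracy for a much smaller gap.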

Overfitting is a condition in which a predictive model fits the training data too closely. Such a model produces unreliable results when new test data is introduced. The training error will be very low, since the model has tuned and adjusted itself to the training data; this low-training-error behaviour is called low bias. When test data is introduced, however, the error metrics on it will be very high for the same reason. Such a model is called a high variance model.
Conversely, underfitting is a condition in which your model fits even the training data poorly; this is called high bias. Such a model cannot be expected to give good accuracy on test data either, but the problem is bias rather than variance: the error is large on both the training and the test sets.
Usually we want a good model to be a low bias / low variance model.
There are many ways to reduce overfitting, but most of them are specific to the type of model, like:
- Elastic net method (regression)
- Lasso method (regression)
- Ridge method (regression)
- Dropout (neural networks)
- Reducing the number of hidden layers (neural networks)
- Pruning (decision-tree regression and classification), etc.
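As a sketch of the ridge method from the list above (with assumed synthetic data and scikit-learn), compare plain least squares against ridge regression on a high-degree polynomial fit of a small noisy sample: the ridge penalty shrinks the coefficients, which is what tames the overfit.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(-3, 3, 30)).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=30)   # noisy sine curve

# A degree-15 polynomial has far more capacity than 30 points justify
plain = make_pipeline(PolynomialFeatures(degree=15), LinearRegression()).fit(X, y)
ridge = make_pipeline(PolynomialFeatures(degree=15), Ridge(alpha=1.0)).fit(X, y)

# Ridge's L2 penalty keeps the coefficient vector small; unpenalized least
# squares is free to use huge, wildly oscillating coefficients
w_plain = plain.named_steps["linearregression"].coef_
w_ridge = ridge.named_steps["ridge"].coef_
print("coef norm, plain:", np.linalg.norm(w_plain))
print("coef norm, ridge:", np.linalg.norm(w_ridge))
```

Lasso and elastic net work the same way with different penalties (L1, and a mix of L1/L2 respectively), and lasso can drive some coefficients exactly to zero.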
There is no single prescribed method for reducing underfitting, but if you do feature selection and engineering carefully, underfitting can usually be removed, because a sufficiently flexible model will tend to overfit rather than underfit.
If the data is very noisy and proper exploratory data analysis (EDA) is not done, that can also lead to underfitting. Therefore, it is always recommended to do proper EDA before any machine learning process.
Yes, overfitting can also occur in parametric classifiers.
You can detect overfitting with evaluation metrics; the signal is a large gap between training and held-out performance:
- If it is regression, metrics like R squared and adjusted R squared look excellent on the training set (near 1) and errors like RMSE and MAE look tiny, but all of them are much worse on the test set.
- If it is classification, metrics like accuracy and precision are near-perfect on the training set but drop noticeably on the test set.
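One convenient way to see that gap without a manual split is cross-validation; a sketch (synthetic data, scikit-learn) using `cross_validate` with `return_train_score=True`, which reports both sides of the gap directly:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_validate
from sklearn.tree import DecisionTreeClassifier

# flip_y=0.2 adds label noise, so a near-1.0 training score means overfitting
X, y = make_classification(n_samples=400, flip_y=0.2, random_state=1)

scores = cross_validate(DecisionTreeClassifier(random_state=1), X, y,
                        cv=5, return_train_score=True)
gap = scores["train_score"].mean() - scores["test_score"].mean()
print(f"train={scores['train_score'].mean():.2f} "
      f"cv={scores['test_score'].mean():.2f} gap={gap:.2f}")
```

A gap near zero means the model generalizes about as well as it fits; a large gap is the overfitting signature described above.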
