What is the problem with skewed data? Why should we make the distribution more like a Gaussian?
1 Answer
"What is the problem with skewed data?" Skewed data is not a problem in itself. The better question is why skewed data can cause problems for some machine learning models, and that comes down to how the model uses the data to approximate the function it is learning.
Whether skew matters therefore depends on the type of machine learning model you use.
Many regression methods (e.g. linear regression) assume normally distributed residuals. That is, the errors between the predictions and the actual responses are assumed to follow a normal (Gaussian) distribution. Skewed data can carry through into the fitted function and make the model produce a skewed distribution of residuals, which violates that assumption.
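To make the point about residuals concrete, here is a minimal sketch on synthetic data. Everything in it is an assumption for illustration: the data-generating process (a response whose logarithm is linear in `x` with Gaussian noise), the helper names `skewness` and `ols_residuals`, and the choice of a log transform as the remedy. It fits ordinary least squares with NumPy twice, once on the raw skewed response and once on its logarithm, and compares the skewness of the residuals.

```python
import numpy as np

def skewness(values):
    """Sample skewness: third central moment over variance**1.5."""
    values = np.asarray(values, dtype=float)
    centered = values - values.mean()
    return float(np.mean(centered**3) / np.mean(centered**2) ** 1.5)

def ols_residuals(x, y):
    """Fit y = a + b*x by ordinary least squares and return the residuals."""
    design = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return y - design @ coef

# Synthetic data (assumed for illustration): log(y) is linear in x with
# Gaussian noise, so y itself is right-skewed.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 2.0, size=5000)
y = np.exp(1.0 + 0.8 * x + rng.normal(0.0, 0.5, size=x.size))

raw_skew = skewness(ols_residuals(x, y))          # fit on the skewed response
log_skew = skewness(ols_residuals(x, np.log(y)))  # fit on the log response

print(f"residual skewness, raw response: {raw_skew:.2f}")
print(f"residual skewness, log response: {log_skew:.2f}")
```

On data like this, the residuals from the raw fit should come out clearly right-skewed, while the residuals from the log-scale fit should be close to symmetric. That is the sense in which making the distribution "more Gaussian" helps: it brings the model's residuals closer to what the method assumes. A log transform only suits strictly positive, right-skewed responses; other data may call for a different transform or a different model.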

Bjarke Kingo