Do I have to normalize my data if all the features are on the same scale? For example, all the columns are features, and each row (sample) holds the number of occurrences of each feature. And if normalization is required, do I need feature-wise or sample-wise normalization?
-
Hello and welcome to SO! Please read the [tour](https://stackoverflow.com/tour), and [How do I ask a good question?](https://stackoverflow.com/help/how-to-ask). Please also read [How to create a Minimal, Reproducible Example](https://stackoverflow.com/help/minimal-reproducible-example) – Tomer Shetah Dec 16 '20 at 11:39
1 Answer
No, you do not have to do normalization on your data if all your features are on the same scale.
For standardization, check the statistical distribution of your data to see whether each feature already has mean μ = 0 and standard deviation σ = 1, as a standard normal distribution does.
You can do this in pandas by calling `.describe()` on your data and inspecting the `mean` and `std` rows. If it happens that some features have a normal distribution while others don't, you can carry out sample-wise standardization (on the entire dataset).
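As a rough illustration of the check described above, here is a minimal sketch. The DataFrame `df` and its column names are hypothetical count data, not taken from the question, and the column-wise z-score at the end is only one common way to standardize (an assumption, since the answer does not give a formula).

```python
import pandas as pd

# Hypothetical count data: rows are samples, columns are features.
df = pd.DataFrame({
    "feature_a": [3, 0, 5, 2],
    "feature_b": [1, 4, 0, 3],
    "feature_c": [10, 12, 8, 10],
})

# describe() reports count, mean, std, min, quartiles and max per column;
# keep only the two rows the answer asks you to inspect.
stats = df.describe().loc[["mean", "std"]]
print(stats)

# Assumed approach: column-wise z-score. Subtract each column's mean and
# divide by its sample standard deviation (ddof=1, the same convention
# that describe() uses for "std").
standardized = (df - df.mean()) / df.std()

# After rescaling, every column should report mean ~0 and std ~1.
print(standardized.describe().loc[["mean", "std"]].round(6))
```

If the `mean` row is already close to 0 and the `std` row close to 1 for every column, the rescaling step changes little.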

MLDev
-
Can you please elaborate more on the standardization part? How do I interpret the results from the mean and std? – Martina Morcos Dec 17 '20 at 12:06
-
Standardisation, at a high level, means rescaling your sample (your data) so that it has the mean and spread of a "standard" normal distribution, that is, a mean of 0 and a standard deviation (and therefore a variance) of 1, or at least very close to 1. To make it easier for your model to learn the underlying structure of your data, you standardize it. To make training your model more efficient, you normalize it. – MLDev Dec 19 '20 at 05:02
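To make the two terms in the comment above concrete, here is a small sketch using only the Python standard library. The `values` list is made up, and reading "normalize" as min-max scaling to [0, 1] is an assumption, since the comment does not say which normalization it means.

```python
from statistics import mean, pstdev

# Hypothetical values for a single feature.
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

mu = mean(values)       # average of the feature
sigma = pstdev(values)  # population standard deviation

# Standardization (z-score): the result has mean 0 and standard deviation 1.
standardized = [(v - mu) / sigma for v in values]

# Normalization, assumed here to mean min-max scaling into [0, 1].
lo, hi = min(values), max(values)
normalized = [(v - lo) / (hi - lo) for v in values]

print(mean(standardized), pstdev(standardized))
print(min(normalized), max(normalized))
```

Standardization keeps the shape of the distribution and only shifts and rescales it, so data that was not normally distributed before is not made normal by this step.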