Scaling feature vectors for machine learning when distribution of features is different

Question

I am trying to scale my feature vector for an algorithm. I have 3 features and 1 target variable:

Feature 1: Gaussian-like distribution.
Feature 2: Skewed; the y-value decreases as x increases.
Feature 3: Highly skewed; almost all values are the same.
Target variable: Highly skewed.

My question is: I want to apply MinMaxScaling to the features and the target variable. Is it okay to scale all the features, or should I scale only the skewed ones?

[Histograms of Feature 1, Feature 2, and Feature 3]

The target variable looks like Feature 3. Since the data for Feature 3 and the target variable are mostly sparse, is there any alternative to MinMaxScaling?

Also, is it okay to use different scalers on the training data depending on the distribution of the respective columns? I am happy to provide more info if the question is not clear :)

score 0 · Answer 1 · answered Jul 27 '18 at 20:59


As it stands, you should not apply MinMaxScaling directly, because the values of Features 2 and 3 will almost always end up very close to 0 and the model will learn little from them. What I usually do is apply a log to this kind of feature before applying MinMaxScaling or StandardScaling. This should be fine for Feature 2. For Feature 3, you could consider log(log(x)), but I have never tried it, and you will probably lose too much variance. I would be curious to see the histogram of Feature 3 after one or two nested logs.
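To make the effect of the log concrete, here is a minimal sketch in plain Python. The `feature` values are made up for illustration, and `log1p` (that is, log(1 + x)) is an assumption on my side so that zeros are handled; use a plain log if all your values are strictly positive.

```python
import math


def minmax(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical skewed feature: mostly small values plus one large outlier.
feature = [0, 1, 2, 3, 5, 8, 1000]

# Min-max scaling alone: the outlier squeezes everything else towards 0.
plain = minmax(feature)

# Log first, then min-max: the small values are spread out again.
logged = minmax([math.log1p(v) for v in feature])

for raw, p, l in zip(feature, plain, logged):
    print(f"{raw:>5}  plain={p:.3f}  log+minmax={l:.3f}")
```

In the printed table, the value 8 sits below 0.01 after plain min-max scaling but at roughly 0.3 after the log, which is the kind of spread the scaler alone cannot give you.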

I hope it helps, Nicolas


Nicolas M.


Hey Nicolas, I will try the log of log. If that normalizes the distribution, what scaling should I apply to the other columns? Should that also be a log, or something else? – Anuvrat Tiku Jul 29 '18 at 02:59
Is it okay to apply different scaling to different columns in the feature vector? – Anuvrat Tiku Jul 29 '18 at 03:00
If you use a log, you will see that your values become more or less bounded, so MinMaxScaling is fine afterwards. For your second question: yes, you can apply a log to only a few features if needed. This is part of data preparation; you will just have to include it in your pipeline if it goes to production. Regarding different scalers, to be honest, I don't know for sure. I used to choose them based on each feature's distribution, but I don't know whether that is good practice. – Nicolas M. Jul 29 '18 at 12:04
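
For what "include it in your pipeline" could look like with a different transform per column, here is a rough sketch in plain Python. The class names (`Standard`, `MinMax`, `LogMinMax`), the column names, and the data are all hypothetical; the point is only that each column gets its own scaler, fitted on the training data and then reused unchanged on new data.

```python
import math
import statistics


class MinMax:
    """Rescale to [0, 1] using the min and max seen during fit."""

    def fit(self, col):
        self.lo, self.hi = min(col), max(col)
        return self

    def transform(self, col):
        span = (self.hi - self.lo) or 1.0  # avoid dividing by zero
        return [(v - self.lo) / span for v in col]


class Standard:
    """Subtract the mean and divide by the (population) standard deviation."""

    def fit(self, col):
        self.mean = statistics.fmean(col)
        self.std = statistics.pstdev(col) or 1.0
        return self

    def transform(self, col):
        return [(v - self.mean) / self.std for v in col]


class LogMinMax(MinMax):
    """Apply log1p first, then min-max scaling (for skewed columns)."""

    def fit(self, col):
        return super().fit([math.log1p(v) for v in col])

    def transform(self, col):
        return super().transform([math.log1p(v) for v in col])


# Hypothetical data: one Gaussian-like column and one skewed column.
train = {"feature1": [4.0, 5.0, 6.0], "feature2": [0.0, 9.0, 99.0]}
test = {"feature1": [5.0, 7.0], "feature2": [9.0, 999.0]}

# One scaler per column, chosen by that column's distribution.
scalers = {"feature1": Standard(), "feature2": LogMinMax()}

# Fit on the training data only, then apply the same parameters everywhere.
for name, scaler in scalers.items():
    scaler.fit(train[name])

train_scaled = {name: scalers[name].transform(col) for name, col in train.items()}
test_scaled = {name: scalers[name].transform(col) for name, col in test.items()}
```

Note that new data can land outside [0, 1] when it exceeds the training range (here the 999.0 in `test`). If you use scikit-learn, `ColumnTransformer` combined with `FunctionTransformer`, `MinMaxScaler`, and `StandardScaler` covers the same idea in library form.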
