Standardization X_train and Y_train

Question

I'm a beginner in this field and am currently working on a Facebook Ads dataset.

The target variable is Amount Spent, which ranges from 10 to 200, and the features are Frequency (ranging from 0.1 to 3.0) and Impressions (ranging from 1,000 to 30,000).

After training my model (Linear Regression), my score was 0.84 but the MSE was 490. I think this is because the features are on very different scales, with rows like this: (Frequency: 1.432, Impressions: 25412).

I applied standardization after splitting my data into training and test sets.

(Screenshot: results without standardization)

To solve this, I thought of applying standardization to reduce the large spread of the values, so I applied fit_transform(X_train) and transform(X_test).
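A minimal sketch of the fit_transform/transform pattern described above, assuming scikit-learn and NumPy. The data are synthetic (hypothetical values drawn in the ranges given in the question, with a made-up linear relationship), not the asker's dataset. It checks that StandardScaler computes z = (x - mean) / std per column from the training set, and that the test set is transformed with those same training statistics rather than its own.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical synthetic data in the ranges from the question
# (Frequency 0.1-3.0, Impressions 1000-30000); NOT the real Facebook Ads data.
rng = np.random.default_rng(0)
n = 200
frequency = rng.uniform(0.1, 3.0, n)
impressions = rng.uniform(1000, 30000, n)
X = np.column_stack([frequency, impressions])
y = 10 + 5 * frequency + 0.006 * impressions + rng.normal(0, 10, n)  # made-up target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

scaler = StandardScaler()
X_train_std = scaler.fit_transform(X_train)  # learns mean_ and scale_ from the training set only
X_test_std = scaler.transform(X_test)        # reuses the training statistics, no refitting

# StandardScaler computes z = (x - mean) / std column by column (population std, ddof=0).
train_mean = X_train.mean(axis=0)
train_std = X_train.std(axis=0)
manual_train = (X_train - train_mean) / train_std
manual_test = (X_test - train_mean) / train_std  # test uses TRAIN mean/std

print(np.allclose(X_train_std, manual_train))  # True
print(np.allclose(X_test_std, manual_test))    # True
print(X_train_std.mean(axis=0), X_train_std.std(axis=0))  # roughly [0, 0] and [1, 1]
```

Fitting the scaler only on the training split keeps information from the test set out of the preprocessing step; the test set's own mean and standard deviation will generally not be exactly 0 and 1 after transforming.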

But the score and the MSE stayed the same.

(Screenshot: results after standardization)

My questions may be quite basic, but I'm really new to this field and to ML.

1) I notice that the community doesn't apply standardization to the target variable (Y). Why not?

2) Am I doing anything wrong here?

Thank you guys!

score 0 · Answer 1 · answered Oct 26 '20 at 16:38

Standardization is normally applied to the independent variables so that each one has a mean of zero and a standard deviation of one, i.e. each value x is replaced by (x - mean) / std. This only shifts and rescales the data: it does not make it normally distributed, and it does not confine it to the range 0 to 1 (that is min-max scaling). The reason for doing it is to put the independent variables on a comparable scale. For example, if variable1 takes values in the hundreds and variable2 ranges from 0 to 1, and you plot them together along the x-axis, variable2 will appear squashed close to zero, and any change in variable2 may look as if it does not affect the target value.
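For ordinary least squares specifically, rescaling the features does not change the fitted predictions: each coefficient absorbs its feature's scale (it gets multiplied by that feature's standard deviation) and the intercept absorbs the shift. That is consistent with the score and MSE staying the same in the question. Scaling matters more for models that are sensitive to feature magnitude, such as regularized regression, gradient-descent training, or distance-based methods. The sketch below compares the two fits, assuming scikit-learn and using hypothetical synthetic data rather than the asker's dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical features on very different scales (not the asker's data).
rng = np.random.default_rng(42)
n = 300
X = np.column_stack([rng.uniform(0.1, 3.0, n), rng.uniform(1000, 30000, n)])
y = 10 + 5 * X[:, 0] + 0.006 * X[:, 1] + rng.normal(0, 20, n)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Plain OLS on the raw features.
raw = LinearRegression().fit(X_train, y_train)

# Same model on standardized features (scaler fitted on the training set only).
scaler = StandardScaler().fit(X_train)
scaled = LinearRegression().fit(scaler.transform(X_train), y_train)

pred_raw = raw.predict(X_test)
pred_scaled = scaled.predict(scaler.transform(X_test))

r2_raw = raw.score(X_test, y_test)  # LinearRegression.score returns R^2
r2_scaled = scaled.score(scaler.transform(X_test), y_test)
mse_raw = mean_squared_error(y_test, pred_raw)
mse_scaled = mean_squared_error(y_test, pred_scaled)

print(np.allclose(pred_raw, pred_scaled))  # True: identical predictions
print(r2_raw, r2_scaled)                   # same R^2
print(mse_raw, mse_scaled)                 # same MSE

# Only the coefficients differ: scaled coef = raw coef * training std of each feature.
print(np.allclose(scaled.coef_, raw.coef_ * scaler.scale_))  # True
```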

The target should not be standardized because it is what the model is supposed to predict: y = f(x), where y is the target value.
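To illustrate, a sketch on hypothetical synthetic data (scikit-learn assumed): if y is standardized anyway, the model's predictions come out in standardized units rather than in the target's own units, so they have to be mapped back with inverse_transform before they can be compared with the original values or used to compute MSE in those units. For plain linear regression the round trip gives the same predictions as fitting on the raw target.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Hypothetical synthetic data (not the asker's dataset).
rng = np.random.default_rng(1)
n = 200
X = np.column_stack([rng.uniform(0.1, 3.0, n), rng.uniform(1000, 30000, n)])
y = 10 + 5 * X[:, 0] + 0.006 * X[:, 1] + rng.normal(0, 20, n)

# Fit on the raw target: predictions are directly in the target's own units.
pred_raw = LinearRegression().fit(X, y).predict(X)

# Fit on a standardized target: StandardScaler expects 2-D input, hence reshape.
y_scaler = StandardScaler()
y_std = y_scaler.fit_transform(y.reshape(-1, 1)).ravel()
pred_std_units = LinearRegression().fit(X, y_std).predict(X)  # in standard-deviation units

# Map predictions back to the original units before comparing or computing MSE.
pred_back = y_scaler.inverse_transform(pred_std_units.reshape(-1, 1)).ravel()
print(np.allclose(pred_raw, pred_back))  # True
```

scikit-learn's TransformedTargetRegressor (in sklearn.compose) wraps this transform/inverse-transform round trip if it is ever needed.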
