

I am new to data science, and I was working with a Keras LSTM with no success: the r2 score comes out as 0.0 every time.
So after some googling I found the scikit-learn example below [1], and as a novice in data science I am struggling to understand the following points:

  1. Why did they apply exp and log1p?
  2. Is there a way or a hypothesis-testing technique in Python to know which transformation I should apply to my data in order to get better results with an LSTM?
  3. Why did they apply it on the whole dataset and then split into train and test? I thought the right order is to save the transformation function and apply it later to the test set (not sure how to do it in this case).


[1] https://scikit-learn.org/stable/auto_examples/compose/plot_transformed_target.html#sphx-glr-download-auto-examples-compose-plot-transformed-target-py


1 Answer


These are very broad questions but here is something that hopefully helps you along:

Why did they apply exp and log1p?

The documentation that you linked mentions this:

A synthetic random regression problem is generated. The targets y are modified by: (i) translating all targets such that all entries are non-negative and (ii) applying an exponential function to obtain non-linear targets which cannot be fitted using a simple linear model.

So they apply exp to create a non-linear target. The log1p is then applied so that the transformed target comes close to approximating a Gaussian (normal distribution), because many models assume normality.
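
If it helps to see that in code, here is a minimal sketch of the same idea using scikit-learn's TransformedTargetRegressor, which trains on log1p(y) and maps predictions back with expm1. The target construction below is only a rough imitation of the one in the linked example, and a plain Ridge regressor is used for simplicity:

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# Synthetic regression data; shift the targets so they are non-negative, then
# exponentiate them to make them non-linear (rough imitation of the example).
X, y = make_regression(n_samples=1000, n_features=10, random_state=0)
y = np.exp((y + abs(y.min())) / 200)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The wrapper applies func to y before fitting and inverse_func to predictions.
model = TransformedTargetRegressor(
    regressor=Ridge(),
    func=np.log1p,
    inverse_func=np.expm1,
)
model.fit(X_train, y_train)
print("R^2 on test:", model.score(X_test, y_test))
```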

Is there a way or a hypothesis-testing technique in Python to know which transformation I should apply to my data in order to get better results with an LSTM?

There is no one-size-fits-all answer, but generally you try different transformations (log, exp, sqrt, cube root, inverse, etc.) to get your features to approximate normal distributions. Different models make different distributional assumptions about the predictors, and many assume a Gaussian (although some are robust to that assumption being violated). So you apply feature transforms to get them as close to normal as you can; it can't hurt to have normally distributed features.
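
There is no single built-in test that picks the transform for you, but a quick, informal way to compare candidates is to measure how close each one brings a feature to symmetry, e.g. via its skewness. The data below is synthetic (deliberately right-skewed) just for illustration:

```python
import numpy as np
from scipy.stats import skew

def compare_transforms(x):
    """Apply several candidate transforms and report the skewness of each."""
    x = np.asarray(x, dtype=float)
    shifted = x - x.min()  # make values non-negative for log/sqrt/inverse
    candidates = {
        "identity": x,
        "log1p": np.log1p(shifted),
        "sqrt": np.sqrt(shifted),
        "cube root": np.cbrt(x),
        "inverse": 1.0 / (shifted + 1e-9),  # small epsilon avoids division by zero
    }
    return {name: skew(vals) for name, vals in candidates.items()}

rng = np.random.default_rng(0)
x = rng.lognormal(mean=0.0, sigma=1.0, size=1000)  # right-skewed sample
for name, s in compare_transforms(x).items():
    print(f"{name:>10}: skewness = {s:+.2f}")
```

scikit-learn's PowerTransformer (Box-Cox / Yeo-Johnson) automates a similar idea by estimating the power transform that makes the data most Gaussian-like.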

Feature scaling, on the other hand, is done for reasons of model performance and convergence: your model might not find the optimal solution if the ranges of your features are vastly different.

Why did they apply it on the whole dataset and then split into train and test? I thought the right order is to save the transformation function and apply it later to the test set (not sure how to do it in this case).

You might be confusing Feature Transformation with Feature Scaling. For a fixed, parameter-free transform like log, applying it before or after the split makes no difference; e.g. it doesn't matter whether you split first and take the log afterwards, or the other way around. They do it on the whole dataset for convenience, debugging, and readability of the code.
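
As a small sanity check of that point (with made-up data): a stateless, element-wise transform such as log1p produces identical values whether you apply it before or after the split.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
y = rng.exponential(scale=5.0, size=100)

# Transform first, then split.
a_train, a_test = train_test_split(np.log1p(y), random_state=0)

# Split first, then transform.
y_train, y_test = train_test_split(y, random_state=0)
b_train, b_test = np.log1p(y_train), np.log1p(y_test)

print(np.allclose(a_train, b_train), np.allclose(a_test, b_test))  # True True
```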

However, Feature Scaling is a different matter altogether. If you deploy your models to production, you'll likely need to retain the scaling parameters/functions (fitted on the training data only) and apply them separately to the train/test and production data.
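
A minimal sketch of that workflow with scikit-learn's StandardScaler (synthetic data; in practice you would persist the fitted scaler, e.g. with joblib, and reuse it on production data):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X = np.random.default_rng(0).normal(loc=50, scale=10, size=(200, 3))
X_train, X_test = train_test_split(X, random_state=0)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean/std from train only
X_test_scaled = scaler.transform(X_test)        # reuse the same fitted parameters

# For production: save the fitted scaler and load it at prediction time, e.g.
# import joblib; joblib.dump(scaler, "scaler.joblib")
```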
