To properly fit a regularized linear regression model like the Elastic Net, the independent variables have to be standardized first. However, the coefficients then have a different meaning. To extract the proper weights of such a model, do I need to calculate them manually with this equation:
b = b' * std_y/std_x
or is there already some built-in feature in sklearn?

Also: I don't think I can just use the normalize=True parameter, since I have dummy variables which should probably remain unscaled.

foxale

1 Answer

You can unstandardize using the mean and standard deviation. sklearn exposes both as attributes of a fitted StandardScaler (mean_ and scale_).

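import numpy as np
from sklearn.preprocessing import StandardScaler

ss = StandardScaler()
X_train_std = ss.fit_transform(X_train)  # or whatever you called it
# ... fit `model` on X_train_std (y left unscaled) ...

# Each slope is divided by its feature's standard deviation
unstandardized_coefficients = model.coef_ / ss.scale_
# The intercept absorbs the shift by the feature means
unstandardized_intercept = model.intercept_ - np.sum(unstandardized_coefficients * ss.mean_)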

That would put them on the scale of the unstandardized data.
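As a sanity check, here is a minimal sketch of the back-transformation on made-up data. It assumes only X was standardized (y is left on its original scale) and that the model fits an intercept; the synthetic data, the alpha and l1_ratio values, and the names coef_original and intercept_original are illustrative assumptions, not something from the question. Under those assumptions each slope is divided by its feature's standard deviation, and the intercept absorbs the mean shift.

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.preprocessing import StandardScaler

# Synthetic data (illustrative only): three features on very different scales.
rng = np.random.default_rng(0)
X_train = rng.normal(loc=[10.0, -5.0, 100.0], scale=[2.0, 0.5, 30.0], size=(200, 3))
y_train = (3.0 * X_train[:, 0] - 2.0 * X_train[:, 1]
           + 0.1 * X_train[:, 2] + rng.normal(size=200))

# Standardize X only; y stays on its original scale.
ss = StandardScaler()
X_train_std = ss.fit_transform(X_train)

# Hyperparameters are arbitrary choices for the sketch.
model = ElasticNet(alpha=0.1, l1_ratio=0.5)
model.fit(X_train_std, y_train)

# Map the fitted parameters back to the original feature scale.
coef_original = model.coef_ / ss.scale_
intercept_original = model.intercept_ - np.sum(coef_original * ss.mean_)

# Both parameterizations should produce the same predictions.
pred_std = model.predict(X_train_std)
pred_original = X_train @ coef_original + intercept_original
print(np.allclose(pred_std, pred_original))
```

If y had been standardized as well, each slope would additionally be multiplied by std_y, which is the b = b' * std_y/std_x formula from the question.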

However, since you're using regularization, the model becomes a biased estimator: the penalty shrinks the coefficients toward zero. There is a tradeoff between performance and interpretability when it comes to biased/unbiased estimators, and an unbiased estimator is not the same thing as a low-MSE estimator. This is more a discussion for stats.stackexchange.com. Read about biased estimators and interpretability here: "When is a biased estimator preferable to unbiased one?"

tl;dr It doesn't make sense to do what you suggested.

Nicolas Gervais