I am confused about what normalize= exactly does in RidgeCV from sklearn.linear_model.
The documentation says:
normalize : bool, default=False
This parameter is ignored when ``fit_intercept`` is set to False. If True, the regressors X will be normalized before regression by subtracting the mean and dividing by the l2-norm. If you wish to standardize, please use :class:`sklearn.preprocessing.StandardScaler` before calling ``fit`` on an estimator with ``normalize=False``.
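To make the two operations named in the docstring concrete, here is a minimal NumPy sketch. The toy array `X` is made up for illustration, and the code only mimics the wording of the docstring (center, then divide by the l2-norm) next to the usual definition of standardizing (center, then divide by the population standard deviation, the convention StandardScaler uses). It does not call RidgeCV, so it shows only the arithmetic, not the estimator's internals.

```python
import numpy as np

# Hypothetical toy data: 5 samples, 2 features on very different scales.
X = np.array([[1.0, 200.0],
              [2.0, 180.0],
              [3.0, 260.0],
              [4.0, 240.0],
              [5.0, 300.0]])
n_samples = X.shape[0]

X_centered = X - X.mean(axis=0)

# "Normalize" as worded in the docstring: center, then divide by the l2-norm.
l2_norm = np.linalg.norm(X_centered, axis=0)
X_normalized = X_centered / l2_norm

# "Standardize": center, then divide by the population standard deviation
# (ddof=0), the same convention StandardScaler uses.
std = X.std(axis=0)
X_standardized = X_centered / std

# The two scalings differ only by a constant factor of sqrt(n_samples).
print(np.allclose(l2_norm, np.sqrt(n_samples) * std))
print(np.allclose(X_standardized, np.sqrt(n_samples) * X_normalized))
```

Both checks print True: the l2-norm of a centered column is sqrt(n_samples) times its standard deviation, so the two transformations produce the same columns up to that constant factor.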
- We generally refer to subtracting the mean and dividing by the l2-norm as standardizing, but the documentation calls it "normalize".
- If I understand the documentation correctly, I should use the third block of code (the last block), following this advice:
If you wish to standardize, please use :class:`sklearn.preprocessing.StandardScaler` before calling ``fit`` on an estimator with ``normalize=False``.
- But then, how do I interpret the coefficients? Are these standardized coefficients? Looking at their magnitude, I doubt they are.
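One common way to read coefficients fitted on standardized features is as the change in y per one standard deviation of each feature. The sketch below, which assumes that reading, shows how such coefficients map back to the original feature scale. The toy data and the values of `coef_std` and `intercept_std` are hypothetical placeholders, not output from RidgeCV. The conversion is plain algebra and holds for any linear model, however its coefficients were estimated.

```python
import numpy as np

# Hypothetical toy data, for illustration only.
X = np.array([[1.0, 200.0],
              [2.0, 180.0],
              [3.0, 260.0],
              [4.0, 240.0],
              [5.0, 300.0]])
mean = X.mean(axis=0)
std = X.std(axis=0)  # population std (ddof=0), as in StandardScaler
X_std = (X - mean) / std

# Hypothetical values standing in for a model fitted on X_std.
coef_std = np.array([3.0, -1.5])  # change in y per one standard deviation
intercept_std = 10.0

# The same linear model expressed on the original feature scale.
coef_orig = coef_std / std  # change in y per one original unit
intercept_orig = intercept_std - np.sum(coef_orig * mean)

pred_std = X_std @ coef_std + intercept_std
pred_orig = X @ coef_orig + intercept_orig
print(np.allclose(pred_std, pred_orig))
```

The check prints True, since both parameterizations give identical predictions. Note that this only re-expresses an already fitted model; refitting a ridge model on differently scaled features can give a different fit, because the penalty acts on the coefficients in whatever scale the features have.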
Overall, I am not sure I have followed the documentation on this normalize parameter.
I will test similar code in other languages and see what I get.
from sklearn.datasets import load_diabetes
from sklearn.linear_model import RidgeCV
X, y = load_diabetes(return_X_y=True)
without standardize
clf = RidgeCV(normalize=False, alphas=[1e-3, 1e-2, 1e-1, 1]).fit(X, y)
print(clf.alpha_)
print(clf.score(X, y))
print(clf.coef_)
0.01
0.5166287840315846
[ -7.19945679 -234.55293001 520.58313622 320.52335582 -380.60706569 150.48375154 -78.59123221 130.31305868 592.34958662 71.1337681 ]
standardize and normalize=True
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaler.fit(X)
X_std = scaler.transform(X)
clf = RidgeCV(normalize=True, alphas=[1e-3, 1e-2, 1e-1, 1]).fit(X_std, y)
print("standardize and normalize=True")
print(clf.alpha_)
print(clf.score(X_std, y))
print(clf.coef_)
standardize and normalize=True
0.01
0.5166287840315843
[ -0.34244324 -11.15654516 24.76161466 15.24574131 -18.10363195
7.15778213 -3.7382037 6.19836011 28.17519659 3.38348831]
standardize and normalize=False
clf = RidgeCV(normalize=False, alphas=[1e-3, 1e-2, 1e-1, 1]).fit(X_std, y)
print("standardize and normalize=False")
print(clf.alpha_)
print(clf.score(X_std, y))
print(clf.coef_)
standardize and normalize=False
1.0
0.5175831607267165
[ -0.43127609 -11.33381407 24.77096198 15.37375716 -30.08858903
16.65328714 1.46208255 7.5211415 32.84392268 3.26632702]