
I am confused by what `normalize=` exactly does in RidgeCV from sklearn.linear_model.

The documentation says:

normalize : bool, default=False This parameter is ignored when fit_intercept is set to False. If True, the regressors X will be normalized before regression by subtracting the mean and dividing by the l2-norm. If you wish to standardize, please use :class:sklearn.preprocessing.StandardScaler before calling fit on an estimator with normalize=False.

  1. We would generally call subtracting the mean and dividing by the l2-norm "standardizing", but the documentation refers to it as "normalize".
  2. If I understand the documentation correctly, I should use the third block of code (the last block), following:

If you wish to standardize, please use
:class:`sklearn.preprocessing.StandardScaler` before calling ``fit``
on an estimator with ``normalize=False``.

  3. But then, how do I interpret the coefficients? Are these standardized coefficients? Looking at their magnitude, I doubt they are.

Overall, I am not sure I have followed the documentation on this normalize parameter.

I will be testing similar code in other languages to see what I get.

from sklearn.datasets import load_diabetes
from sklearn.linear_model import RidgeCV
X, y = load_diabetes(return_X_y=True)

without standardization

clf = RidgeCV(normalize=False,alphas=[1e-3, 1e-2, 1e-1, 1]).fit(X, y)
clf.coef_
print(clf.alpha_)
print(clf.score(X,y))
print(clf.coef_)
0.01 
0.5166287840315846 
[ -7.19945679 -234.55293001 520.58313622 320.52335582 -380.60706569 150.48375154 -78.59123221 130.31305868 592.34958662 71.1337681 ]

standardize and normalize=True

from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaler.fit(X)
X_std = scaler.transform(X)
clf = RidgeCV(normalize=True,alphas=[1e-3, 1e-2, 1e-1, 1]).fit(X_std, y)
print("standardize and normalize=True")
print(clf.alpha_)
print(clf.score(X_std,y))
print(clf.coef_)

standardize and normalize=True
0.01
0.5166287840315843
[ -0.34244324 -11.15654516  24.76161466  15.24574131 -18.10363195
   7.15778213  -3.7382037    6.19836011  28.17519659   3.38348831]

standardize and normalize=False

clf = RidgeCV(normalize=False,alphas=[1e-3, 1e-2, 1e-1, 1]).fit(X_std, y)
print("standardize and normalize=False")
print(clf.alpha_)
print(clf.score(X_std,y))
print(clf.coef_)

standardize and normalize=False
1.0
0.5175831607267165
[ -0.43127609 -11.33381407  24.77096198  15.37375716 -30.08858903
  16.65328714   1.46208255   7.5211415   32.84392268   3.26632702]
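One thing I can check numerically: after standardizing, every centered column of X_std has the same l2 norm, sqrt(n), so the normalize=True coefficients on X_std should just be the raw-X coefficients divided by sqrt(n). A quick sketch of this (using plain Ridge at the selected alpha=0.01, which is what RidgeCV refits with; this is my own check, not from the docs):

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)
n = X.shape[0]  # 442

X_std = StandardScaler().fit_transform(X)
# every centered column of X_std has l2 norm sqrt(n) (~21.02 here)
print(np.linalg.norm(X_std - X_std.mean(axis=0), axis=0))

# coefficients from the first block (normalize=False on raw X, alpha=0.01)
coef_raw = Ridge(alpha=0.01).fit(X, y).coef_
# dividing by sqrt(n) should reproduce the "standardize and normalize=True" block
print(coef_raw / np.sqrt(n))
```

For the diabetes data (sqrt(442) ≈ 21.02) this should give back the [-0.342, -11.157, 24.762, ...] coefficients from the normalize=True run above.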
Sarah
    Not an answer, but [this TDS article](https://towardsdatascience.com/scale-standardize-or-normalize-with-scikit-learn-6ccc7d176a02) gives a good breakdown of scale/standardize/normalize. One big takeaway is that apparently sklearn normalizer is applied on each sample, not on each feature. (Row-wise, not column-wise). It's not clear from the docs on ridgeCV whether this is the case when `normalize=True` – G. Anderson Feb 13 '20 at 21:53

2 Answers


Edit:

There is also one thing to note about the diabetes dataset being used in this example.

The data is already normalized, so running normalize on it may not have the exact effect you are looking for.

It might be better to use a different dataset for your tests.

The normalize parameter works the same way as sklearn.preprocessing.Normalizer, which is different from StandardScaler.

The main difference is that the Normalizer acts on rows (observations) while the StandardScaler acts on columns (features).
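To make that row/column distinction concrete, here is a minimal sketch on a made-up toy matrix (Normalizer rescales each sample to unit l2 norm; StandardScaler centers and scales each feature):

```python
import numpy as np
from sklearn.preprocessing import Normalizer, StandardScaler

X = np.array([[1.0, 2.0, 2.0],
              [4.0, 0.0, 3.0]])

# Normalizer works row-wise: each sample ends up with unit l2 norm
X_rows = Normalizer(norm="l2").fit_transform(X)
print(np.linalg.norm(X_rows, axis=1))  # row norms -> [1. 1.]

# StandardScaler works column-wise: each feature is centered and scaled
X_cols = StandardScaler().fit_transform(X)
print(X_cols.mean(axis=0))  # column means -> [0. 0. 0.]
print(X_cols.std(axis=0))   # column stds  -> [1. 1. 1.]
```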

Here is another related post: Difference between StandardScaler and Normalizer in sklearn.preprocessing.

This post also links some additional articles that you can explore.

Edit:

The documentation is confusing, but after reviewing the source code it appears the parameter actually acts on the columns rather than the rows, since axis=0 is supplied.

One way we can test this is to use the normalize function and compare how it performs relative to passing the parameter.

Here is the code that does the preprocessing. (f_normalize is the same function that is linked).

            if normalize:
                X, X_scale = f_normalize(X, axis=0, copy=False,
                                         return_norm=True)

I think you can try this and see if you get the same result as just using the normalize parameter.

from sklearn.preprocessing import normalize

X_std = normalize(X, axis=0, return_norm=False)
clf = RidgeCV(normalize=False, alphas=[1e-3, 1e-2, 1e-1, 1]).fit(X_std, y)
print("preprocessing.normalize and normalize=False")
print(clf.alpha_)
print(clf.score(X_std, y))
print(clf.coef_)

preprocessing.normalize and normalize=False
0.01
0.5166287840315835
[  -7.19945679 -234.55293001  520.58313622  320.52335582 -380.60706569
  150.48375154  -78.59123221  130.31305868  592.34958662   71.1337681 ]

This gets the same result as:

X, y = load_diabetes(return_X_y=True)

clf = RidgeCV(normalize=True, alphas=[1e-3, 1e-2, 1e-1, 1]).fit(X, y)
print("normalize=True on raw X")
print(clf.alpha_)
print(clf.score(X, y))
print(clf.coef_)

normalize=True on raw X
0.01
0.5166287840315835
[  -7.19945679 -234.55293001  520.58313622  320.52335582 -380.60706569
  150.48375154  -78.59123221  130.31305868  592.34958662   71.1337681 ]
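One caveat worth noting: preprocessing.normalize(X, axis=0) only rescales each column to unit l2 norm; it does not center the columns. In the normalize=True code path the centering happens separately (as part of intercept handling) before f_normalize is called, so on data that is not already centered the bare function is not a drop-in replacement. A small sketch on deliberately uncentered synthetic data:

```python
import numpy as np
from sklearn.preprocessing import normalize

rng = np.random.default_rng(0)
X_raw = rng.normal(loc=5.0, size=(50, 3))  # deliberately not centered

X_n = normalize(X_raw, axis=0)      # each column rescaled to unit l2 norm
print(np.linalg.norm(X_n, axis=0))  # -> [1. 1. 1.]
print(X_n.mean(axis=0))             # still far from 0: no centering happened
```

The diabetes data hides this difference because its columns are already mean-centered.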

jawsem
  • Thanks for the quick response. But it is confusing in the documentation that "If True, the regressors X will be normalized before regression by subtracting the mean and dividing by the l2-norm." The description sounds like standardize to me. Maybe this is an error in documentation? – Sarah Feb 13 '20 at 22:04
  • Hi Sarah, I agree the documentation is confusing. I reviewed some of the source code and added some additional detail that you can look into further. I think you should get the same result using the preprocessing.normalize function as the normalize parameter. My links should still be somewhat useful in explaining the difference. – jawsem Feb 13 '20 at 22:17
  • jawsem, thanks for looking into this. But I don't think ```from sklearn.preprocessing import normalize``` should be used. ```from sklearn.preprocessing import normalize``` is to make vector norm length 1. This is not what ridge regression needs. – Sarah Feb 14 '20 at 20:48

One has to interpret the normalize parameter as "standardize" in sklearn.linear_model Ridge or RidgeCV.

This is a problem with the sklearn documentation, and it can cause confusion. The parameter normalize should be renamed standardize.

Anyway, I have verified the following against SAS PROC REG using the Boston housing dataset, which is not standardized.

import pandas as pd
from sklearn.datasets import load_boston
from sklearn.linear_model import RidgeCV

dataset = load_boston()
X = dataset.data
y = dataset.target

clf = RidgeCV(normalize=True, alphas=[1e-3, 1e-2, 1e-1, 1]).fit(X, y)
print(clf.alpha_)
print(clf.score(X, y))
print(clf.coef_)
coef = pd.DataFrame(zip(dataset.feature_names, clf.coef_))  # matches SAS closely

0.01  (alpha)
0.7403788769476067 (R square)
0      CRIM  -0.103542
1        ZN   0.043406
2     INDUS   0.005200
3      CHAS   2.746307
4       NOX -16.625596
5        RM   3.865188
6       AGE  -0.000341
7       DIS  -1.413550
8       RAD   0.269159
9       TAX  -0.010577
10  PTRATIO  -0.934596
11        B   0.009288
12    LSTAT  -0.515911

There is a negligible difference in the coefficients (at the 6th decimal place), which could be due to rounding.
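For completeness, the behaviour of normalize=True should be reproducible by hand, going by the preprocessing code quoted in the other answer: center X, divide each centered column by its l2 norm, fit at the given alpha, then divide the coefficients by those norms to return them to the input units. This manual route is also what you need on newer sklearn releases, where the normalize parameter has been deprecated and removed. A sketch with plain Ridge (on the diabetes data this trivially matches the plain fit, since its columns are already centered with unit l2 norm; on uncentered data like Boston the steps matter):

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge

X, y = load_diabetes(return_X_y=True)

# emulate normalize=True at alpha=0.01:
X_centered = X - X.mean(axis=0)                   # center each column
norms = np.linalg.norm(X_centered, axis=0)        # column l2 norms
ridge = Ridge(alpha=0.01).fit(X_centered / norms, y)  # fit on unit-norm columns
coef_input_units = ridge.coef_ / norms            # rescale back to input units
print(coef_input_units)
```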

Sarah