We ran into a confusing result with two LASSO-related estimators in scikit-learn. Since BIC tuning only selects the penalty parameter alpha, LASSO with BIC (1) should be equivalent to LASSO with that same fixed optimal alpha (2):
- linear_model.LassoLarsIC
- linear_model.Lasso
First, consider a simple DGP:
################## DGP ##################
import numpy as np
from sklearn import linear_model

np.random.seed(10)
T = 200  # sample size
p = 100  # number of regressors
X = np.random.normal(size=(T, p))
u = np.random.normal(size=T)
beta = np.hstack((np.array([5, 0, 3, 0, 1, 0, 0, 0, 0, 0]), np.zeros(p - 10)))
y = np.dot(X, beta) + u
Then we fit the LASSO with BIC using linear_model.LassoLarsIC:
# LASSO with BIC
lasso = linear_model.LassoLarsIC(criterion='bic')
lasso.fit(X, y)
print("lasso coef = \n {}".format(lasso.coef_))
print("lasso optimal alpha = {}".format(lasso.alpha_))
lasso coef =
[ 4.81934044 0. 2.87574831 0. 0.90031582 0.
0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0.01705965 0.
0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0.
-0.07789506 0. 0.05817856 0. 0. 0.
0. 0. 0. 0. 0. 0.
0. 0. 0. 0. ]
lasso optimal alpha = 0.010764484244859006
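For context, the BIC selection above can be reproduced approximately by scoring each knot of the LARS path with a BIC-style criterion. This is a sketch of what LassoLarsIC does internally, not its exact implementation; the degrees-of-freedom and noise-variance conventions in scikit-learn may differ from the simple count of nonzeros used here. The data are generated in the same draw order as the DGP above.

```python
import numpy as np
from sklearn import linear_model

np.random.seed(10)
T, p = 200, 100
X = np.random.normal(size=(T, p))
u = np.random.normal(size=T)
beta = np.hstack((np.array([5, 0, 3, 0, 1, 0, 0, 0, 0, 0]), np.zeros(p - 10)))
y = np.dot(X, beta) + u

# Compute the whole LASSO path via LARS, then score each candidate alpha
# with a BIC-style criterion: T * log(RSS / T) + log(T) * df,
# approximating df by the number of nonzero coefficients (an assumption).
alphas, _, coefs = linear_model.lars_path(X, y, method='lasso')
rss = ((y[:, None] - np.dot(X, coefs)) ** 2).sum(axis=0)
df = (coefs != 0).sum(axis=0)
bic = T * np.log(rss / T) + np.log(T) * df
best_alpha = alphas[np.argmin(bic)]
print("BIC-selected alpha (sketch):", best_alpha)
```

The selected alpha from this sketch should land close to `lasso.alpha_` above, since both walk the same LARS path.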
Then we pass that optimal alpha to linear_model.Lasso:
# LASSO with fixed alpha
clf = linear_model.Lasso(alpha=lasso.alpha_)
clf.fit(X, y)
print("lasso coef = \n {}".format(clf.coef_))
lasso coef =
[ 4.93513468e+00 5.42491624e-02 3.00412571e+00 -3.83394653e-02
9.87262697e-01 5.21693412e-03 -2.89977454e-02 -1.40952930e-01
5.18653123e-02 -7.66271662e-02 -1.99074552e-02 2.72228580e-02
-1.01217167e-01 -4.69445223e-02 1.74378470e-01 2.52655725e-02
1.84902632e-02 -7.11030674e-02 -4.15940817e-03 1.98229236e-02
-8.81779536e-02 -3.59094431e-02 5.53212537e-03 9.23031418e-02
1.21577471e-01 -4.73932893e-03 5.15459727e-02 4.17136419e-02
4.49561794e-02 -4.74874460e-03 0.00000000e+00 -3.56968194e-02
-4.43094631e-02 0.00000000e+00 1.00390051e-03 7.17980301e-02
-7.39058574e-02 1.73139031e-02 7.88996602e-02 1.04325618e-01
-4.10356303e-02 5.94564069e-02 0.00000000e+00 9.28354383e-02
0.00000000e+00 4.57453873e-02 0.00000000e+00 0.00000000e+00
-1.94113178e-02 1.97056365e-02 -1.17381604e-01 5.13943798e-02
2.11245596e-01 4.24124220e-02 1.16573094e-01 1.19551223e-02
-0.00000000e+00 -0.00000000e+00 -8.35210244e-02 -8.29230887e-02
-3.16409003e-02 8.43274240e-02 -2.90949577e-02 -0.00000000e+00
1.24697858e-01 -3.07120380e-02 -4.34558350e-02 -0.00000000e+00
1.30491858e-01 -2.04573808e-02 6.72141775e-02 -6.85563204e-02
5.64781612e-02 -7.43380132e-02 1.88610065e-01 -5.53155313e-04
0.00000000e+00 2.43191722e-02 9.10973250e-02 -4.49945551e-02
3.36006276e-02 -0.00000000e+00 -3.85862475e-02 -9.63711465e-02
-2.07015665e-01 8.67164869e-02 1.30776709e-01 -0.00000000e+00
5.42630086e-02 -1.44763258e-01 -0.00000000e+00 -3.29485283e-02
-2.35245212e-02 -6.19975427e-02 -8.83892134e-03 -1.60523703e-01
9.63008989e-02 -1.06953313e-01 4.60206741e-02 6.02880434e-02]
The two coefficient vectors are clearly different: Lasso with the same alpha keeps many small nonzero coefficients that LassoLarsIC sets to zero.
Why does this happen?
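One way to localize the discrepancy: both estimators are documented to minimize (1/(2T))·||y − Xw||² + alpha·||w||₁, so we can evaluate that objective at both solutions with the same alpha. If the LassoLarsIC solution scores materially worse, the two estimators are not actually solving the same problem at the reported alpha_ (e.g., different internal preprocessing or alpha scaling; this is an assumption to check, not a confirmed cause). The data are generated in the same draw order as the DGP above.

```python
import numpy as np
from sklearn import linear_model

np.random.seed(10)
T, p = 200, 100
X = np.random.normal(size=(T, p))
u = np.random.normal(size=T)
beta = np.hstack((np.array([5, 0, 3, 0, 1, 0, 0, 0, 0, 0]), np.zeros(p - 10)))
y = np.dot(X, beta) + u

ic = linear_model.LassoLarsIC(criterion='bic').fit(X, y)
cd = linear_model.Lasso(alpha=ic.alpha_).fit(X, y)

def objective(w, b0, alpha):
    # The penalized objective both estimators claim to minimize.
    resid = y - np.dot(X, w) - b0
    return np.dot(resid, resid) / (2 * T) + alpha * np.abs(w).sum()

print("LassoLarsIC objective:", objective(ic.coef_, ic.intercept_, ic.alpha_))
print("Lasso objective:      ", objective(cd.coef_, cd.intercept_, ic.alpha_))
```

Since coordinate-descent Lasso directly minimizes this objective at the given alpha, its value can only be lower or equal; a large gap would point at a scaling or preprocessing mismatch rather than a numerical issue.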