We ran into a confusing result with two LASSO-related estimators in scikit-learn. Since BIC tuning only selects the penalty parameter alpha, LASSO with BIC (1) should be equivalent to LASSO with that same fixed optimal alpha (2):
- linear_model.LassoLarsIC
- linear_model.Lasso
First, consider a simple DGP:
################## DGP ##################
import numpy as np
from sklearn import linear_model

np.random.seed(10)
T = 200  # sample size
p = 100  # number of regressors
X = np.random.normal(size=(T, p))
u = np.random.normal(size=T)
beta = np.hstack((np.array([5, 0, 3, 0, 1, 0, 0, 0, 0, 0]), np.zeros(p - 10)))
y = np.dot(X, beta) + u
Then we fit the LASSO with BIC using linear_model.LassoLarsIC:
# LASSO with BIC
lasso = linear_model.LassoLarsIC(criterion='bic')
lasso.fit(X, y)
print("lasso coef = \n {}".format(lasso.coef_))
print("lasso optimal alpha = {}".format(lasso.alpha_))
lasso coef =
[ 4.81934044 0. 2.87574831 0. 0.90031582 0.
0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0.01705965 0.
0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0.
-0.07789506 0. 0.05817856 0. 0. 0.
0. 0. 0. 0. 0. 0.
0. 0. 0. 0. ]
lasso optimal alpha = 0.010764484244859006
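For context, the BIC selection above can be reproduced approximately by scoring each knot of the LARS path with a BIC-style criterion. This is a sketch of what LassoLarsIC does internally, not its exact implementation; the degrees-of-freedom and noise-variance conventions in scikit-learn may differ from the simple count of nonzeros used here. The data are generated in the same draw order as the DGP above.

```python
import numpy as np
from sklearn import linear_model

np.random.seed(10)
T, p = 200, 100
X = np.random.normal(size=(T, p))
u = np.random.normal(size=T)
beta = np.hstack((np.array([5, 0, 3, 0, 1, 0, 0, 0, 0, 0]), np.zeros(p - 10)))
y = np.dot(X, beta) + u

# Compute the whole LASSO path via LARS, then score each candidate alpha
# with a BIC-style criterion: T * log(RSS / T) + log(T) * df,
# approximating df by the number of nonzero coefficients (an assumption).
alphas, _, coefs = linear_model.lars_path(X, y, method='lasso')
rss = ((y[:, None] - np.dot(X, coefs)) ** 2).sum(axis=0)
df = (coefs != 0).sum(axis=0)
bic = T * np.log(rss / T) + np.log(T) * df
best_alpha = alphas[np.argmin(bic)]
print("BIC-selected alpha (sketch):", best_alpha)
```

The selected alpha from this sketch should land close to `lasso.alpha_` above, since both walk the same LARS path.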
Then we pass that optimal alpha to linear_model.Lasso:
# LASSO with fixed alpha
clf = linear_model.Lasso(alpha=lasso.alpha_)
clf.fit(X, y)
print("lasso coef = \n {}".format(clf.coef_))
lasso coef =
[ 4.93513468e+00 5.42491624e-02 3.00412571e+00 -3.83394653e-02
9.87262697e-01 5.21693412e-03 -2.89977454e-02 -1.40952930e-01
5.18653123e-02 -7.66271662e-02 -1.99074552e-02 2.72228580e-02
-1.01217167e-01 -4.69445223e-02 1.74378470e-01 2.52655725e-02
1.84902632e-02 -7.11030674e-02 -4.15940817e-03 1.98229236e-02
-8.81779536e-02 -3.59094431e-02 5.53212537e-03 9.23031418e-02
1.21577471e-01 -4.73932893e-03 5.15459727e-02 4.17136419e-02
4.49561794e-02 -4.74874460e-03 0.00000000e+00 -3.56968194e-02
-4.43094631e-02 0.00000000e+00 1.00390051e-03 7.17980301e-02
-7.39058574e-02 1.73139031e-02 7.88996602e-02 1.04325618e-01
-4.10356303e-02 5.94564069e-02 0.00000000e+00 9.28354383e-02
0.00000000e+00 4.57453873e-02 0.00000000e+00 0.00000000e+00
-1.94113178e-02 1.97056365e-02 -1.17381604e-01 5.13943798e-02
2.11245596e-01 4.24124220e-02 1.16573094e-01 1.19551223e-02
-0.00000000e+00 -0.00000000e+00 -8.35210244e-02 -8.29230887e-02
-3.16409003e-02 8.43274240e-02 -2.90949577e-02 -0.00000000e+00
1.24697858e-01 -3.07120380e-02 -4.34558350e-02 -0.00000000e+00
1.30491858e-01 -2.04573808e-02 6.72141775e-02 -6.85563204e-02
5.64781612e-02 -7.43380132e-02 1.88610065e-01 -5.53155313e-04
0.00000000e+00 2.43191722e-02 9.10973250e-02 -4.49945551e-02
3.36006276e-02 -0.00000000e+00 -3.85862475e-02 -9.63711465e-02
-2.07015665e-01 8.67164869e-02 1.30776709e-01 -0.00000000e+00
5.42630086e-02 -1.44763258e-01 -0.00000000e+00 -3.29485283e-02
-2.35245212e-02 -6.19975427e-02 -8.83892134e-03 -1.60523703e-01
9.63008989e-02 -1.06953313e-01 4.60206741e-02 6.02880434e-02]
The two coefficient vectors are clearly different: Lasso with the same alpha keeps many small nonzero coefficients that LassoLarsIC sets to zero.
Why does this happen?
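One way to localize the discrepancy: both estimators are documented to minimize (1/(2T))·||y − Xw||² + alpha·||w||₁, so we can evaluate that objective at both solutions with the same alpha. If the LassoLarsIC solution scores materially worse, the two estimators are not actually solving the same problem at the reported alpha_ (e.g., different internal preprocessing or alpha scaling; this is an assumption to check, not a confirmed cause). The data are generated in the same draw order as the DGP above.

```python
import numpy as np
from sklearn import linear_model

np.random.seed(10)
T, p = 200, 100
X = np.random.normal(size=(T, p))
u = np.random.normal(size=T)
beta = np.hstack((np.array([5, 0, 3, 0, 1, 0, 0, 0, 0, 0]), np.zeros(p - 10)))
y = np.dot(X, beta) + u

ic = linear_model.LassoLarsIC(criterion='bic').fit(X, y)
cd = linear_model.Lasso(alpha=ic.alpha_).fit(X, y)

def objective(w, b0, alpha):
    # The penalized objective both estimators claim to minimize.
    resid = y - np.dot(X, w) - b0
    return np.dot(resid, resid) / (2 * T) + alpha * np.abs(w).sum()

print("LassoLarsIC objective:", objective(ic.coef_, ic.intercept_, ic.alpha_))
print("Lasso objective:      ", objective(cd.coef_, cd.intercept_, ic.alpha_))
```

Since coordinate-descent Lasso directly minimizes this objective at the given alpha, its value can only be lower or equal; a large gap would point at a scaling or preprocessing mismatch rather than a numerical issue.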