
We encounter a problem when using LASSO-related functions in sklearn. Since LASSO with BIC tuning only changes the alpha, the results of LASSO with BIC (1) should be equivalent to LASSO with the fixed optimal alpha (2).

  1. linear_model.LassoLarsIC
  2. linear_model.Lasso

First, we could consider the simple DGP setting:

################## DGP ##################
import numpy as np
from sklearn import linear_model

np.random.seed(10)
T = 200     # sample size
p = 100     # number of regressors
X = np.random.normal(size=(T, p))
u = np.random.normal(size=T)
beta = np.hstack((np.array([5, 0, 3, 0, 1, 0, 0, 0, 0, 0]), np.zeros(p-10)))
y = np.dot(X, beta) + u

Then we fit the LASSO with BIC tuning using linear_model.LassoLarsIC:

# LASSO with BIC
lasso = linear_model.LassoLarsIC(criterion='bic')
lasso.fit(X, y)
print("lasso coef = \n {}".format(lasso.coef_))
print("lasso optimal alpha = {}".format(lasso.alpha_))
lasso coef = 
 [ 4.81934044  0.          2.87574831  0.          0.90031582  0.
  0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.01705965  0.
  0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.
 -0.07789506  0.          0.05817856  0.          0.          0.
  0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.        ]
lasso optimal alpha = 0.010764484244859006
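
For reference, the selected alpha can be traced back to the criterion path that LassoLarsIC stores in its `alphas_` and `criterion_` attributes: the chosen `alpha_` is the grid point on the LARS path that minimizes the BIC. A minimal sketch, reusing the DGP above (exact numbers may vary with the scikit-learn version):

```python
import numpy as np
from sklearn import linear_model

np.random.seed(10)
T, p = 200, 100
X = np.random.normal(size=(T, p))
u = np.random.normal(size=T)
beta = np.hstack((np.array([5, 0, 3, 0, 1, 0, 0, 0, 0, 0]), np.zeros(p - 10)))
y = np.dot(X, beta) + u

lasso = linear_model.LassoLarsIC(criterion='bic')
lasso.fit(X, y)

# alpha_ is the point on the LARS path that minimizes the BIC
best = lasso.alphas_[np.argmin(lasso.criterion_)]
print("selected alpha:", lasso.alpha_)
print("argmin of BIC path:", best)
```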

Then we fit the LASSO at this fixed optimal alpha using linear_model.Lasso:

# LASSO with fixed alpha
clf = linear_model.Lasso(alpha=lasso.alpha_)
clf.fit(X, y)
print("lasso coef = \n {}".format(clf.coef_))
lasso coef = 
 [ 4.93513468e+00  5.42491624e-02  3.00412571e+00 -3.83394653e-02
  9.87262697e-01  5.21693412e-03 -2.89977454e-02 -1.40952930e-01
  5.18653123e-02 -7.66271662e-02 -1.99074552e-02  2.72228580e-02
 -1.01217167e-01 -4.69445223e-02  1.74378470e-01  2.52655725e-02
  1.84902632e-02 -7.11030674e-02 -4.15940817e-03  1.98229236e-02
 -8.81779536e-02 -3.59094431e-02  5.53212537e-03  9.23031418e-02
  1.21577471e-01 -4.73932893e-03  5.15459727e-02  4.17136419e-02
  4.49561794e-02 -4.74874460e-03  0.00000000e+00 -3.56968194e-02
 -4.43094631e-02  0.00000000e+00  1.00390051e-03  7.17980301e-02
 -7.39058574e-02  1.73139031e-02  7.88996602e-02  1.04325618e-01
 -4.10356303e-02  5.94564069e-02  0.00000000e+00  9.28354383e-02
  0.00000000e+00  4.57453873e-02  0.00000000e+00  0.00000000e+00
 -1.94113178e-02  1.97056365e-02 -1.17381604e-01  5.13943798e-02
  2.11245596e-01  4.24124220e-02  1.16573094e-01  1.19551223e-02
 -0.00000000e+00 -0.00000000e+00 -8.35210244e-02 -8.29230887e-02
 -3.16409003e-02  8.43274240e-02 -2.90949577e-02 -0.00000000e+00
  1.24697858e-01 -3.07120380e-02 -4.34558350e-02 -0.00000000e+00
  1.30491858e-01 -2.04573808e-02  6.72141775e-02 -6.85563204e-02
  5.64781612e-02 -7.43380132e-02  1.88610065e-01 -5.53155313e-04
  0.00000000e+00  2.43191722e-02  9.10973250e-02 -4.49945551e-02
  3.36006276e-02 -0.00000000e+00 -3.85862475e-02 -9.63711465e-02
 -2.07015665e-01  8.67164869e-02  1.30776709e-01 -0.00000000e+00
  5.42630086e-02 -1.44763258e-01 -0.00000000e+00 -3.29485283e-02
 -2.35245212e-02 -6.19975427e-02 -8.83892134e-03 -1.60523703e-01
  9.63008989e-02 -1.06953313e-01  4.60206741e-02  6.02880434e-02]

The two sets of coefficients are different.

Why does this happen?

Michael cy
  • Please make your code fully reproducible by 1) including all relevant imports 2) specifying a random seed for your random number generation and 3) by including your *results*. Check how to create a [mre]. Plus, please explain why exactly you expect that the results from these two *different* models should be the same. – desertnaut Oct 05 '22 at 22:17
  • No, you are still missing the imports and the random seed. – desertnaut Oct 06 '22 at 01:43
  • np.random.seed(10), isn't that the random seed? What do you mean by imports? – Michael cy Oct 06 '22 at 01:58

1 Answer


The main difference I could find off the bat is the max_iter parameter, which defaults to 1000 for the Lasso model and 500 for the LassoLarsIC model.

Other hyperparameters, such as tol and selection, are not adjustable in the LassoLarsIC implementation.

There may be more nuanced differences in the exact implementations of the two models, though: Lasso solves the problem by coordinate descent, while LassoLarsIC computes the full LARS path.
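
One quick check of the iteration-cap hypothesis is to refit Lasso at the same alpha with max_iter and tol pushed well past their defaults, and see whether the coefficients move toward the LassoLarsIC solution. A sketch using the question's DGP; note that in scikit-learn versions before 1.2, LassoLarsIC normalized X by default (Lasso did not), which by itself can explain a large gap, so the size of the remaining difference depends on your version:

```python
import numpy as np
from sklearn import linear_model

np.random.seed(10)
T, p = 200, 100
X = np.random.normal(size=(T, p))
u = np.random.normal(size=T)
beta = np.hstack((np.array([5, 0, 3, 0, 1, 0, 0, 0, 0, 0]), np.zeros(p - 10)))
y = np.dot(X, beta) + u

# BIC-tuned LARS fit
lars_bic = linear_model.LassoLarsIC(criterion='bic').fit(X, y)

# Coordinate-descent Lasso at the same alpha, with the iteration
# cap and tolerance pushed well past their defaults
cd = linear_model.Lasso(alpha=lars_bic.alpha_, max_iter=100_000,
                        tol=1e-12).fit(X, y)

print("max |coef difference| =", np.max(np.abs(cd.coef_ - lars_bic.coef_)))
```

If the gap does not shrink, the discrepancy is not an optimization artifact but a difference in the problem being solved (e.g. normalization of X in older versions).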
