
I am a Python newbie seriously searching for a Python implementation of LASSO without the use of Python libraries (e.g. sklearn).

I am particularly interested in this to help me understand how the underlying maths translates to Python code. To this end, I would prefer a bare Python implementation of LASSO, without libraries, using any sample dataset.

Thanks!!!

Ziggey
    Built-in data types are not very well suited to that task. So any implementation is likely going to use at least numpy because there is just no reason to avoid it. – LukasNeugebauer Jan 29 '19 at 11:25
  • @LukasNeugebauer, yes, numpy, pandas, etc. are okay; they are indeed necessary to deal with the data types, but not anything like sklearn, which already has the lasso function written. What I really would like to see is how the maths for lasso is translated into workable Python code without using already-written lasso libraries. – Ziggey Jan 30 '19 at 12:02
  • Then I don't get the question. LASSO is somehow written in python IN sklearn. You can look at the source code in sklearn to see how it's implemented. That will be a bit of work though because LASSO inherits from ElasticNet which inherits from LinearModel. – LukasNeugebauer Jan 30 '19 at 12:29
  • @LukasNeugebauer I would not recommend that: the implementation is in Cython and is difficult for a newcomer to understand (it handles the case where the Gram matrix is precomputed, etc.). – P. Camilleri Feb 22 '19 at 04:41

1 Answer


First, your question is ill-posed, because there exist many algorithms to solve the Lasso. The most popular right now is coordinate descent. Here is the skeleton of the algorithm (without a stopping criterion). I have used numba's @njit because for loops can be slow in Python.
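To make the link to the maths explicit (assuming the standard formulation, which is the one the update in the code implements), coordinate descent minimises

$$\frac{1}{2}\lVert y - X\beta\rVert_2^2 + \alpha\lVert\beta\rVert_1$$

by cycling over the coordinates. With all other coefficients held fixed, the optimal $\beta_j$ has a closed form:

$$\beta_j \leftarrow \operatorname{ST}\!\left(\beta_j + \frac{X_j^\top R}{\lVert X_j\rVert_2^2},\ \frac{\alpha}{\lVert X_j\rVert_2^2}\right), \qquad R = y - X\beta,$$

where $\operatorname{ST}(x, u) = \operatorname{sign}(x)\max(\lvert x\rvert - u, 0)$ is the soft-thresholding operator.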

import numpy as np

from numba import njit

@njit
def ST(x, u):
    "Soft thresholding of x at level u"""
    return np.sign(x) * np.maximum(np.abs(x) - u, 0.)



@njit
def cd_solver(X, y, alpha, max_iter):
    n_samples, n_features = X.shape


    beta = np.zeros(n_features)
    R = y.copy()  # residuals y - X @ beta
    lc = (X ** 2).sum(axis=0)  # per-coordinate Lipschitz constants, ||X_j||_2^2
    for t in range(max_iter):
        for j in range(n_features):
            old = beta[j]
            beta[j] = ST(old + X[:, j].dot(R) / lc[j], alpha / lc[j])

            # keep residuals up to date
            if old != beta[j]:
                R += (old - beta[j]) * X[:, j]


        # I'll leave it up to you to implement a proper stopping criterion

    return beta



X = np.random.randn(100, 200)
y = np.random.randn(100)

# Fortran (column-major) order makes the column slices X[:, j] contiguous
if not np.isfortran(X):
    X = np.asfortranarray(X)

alpha_max = np.max(np.abs(X.T.dot(y)))  # smallest alpha for which the Lasso solution is all zeros

cd_solver(X, y, alpha_max / 2., 100)
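As a quick sanity check (assuming the formulation above): for any alpha at or above alpha_max = max_j |X_j^T y| the Lasso solution is identically zero, so the solver should return all zeros at alpha_max and a sparse, non-trivial vector at alpha_max / 2.

beta_zero = cd_solver(X, y, alpha_max, 100)
print(np.count_nonzero(beta_zero))  # expected: 0

beta_half = cd_solver(X, y, alpha_max / 2., 100)
print(np.count_nonzero(beta_half))  # typically a sparse, non-empty support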

You can also try proximal gradient/ISTA, but in my experience it is way slower than CD.
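For reference, here is a minimal ISTA sketch for the same objective (plain numpy, no @njit; the step size 1 / L uses the squared spectral norm of X as the Lipschitz constant, and there is again no stopping criterion):

def ista_solver(X, y, alpha, max_iter):
    """Proximal gradient (ISTA) for 0.5 * ||y - X @ beta||^2 + alpha * ||beta||_1."""
    n_samples, n_features = X.shape
    beta = np.zeros(n_features)
    L = np.linalg.norm(X, ord=2) ** 2  # Lipschitz constant of the gradient
    for t in range(max_iter):
        grad = X.T.dot(X.dot(beta) - y)  # gradient of the smooth least-squares term
        beta = ST(beta - grad / L, alpha / L)  # proximal step = soft-thresholding
    return beta

ista_solver(X, y, alpha_max / 2., 100)

In line with the remark above, expect it to need noticeably more iterations than the coordinate descent solver to reach a comparable solution.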

P. Camilleri