I have a huge dataset as input for a multi-target Lasso fit. The predictor matrix has shape 1250 × 1,000,000 and the target matrix has shape 1250 × 1250.
If I fit an ordinary linear regression with sklearn, there is an option to use multiple threads, and the whole process finishes in a short time with an acceptable result:
sklearn.linear_model.LinearRegression(*, fit_intercept=True, normalize='deprecated', copy_X=True, n_jobs=None, positive=False)
In the call above, if I set n_jobs=-1, it uses all available cores, so the computational cost drops dramatically.
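For reference, here is a minimal sketch of the setup (shapes shrunk so it runs quickly; the array names and sizes are placeholders, not my real data):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.standard_normal((1250, 5_000))   # real problem: 1,000,000 features
Y = rng.standard_normal((1250, 100))     # real problem: 1250 targets

# n_jobs=-1 asks sklearn to use all available cores for the multi-target fit
reg = LinearRegression(n_jobs=-1).fit(X, Y)
print(reg.coef_.shape)  # (n_targets, n_features) -> (100, 5000)
```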
But there is no such option for Lasso regression in sklearn:
sklearn.linear_model.Lasso(alpha=1.0, *, fit_intercept=True, normalize='deprecated', precompute=False, copy_X=True, max_iter=1000, tol=0.0001, warm_start=False, positive=False, random_state=None, selection='cyclic')
Obviously, this fit is really computationally expensive to run on a single core. There are options in scikit-learn for running Lasso cross-validation on different CPUs (e.g. LassoCV has an n_jobs parameter), but my problem is that I'm not doing hyper-parameter optimization. The single fit itself is computationally expensive.
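For comparison, the Lasso version of the same fit, which as far as I can tell runs entirely on one core (again with placeholder shapes and an arbitrary alpha):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.standard_normal((1250, 5_000))   # real problem: 1,000,000 features
Y = rng.standard_normal((1250, 100))     # real problem: 1250 targets

# Lasso accepts a multi-target Y, but has no n_jobs parameter:
# the coordinate-descent solver runs single-threaded for the whole fit
lasso = Lasso(alpha=1.0).fit(X, Y)
print(lasso.coef_.shape)  # (n_targets, n_features) -> (100, 5000)
```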
Questions:
- Is there any way to do a distributed multi-target Lasso regression? (Not for hyper-parameter optimization.)
- If there is no way to parallelize Lasso regression, what is the root of this limitation? What is the difference between minimizing the loss function for ordinary regression and for Lasso regression? (See the two objectives written out after this list.)
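For concreteness, the two objectives as I understand them from the sklearn docs (W is the coefficient matrix; this is my reading, so corrections welcome):

```latex
% Ordinary least squares: smooth, with a closed-form / BLAS-friendly solution
\min_{W} \; \lVert Y - XW \rVert_F^2

% Lasso (sklearn's objective): the L1 penalty is non-smooth, so the problem
% is solved iteratively by coordinate descent rather than one linear solve
\min_{W} \; \frac{1}{2\,n_{\mathrm{samples}}} \lVert Y - XW \rVert_F^2
          + \alpha \lVert W \rVert_1
```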