I am using `np.linalg.lstsq` to compute a multiple linear regression. My data set is huge: it has 20,000 independent variables (X) and 1 dependent variable (Y), with 10,000 observations of each variable. Something like this:
                X1     X2     X3    ..   X20,000   Y
data1       ->  10     1.8    1     ..   1         3
data2       ->  20     2.3    200   ..   206       5
..              ..     ..     ..    ..   ..        ..
data10,000  ->  300    2398   878   ..   989       998
Computing the regression coefficients with `np.linalg.lstsq` takes a very long time (20-30 minutes). Can anybody suggest a faster approach?
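For reference, here is a minimal sketch of the kind of call I am timing. It assumes synthetic random data and much smaller dimensions than my real data set, and the names `X`, `Y`, `coef`, `n_samples` and `n_features` are placeholders rather than my actual code:

```python
import time

import numpy as np

# Scaled-down stand-in for the real data (the real set is 10,000 rows x 20,000 columns).
# Assumption: small sizes and random values so the sketch runs in well under a second.
n_samples, n_features = 200, 400
rng = np.random.default_rng(0)
X = rng.standard_normal((n_samples, n_features))  # one row per observation
Y = rng.standard_normal(n_samples)                # dependent variable

start = time.perf_counter()
# lstsq returns the coefficients, the residuals, the rank of X and its singular values.
coef, residuals, rank, singular_values = np.linalg.lstsq(X, Y, rcond=None)
elapsed = time.perf_counter() - start

print(coef.shape, rank, f"{elapsed:.3f} s")
```

As in my real data, there are more independent variables than observations here, so the system is underdetermined and `np.linalg.lstsq` returns the minimum-norm solution.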