
I have a regression problem at hand and I know that the error is caused by multicollinearity of the input variables, but I am having a hard time figuring out how to identify the redundant variables and remove them from the regression model to make it work.

Here is a simple example, but in my case it is more than just highly correlated variables (there can be an exact linear dependence, e.g. x3 = x1 + x2):

import numpy as np

# input array A
A = np.array([[ 1,  2,  3],
              [ 4,  5,  6],
              [ 8, 10, 12]])

# output array b
b = np.array([22., 7., 14.])

# check the rank of the input array and find that it is not full rank (2 instead of 3)
np.linalg.matrix_rank(A)

# will raise "LinAlgError: Singular matrix"
np.linalg.solve(A, b)

# will return a (minimum-norm) least-squares result without error
np.linalg.lstsq(A, b, rcond=None)

However, in this case I would like to first remove x3, since it is linearly dependent on the other columns, and then fit b using only x1 and x2.
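Done by hand for this toy example (continuing with the A and b defined above), that would look something like this:

# drop the third column by hand and fit the remaining two against b
A_reduced = A[:, [0, 1]]   # keep x1 and x2 only
coef, residuals, rank, sv = np.linalg.lstsq(A_reduced, b, rcond=None)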

Any idea on how to automatically detect these redundant columns and remove them, so that A has full column rank before it is fit into the model?
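For what it is worth, the closest I have got to doing this automatically is the brute-force rank check below (just a sketch, and I suspect it is not robust when columns are nearly collinear rather than exactly dependent), so I am hoping there is a more standard approach:

# keep a column only if it raises the rank of the columns kept so far;
# any column that gets skipped is a linear combination of the kept ones
keep = []
for j in range(A.shape[1]):
    if np.linalg.matrix_rank(A[:, keep + [j]]) == len(keep) + 1:
        keep.append(j)

# for the A above, keep ends up as [0, 1], i.e. x3 is flagged as redundant
A_full_rank = A[:, keep]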

Thanks!

Kexin Xu