I have a regression problem at hand and know that the error is caused by multicollinearity of the input variables, but I am having a hard time figuring out how to identify the redundant variables and remove them from the regression model so that it works.
Here is a simple example, but in my case the dependence goes beyond highly correlated variables (it can be exact, e.g. x3 = x1 + x2):
import numpy as np

# input array A
A = np.array([[1.,  2.,  3.],
              [4.,  5.,  6.],
              [8., 10., 12.]])
# output array b
b = np.array([22., 7., 14.])

# check the rank of the input array: it is not full rank (returns 2)
np.linalg.matrix_rank(A)

# raises "LinAlgError: Singular matrix" because A is singular
np.linalg.solve(A, b)

# returns a least-squares result without error
np.linalg.lstsq(A, b, rcond=None)
However, in this case I would like to first remove x3, since it is linearly dependent on x1 and x2, and then fit b using only x1 and x2.
Any idea on how to detect the linearly dependent columns of A and drop them, so that the reduced matrix has full column rank before it goes into the model?
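For reference, here is a rough sketch of what I'm imagining (a greedy column-selection loop; I'm not sure this is the standard approach, and using np.linalg.matrix_rank as the dependence test is just my assumption):

```python
import numpy as np

A = np.array([[1.,  2.,  3.],
              [4.,  5.,  6.],
              [8., 10., 12.]])
b = np.array([22., 7., 14.])

# Greedily keep each column only if it increases the rank of the
# columns kept so far, i.e. it is not a linear combination of them.
keep = []
for j in range(A.shape[1]):
    candidate = A[:, keep + [j]]
    if np.linalg.matrix_rank(candidate) > len(keep):
        keep.append(j)

# keep is [0, 1] here: the third column is dependent on the first two
A_reduced = A[:, keep]

# fit on the reduced, full-column-rank matrix
coef, residuals, rank, sv = np.linalg.lstsq(A_reduced, b, rcond=None)
```

This drops x3 and fits with x1 and x2 only, but I don't know whether this is robust for near-dependent (rather than exactly dependent) columns, which is why I'm asking.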
Thanks!