8

I'm struggling with the following problem: I have some very big matrices (say, at least, 2000x2000, and probably in the future they will even reach 10000x10000) with very small rank (2 or 3, call it N) and I need to find an efficient Python routine to extract the linear independent rows (or columns, the matrix is symmetric!) From them. I tried to take the first N columns of the Q matrix of QR decomposition but it seems not to work correctly (is this wrong maybe?).

Here's the Python code I use to implement the method suggested by Ami Tavory:

from numpy import absolute
from numpy.linalg import qr

q = qr(R)[1] #R is my matrix
q = absolute(q)
sums = sum(q,axis=1)

i = 0
while( i < dim ): #dim is the matrix dimension
    if(sums[i] > 1.e-10):
       print "%d is a good index!" % i
    i += 1

This should tell me if the row is non-zero and therefore if the I-th column of R is linearly independent.

double-beep
  • 5,031
  • 17
  • 33
  • 41
Simone Bolognini
  • 505
  • 5
  • 14

3 Answers3

10

The Gram Schmidt process finds a basis (equivalently largest independent subset) using linear combinations, and the QR Decomposition effectively mimics this.

Therefore, one way to do what you want is to apply numpy.linalg.qr to the transpose, and check the non-zero components of the R matrix. The corresponding columns (in the transpose matrix, i.e., the rows in your original matrix) are independent.


Edit After some searching, I believe this Berkeley lecture explains it, but here are examples

import numpy as np

# 2nd column is redundant
a = np.array([[1, 0, 0], [0, 0, 0], [1, 0, 1]])
>> np.linalg.qr(a)[1] # 2nd row empty
array([[ 1.41421356,  0.        ,  0.70710678],
   [ 0.        ,  0.        ,  0.        ],
   [ 0.        ,  0.        ,  0.70710678]])

# 3rd column is redundant
a = np.array([[1, 0, 0], [1, 0, 1], [0, 0, 0], ])
>> np.linalg.qr(a)[1] # 3rd row empty
array([[ 1.41421356,  0.        ,  0.70710678],
   [ 0.        ,  0.        , -0.70710678],
   [ 0.        ,  0.        ,  0.        ]])

# No column redundant
a = np.array([[1, 0, 0], [1, 0, 1], [2, 3, 4], ])
>> np.linalg.qr(a)[1] # No row empty
array([[ 2.44948974,  2.44948974,  3.67423461],
   [ 0.        ,  1.73205081,  1.73205081],
   [ 0.        ,  0.        ,  0.70710678]])
Ami Tavory
  • 74,578
  • 11
  • 141
  • 185
  • Thank you! By "the corresponding components" you mean the columns/rows of the starting matrix or the ones of the R matrix? Sorry I never used this method before... – Simone Bolognini Jun 27 '15 at 20:48
  • 1
    Hold on, I'll try to find a link for you, if you're unfamiliar with this method. – Ami Tavory Jun 27 '15 at 20:49
  • 1
    Very kind! Because it's still not really clear to me what I should look for in R and how to transpose the result in the starting matrix. – Simone Bolognini Jun 27 '15 at 20:51
  • 1
    @SimoneBolognini See if the examples help. – Ami Tavory Jun 27 '15 at 21:10
  • If I get correctly, your method suggests to look for the non null rows of the R matrix, whose index should correspond to the indices of the linearly independent columns of my starting matrix. I tried it, and it seems not to work: I know that my matrix has rank = 2 but still I find 6 non null rows, and just one of them corresponds to one of the two real independent vectors. The other one (which I know is found in the bottom of the matrix) is never considered. Any clue? – Simone Bolognini Jun 28 '15 at 07:02
  • @SimoneBolognini Can you post some minimal example showing the problem? You don't necessarily have to do it here if it's too big; some pastebin or gist will do as well. – Ami Tavory Jun 28 '15 at 07:04
  • My matrix is 1656x1656... if you want I can load the .txt file on dropbox and post the link here for the download and edit the post with the python snippet with the code I use.. is it ok? – Simone Bolognini Jun 28 '15 at 07:13
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/81765/discussion-between-ami-tavory-and-simone-bolognini). – Ami Tavory Jun 28 '15 at 07:14
  • Why are your examples of redundant columns just zeroes? If you change them to non-zero redundant values the method appears to stop working. I tested with `[[1,0,0],[1,0,0],[1,1,1]]`. When you look at the R matrix there are no columns or rows near zero. Also it appears the link to Berkeley is broken. – AnnanFay Jun 06 '17 at 23:30
  • Fyi, the link to the lecture is dead. – Baum mit Augen May 24 '18 at 12:16
  • Hi, In your second example `a = np.array([[1, 0, 0], [1, 0, 1], [0, 0, 0], ])`, the 2nd column is redundant, not the 3rd one! This is because `np.array` interprets vectors as *columns* but not rows. If you execute `print(a)` for all your three examples, you can see the vectors you gave to it vertically. Plus, if you want to discover independent columns using QR, you should look at *pivot columns* in R; zero rows show free rows. – Shahrokh Bah Jan 11 '21 at 16:53
  • More info: [math.stackexchange](https://math.stackexchange.com/questions/748500/how-to-find-linearly-independent-columns-in-a-matrix/748538#748538) and [Lecture 17: Orthogonal matrices and Gram-Schmidt](https://ocw.mit.edu/courses/mathematics/18-06-linear-algebra-spring-2010/video-lectures/lecture-17-orthogonal-matrices-and-gram-schmidt) – Shahrokh Bah Jan 11 '21 at 16:53
0

Here's the link where I found a solution to my problem! The Cauchy-Schwartz method proposed here works just fine!!

Community
  • 1
  • 1
Simone Bolognini
  • 505
  • 5
  • 14
0

Here is a minimal example demonstrating Ami's suggestion of using the QR decomp and taking non-null rows of R doesn't work as described:

# 2 dependent rows
>> A = array([[ 1.,  1.,  0.,  0.],
              [-1., -1.,  0.,  0.]])
# looks like it works for this case
>> numpy.linalg.qr(A.T)[1]
array([[-1.41421356,  1.41421356], 
       [ 0.        ,  0.        ]])

# counter example
>> A2 = numpy.array([[1,1,0,0.],[-1,-1,0,0.],[1,2,3,4]])
>> numpy.linalg.qr(A2.T)[1] 
array([[-1.41421356,  1.41421356, -2.12132034],
       [ 0.        ,  0.        ,  0.70710678], # 2nd row is not null
       [ 0.        ,  0.        , -5.        ]]) 

The second pivot is missing, it's possible there is some structure here that could be used but not the rows.