1

I have a huge sparse matrix A

<5000x5000 sparse matrix of type '<type 'numpy.float64'>'
    with 14979 stored elements in Compressed Sparse Column format>

from which I need to delete linearly dependent rows. I expect in advance that j rows will be dependent. I need to

  • find out which sets of rows are linearly dependent
  • for each set, keep one arbitrary row and remove the others

I was trying to follow this question, but the documentation for the corresponding sparse-matrix method, scipy.sparse.linalg.eigs, says:

k: The number of eigenvalues and eigenvectors desired. k must be smaller than N. It is not possible to compute all eigenvectors of a matrix.

How should I proceed?

FooBar
  • The right tool here is probably [QR decomposition](https://en.wikipedia.org/wiki/QR_decomposition). SciPy has it for dense matrices only. [Gram-Schmidt orthonormalization](https://en.wikipedia.org/wiki/Gram%E2%80%93Schmidt_process#Algorithm), however, should be relatively straightforward to program for sparse matrices, although making it fast will probably take more effort. Linearly dependent rows are indicated by the rows becoming zero (or close to zero) during the orthonormalization. You can detect this and record the index of each such row; these are the rows you want to delete. – pv. Mar 08 '15 at 21:51
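For illustration, the Gram-Schmidt idea from the comment above could be sketched roughly as follows. This is only a minimal sketch, not a tuned implementation: the function name `independent_row_indices` and the relative tolerance `tol` are hypothetical choices, the orthonormal basis is stored densely (so memory grows with rank times number of columns), and each row is projected twice purely as a numerical-stability precaution. It keeps the first row it meets from each dependent set.

```python
import numpy as np
import scipy.sparse as sp


def independent_row_indices(A, tol=1e-10):
    """Return indices of rows that are not linear combinations of earlier kept rows.

    Hypothetical helper: `tol` is an assumed relative threshold below which a
    residual counts as zero.
    """
    A = sp.csr_matrix(A, dtype=float)  # CSR makes row extraction cheap
    n_rows, n_cols = A.shape
    # Dense storage for the orthonormal basis built so far (one basis vector per row).
    Q = np.zeros((min(n_rows, n_cols), n_cols))
    kept = []
    for i in range(n_rows):
        k = len(kept)
        if k == Q.shape[0]:
            break  # basis already spans the whole space; remaining rows are dependent
        v = A.getrow(i).toarray().ravel()
        norm0 = np.linalg.norm(v)
        if norm0 == 0.0:
            continue  # an all-zero row is trivially dependent
        # Remove the components along the current basis; repeat once for stability.
        for _ in range(2):
            v -= Q[:k].T @ (Q[:k] @ v)
        residual = np.linalg.norm(v)
        if residual > tol * norm0:
            Q[k] = v / residual  # row adds a new direction: keep it
            kept.append(i)
        # otherwise the row collapsed to (nearly) zero and is dropped
    return kept


# Small demonstration: rows 2 and 3 are combinations of rows 0 and 1.
demo = sp.csc_matrix(np.array([[1.0, 0.0, 0.0],
                               [0.0, 1.0, 0.0],
                               [2.0, 0.0, 0.0],
                               [1.0, 1.0, 0.0],
                               [0.0, 0.0, 3.0]]))
kept_rows = independent_row_indices(demo)
reduced = demo.tocsr()[kept_rows]  # matrix with the dependent rows removed
```

Which rows survive depends on the row order, which matches the "keep one arbitrary row" requirement; a suitable `tol` depends on the scaling of the data.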

1 Answer

1

scipy.sparse.linalg.eigs uses implicitly restarted Arnoldi iteration. The algorithm is meant for finding a few eigenvectors quickly, and can't find all of them.

A 5000x5000 matrix, however, is not that large. Have you considered converting it to a dense array and just using numpy.linalg.eig or scipy.linalg.eig? It will probably take a few minutes, but it isn't completely infeasible. The dense routines gain nothing from the sparsity, but I'm not sure there's an algorithm for efficiently finding all eigenvectors of a sparse matrix.
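If going dense is acceptable, one possible alternative to a full eigendecomposition is the dense QR route mentioned in the comments on the question. Note that this swaps in a different technique from `eig`: a column-pivoted QR of the transpose, which picks out a maximal set of independent rows. The sketch below is an assumption-laden illustration rather than a definitive recipe; the helper name `independent_rows_dense` and the relative tolerance `tol` are hypothetical.

```python
import numpy as np
import scipy.linalg
import scipy.sparse as sp


def independent_rows_dense(A, tol=1e-10):
    """Return sorted indices of a maximal set of linearly independent rows.

    Hypothetical helper using dense column-pivoted QR; `tol` is an assumed
    relative cutoff on the diagonal of R.
    """
    dense = A.toarray() if sp.issparse(A) else np.asarray(A, dtype=float)
    # Columns of dense.T are the rows of A, so column pivoting ranks the rows of A.
    R, piv = scipy.linalg.qr(dense.T, mode="r", pivoting=True)
    diag = np.abs(np.diag(R))
    if diag.size == 0 or diag[0] == 0.0:
        return np.array([], dtype=int)  # empty or all-zero matrix
    # With pivoting, |diag(R)| is non-increasing; count entries above the cutoff.
    rank = int(np.sum(diag > tol * diag[0]))
    return np.sort(piv[:rank])


demo_dense = sp.csc_matrix(np.array([[1.0, 0.0, 0.0],
                                     [0.0, 1.0, 0.0],
                                     [2.0, 0.0, 0.0],
                                     [1.0, 1.0, 0.0],
                                     [0.0, 0.0, 3.0]]))
rows_to_keep = independent_rows_dense(demo_dense)
reduced_dense = demo_dense.tocsr()[rows_to_keep]  # dependent rows removed
```

Because of the pivoting, the rows kept from each dependent set are not necessarily the first ones encountered; they tend to be the ones with the largest remaining norm.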

cge
  • I suppose there is an alternative procedure that does not involve computing the eigenvalues? This is part of a program that has to be run around 50-100 times. Ah, and I'm starting (for some other reason) with a sparse matrix, so I didn't create it specifically for this purpose. – FooBar Mar 08 '15 at 17:00