I have a large NxN dense symmetric matrix and want the eigenvectors corresponding to the k largest eigenvalues. What's the best way to find them (preferably using numpy, but perhaps in general using blas/atlas/lapack if that's the only way to go)? In general N is much larger than k (say N > 5000, k < 10).

Numpy seems to only have functions for finding the k largest eigenvalues if my starting matrix is sparse.

Anthony Bak

2 Answers


In SciPy, you can use the `linalg.eigh` function with the `eigvals` parameter (renamed `subset_by_index` in newer SciPy releases):

eigvals : tuple (lo, hi) Indexes of the smallest and largest (in ascending order) eigenvalues and corresponding eigenvectors to be returned: 0 <= lo < hi <= M-1. If omitted, all eigenvalues and eigenvectors are returned.

In your case it should be set to `(N-k, N-1)`.
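A minimal sketch of that call, using `subset_by_index` (the current name of the `eigvals` parameter in recent SciPy releases); the matrix and sizes here are just illustrative:

```python
import numpy as np
from scipy.linalg import eigh

N, k = 1000, 5
rng = np.random.default_rng(0)
A = rng.standard_normal((N, N))
A = A + A.T  # symmetrize, since eigh assumes a symmetric matrix

# Request only the k largest eigenpairs by index (ascending order,
# so the largest eigenvalues occupy indices N-k .. N-1).
evals, evecs = eigh(A, subset_by_index=(N - k, N - 1))

print(evals.shape)  # (k,)
print(evecs.shape)  # (N, k)
```

`eigh` returns the selected eigenvalues in ascending order, with the eigenvectors in the corresponding columns of `evecs`.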

Andy Hayden
  • The non-sparse method was the fastest for me. Using Giuliano's benchmark script with k=2 I get: eigh elapsed time: 93.704689; eigsh elapsed time: 353.433379; eig elapsed time: 870.060089 (the last is `numpy.linalg.eig`). This is on my MacBook Pro. – Anthony Bak Aug 30 '12 at 17:06

Actually, the sparse routines also work for dense numpy arrays. I think they use some kind of Krylov-subspace iteration, so they need to compute several matrix-vector products, which means that if your k << N, the sparse routines could be (marginally?) faster.

Check out the docs http://docs.scipy.org/doc/scipy/reference/tutorial/arpack.html

and the following code (go have a good coffee with friends while it runs):

import numpy as np
from time import perf_counter
from scipy.linalg import eigh as largest_eigh
from scipy.sparse.linalg import eigsh as largest_eigsh

np.set_printoptions(suppress=True)
np.random.seed(0)
N = 5000
k = 10
X = np.random.random((N, N)) - 0.5
X = np.dot(X, X.T)  # create a symmetric matrix

# Benchmark the dense routine
# (in newer SciPy the `eigvals` keyword is named `subset_by_index`)
start = perf_counter()
evals_large, evecs_large = largest_eigh(X, subset_by_index=(N - k, N - 1))
elapsed = perf_counter() - start
print("eigh elapsed time:", elapsed)

# Benchmark the sparse routine
start = perf_counter()
evals_large_sparse, evecs_large_sparse = largest_eigsh(X, k, which='LM')
elapsed = perf_counter() - start
print("eigsh elapsed time:", elapsed)
Giuliano
  • With several examples of "small" `k` I've tried, I get 44 seconds vs 18 seconds (`eigsh` being the faster); when `k=2` they are approximately the same; when `k=1` (strangely) or when `k` is "large", `eigsh` is considerably slower; in all cases `eigh` takes around 44 seconds. There must be a more efficient algorithm for this; you would expect to be able to find the largest eigenvalue pair in orders of magnitude less time than finding all of them... – Andy Hayden Aug 29 '12 at 11:08
  • 1
    That is why I said 'could'... i don't really know! AFAIK most methods for determination of dominant eigenvalues are evolutions of the good old power iteration method, meaning that they need to do A*x many times, and with N=5000 and A full this may be not ideal. In any case OP was asking for what was available in numpy/scipy, and it turns out that he can choose between 2 methods. – Giuliano Aug 29 '12 at 11:46
  • And I think the answer is: sparse routine *is* faster in the OP's case! :) – Andy Hayden Aug 29 '12 at 12:01
  • With such a small margin it's hard to say; other considerations may play a role, e.g. the specific BLAS/LAPACK implementation available on the machine. In my case, with k=10, I get the same ratio between CPU times as you (on a 2008 MacBook Pro), but it is four times slower. If the OP has a blazing-fast BLAS/LAPACK, he may not even notice the difference... – Giuliano Aug 29 '12 at 12:24
  • On my laptop it turns out the non-sparse method was fastest. – Anthony Bak Aug 30 '12 at 17:11
  • BLAS / LAPACK can be twice as fast with float32 as float64, but "Single-precision types in (arpack) `eigs` and `eighs` are not supported currently." -- Mac OSX. – denis Sep 03 '12 at 09:19
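For context, the power-iteration idea referenced in the comments above can be sketched in a few lines (illustrative only; ARPACK's Lanczos iteration used by `eigsh` is considerably more refined):

```python
import numpy as np

def power_iteration(A, num_iters=500, seed=0):
    """Estimate the dominant eigenpair of a symmetric matrix A by
    repeated matrix-vector products -- the basic mechanism behind
    Krylov-subspace solvers such as ARPACK's eigsh."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(A.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(num_iters):
        w = A @ v          # one dense matrix-vector product per step
        v = w / np.linalg.norm(w)
    # Rayleigh quotient gives the eigenvalue estimate
    return v @ A @ v, v

# Example: the dominant eigenvalue of a small diagonal matrix
val, vec = power_iteration(np.diag([3.0, 1.0, 0.5]))
```

Each step costs one full matrix-vector product, which is why for a dense N=5000 matrix the dense `eigh` path can still win despite computing the whole spectrum.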