4

I am trying to find the largest eigenvalue of an incredibly sparse adjacency matrix. I have tried using all the approaches I see available:

import numpy
import scipy.io
import scipy.sparse
import scipy.sparse.linalg
import networkx as nx

mat = scipy.io.mmread(f)
mat = scipy.sparse.csr_matrix(mat)
G = nx.to_networkx_graph(mat)
mat = None  # free the original matrix

# compute largest eigenvalue
L = nx.normalized_laplacian_matrix(G)

# impl 1
e = numpy.linalg.eigvals(L.A)
# impl 2
e, _ = scipy.sparse.linalg.eigs(L.A, k=1, which='LA')
# impl 3
e, _ = scipy.sparse.linalg.eigs(L.A)

All three of these implementations at some point encounter a similar memory error:

  e, _ = scipy.sparse.linalg.eigs(L.A)
  File "/usr/lib64/python3.7/site-packages/scipy/sparse/base.py", line 674, in __getattr__
    return self.toarray()
  File "/usr/lib64/python3.7/site-packages/scipy/sparse/compressed.py", line 947, in toarray
    out = self._process_toarray_args(order, out)
  File "/usr/lib64/python3.7/site-packages/scipy/sparse/base.py", line 1184, in _process_toarray_args
    return np.zeros(self.shape, dtype=self.dtype, order=order)
MemoryError
Uncaught exception. Entering post mortem debugging
Running 'cont' or 'step' will restart the program
> /usr/lib64/python3.7/site-packages/scipy/sparse/base.py(1184)_process_toarray_args()
-> return np.zeros(self.shape, dtype=self.dtype, order=order)

(Pdb) print(self.shape)
(14259278, 14259278)

The failure comes from trying to allocate a 1.6 PB NumPy array, presumably the dense representation of the matrix. Clearly I do not have the memory for this, although I do have quite a lot (128 GB). Is there some implementation or alternative that does not require generating the dense matrix? It does not have to be Python.

tnallen
  • 49
  • 2
  • Why do you keep using `.A`? – user2357112 Nov 28 '18 at 00:43
  • A [quick](https://math.stackexchange.com/q/4368/13983) look [around](https://scicomp.stackexchange.com/questions/24999/compute-all-eigenvalues-of-a-very-big-and-very-sparse-adjacency-matrix) suggests you [might](https://mathoverflow.net/questions/38283/computing-the-largest-eigenvalue-of-a-very-large-sparse-matrix) have a hard road ahead of you. Previous discussion suggests that Arnoldi iteration (in Scipy [here](https://docs.scipy.org/doc/scipy/reference/tutorial/arpack.html)) is the needed algorithm, though you may have to roll your own lazy loading routine. – tel Nov 28 '18 at 00:45
  • `L.A` produces a dense, hence very large, numpy array. `numpy.linalg.eigvals` most likely needs that, but do the others? – hpaulj Nov 28 '18 at 00:46

2 Answers

4

The only reason SciPy is trying to create a dense representation is because you specifically requested one:

L.A

Stop doing that. `scipy.sparse.linalg.eigs` takes a sparse matrix directly; you don't need the dense array that `.A` produces. Also, `'LA'` isn't one of the allowed values of `which` for `eigs` in the docs; you probably wanted `'LM'` (the default).
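A minimal sketch of the fix, using a small random sparse symmetric matrix as a stand-in for your 14M×14M normalized Laplacian (the example matrix is an assumption for illustration, not your data):

```python
import scipy.sparse
import scipy.sparse.linalg

# Small random sparse symmetric matrix standing in for the huge
# normalized Laplacian (an assumption for this sketch).
A = scipy.sparse.random(2000, 2000, density=0.002, format='csr', random_state=0)
L = A + A.T  # symmetrize

# Pass the sparse matrix itself -- no .A, no .toarray() -- so ARPACK
# only ever needs matrix-vector products and never densifies anything.
vals, vecs = scipy.sparse.linalg.eigs(L, k=1, which='LM')
print(vals.real)
```

Memory use is then proportional to the number of nonzeros plus a handful of Lanczos/Arnoldi vectors, which fits comfortably in 128 GB for a matrix this sparse.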

user2357112
  • 260,549
  • 28
  • 431
  • 505
2

Instead of using networkx, use `scipy.sparse.csgraph.laplacian(..., normed=True)`. As others have noted, `L.A` is what gives you a dense array.
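A sketch of that approach, with a small random sparse adjacency matrix standing in for the real graph (the example matrix is an assumption):

```python
import scipy.sparse
import scipy.sparse.linalg
from scipy.sparse.csgraph import laplacian

# Small random sparse adjacency matrix standing in for the real graph
# (an assumption for this sketch).
A = scipy.sparse.random(1000, 1000, density=0.005, format='csr', random_state=0)
A = A + A.T  # symmetric, like an undirected adjacency matrix

# Build the normalized Laplacian directly as a sparse matrix,
# skipping networkx and any dense intermediate entirely.
L = laplacian(A, normed=True)

# Largest-magnitude eigenvalue via ARPACK on the sparse matrix;
# eigsh is appropriate since the normalized Laplacian is symmetric.
vals, _ = scipy.sparse.linalg.eigsh(L, k=1, which='LM')
print(vals)
```

This also avoids building the networkx graph object, which for 14M nodes would itself be a substantial memory cost on top of the sparse matrix.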

keithpjolley
  • 2,089
  • 1
  • 17
  • 20