4

I am trying to find the largest eigenvalue of an incredibly sparse adjacency matrix. I have tried using all the approaches I see available:

import numpy
import scipy.io
import scipy.sparse
import scipy.sparse.linalg
import networkx as nx

mat = scipy.io.mmread(f)
mat = scipy.sparse.csr_matrix(mat)
G = nx.to_networkx_graph(mat)
mat = None  # free the original matrix

# compute largest eigenvalue
L = nx.normalized_laplacian_matrix(G)

# impl 1
e = numpy.linalg.eigvals(L.A)
# impl 2
e, _ = scipy.sparse.linalg.eigs(L.A, k=1, which='LA')
# impl 3
e, _ = scipy.sparse.linalg.eigs(L.A)

All three of these implementations at some point encounter a similar memory error:

  e, _ = scipy.sparse.linalg.eigs(L.A)
  File "/usr/lib64/python3.7/site-packages/scipy/sparse/base.py", line 674, in __getattr__
    return self.toarray()
  File "/usr/lib64/python3.7/site-packages/scipy/sparse/compressed.py", line 947, in toarray
    out = self._process_toarray_args(order, out)
  File "/usr/lib64/python3.7/site-packages/scipy/sparse/base.py", line 1184, in _process_toarray_args
    return np.zeros(self.shape, dtype=self.dtype, order=order)
MemoryError
Uncaught exception. Entering post mortem debugging
Running 'cont' or 'step' will restart the program
> /usr/lib64/python3.7/site-packages/scipy/sparse/base.py(1184)_process_toarray_args()
-> return np.zeros(self.shape, dtype=self.dtype, order=order)

(Pdb) print(self.shape)
(14259278, 14259278)

The failure comes from trying to allocate a 1.6 PB NumPy array, presumably the dense representation of the matrix. Clearly I do not have the memory for this, although I do have quite a lot (128 GB). Is there some implementation or alternative that does not require generating the dense matrix? It does not have to be Python.

tnallen
  • 49
  • 2
  • Why do you keep using `.A`? – user2357112 Nov 28 '18 at 00:43
  • A [quick](https://math.stackexchange.com/q/4368/13983) look [around](https://scicomp.stackexchange.com/questions/24999/compute-all-eigenvalues-of-a-very-big-and-very-sparse-adjacency-matrix) suggests you [might](https://mathoverflow.net/questions/38283/computing-the-largest-eigenvalue-of-a-very-large-sparse-matrix) have a hard road ahead of you. Previous discussion suggests that Arnoldi iteration (in Scipy [here](https://docs.scipy.org/doc/scipy/reference/tutorial/arpack.html)) is the needed algorithm, though you may have to roll your own lazy loading routine. – tel Nov 28 '18 at 00:45
  • `L.A` produces a dense, hence very large, numpy array. `numpy.linalg.eigvals` most likely needs that, but do the others? – hpaulj Nov 28 '18 at 00:46

2 Answers

4

The only reason SciPy is trying to create a dense representation is because you specifically requested one:

L.A

Stop doing that. `scipy.sparse.linalg.eigs` takes a sparse matrix directly; you don't need the dense array that `.A` produces. Also, `'LA'` isn't one of the allowed values of `which` for `eigs` in the docs; you probably wanted `'LM'` (the default).
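A minimal sketch of the fix, using a small random sparse symmetric matrix as a stand-in for your 14M×14M normalized Laplacian (the example matrix is an assumption for illustration, not your data):

```python
import scipy.sparse
import scipy.sparse.linalg

# Small random sparse symmetric matrix standing in for the huge
# normalized Laplacian (an assumption for this sketch).
A = scipy.sparse.random(2000, 2000, density=0.002, format='csr', random_state=0)
L = A + A.T  # symmetrize

# Pass the sparse matrix itself -- no .A, no .toarray() -- so ARPACK
# only ever needs matrix-vector products and never densifies anything.
vals, vecs = scipy.sparse.linalg.eigs(L, k=1, which='LM')
print(vals.real)
```

Memory use is then proportional to the number of nonzeros plus a handful of Lanczos/Arnoldi vectors, which fits comfortably in 128 GB for a matrix this sparse.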

user2357112
  • 260,549
  • 28
  • 431
  • 505
2

Instead of using networkx, use `scipy.sparse.csgraph.laplacian(..., normed=True)`. As others have noted, `L.A` is what gives you a dense array.
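A sketch of that approach, with a small random sparse adjacency matrix standing in for the real graph (the example matrix is an assumption):

```python
import scipy.sparse
import scipy.sparse.linalg
from scipy.sparse.csgraph import laplacian

# Small random sparse adjacency matrix standing in for the real graph
# (an assumption for this sketch).
A = scipy.sparse.random(1000, 1000, density=0.005, format='csr', random_state=0)
A = A + A.T  # symmetric, like an undirected adjacency matrix

# Build the normalized Laplacian directly as a sparse matrix,
# skipping networkx and any dense intermediate entirely.
L = laplacian(A, normed=True)

# Largest-magnitude eigenvalue via ARPACK on the sparse matrix;
# eigsh is appropriate since the normalized Laplacian is symmetric.
vals, _ = scipy.sparse.linalg.eigsh(L, k=1, which='LM')
print(vals)
```

This also avoids building the networkx graph object, which for 14M nodes would itself be a substantial memory cost on top of the sparse matrix.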

keithpjolley
  • 2,089
  • 1
  • 17
  • 20