
I am trying to implement a spectral clustering algorithm for a community detection problem on a graph.

I need to compute the eigenvectors of a very large matrix, larger than 1M x 1M.

NumPy and SciPy need the matrix to be in memory to compute them, which is impossible in my case.

Is there any other library or package that computes eigenvectors and eigenvalues on disk instead of in memory (just like HDF5 allows us to store and manipulate data on disk)?

Or is there any other solution you can suggest?

5gon12eder
  • Is your matrix sparse (https://en.wikipedia.org/wiki/Sparse_matrix)? I hope so! – Warren Weckesser Jan 06 '16 at 22:29
  • http://docs.scipy.org/doc/scipy-0.16.0/reference/generated/scipy.sparse.linalg.eigs.html – brainjam Jan 06 '16 at 22:30
  • No, it's not sparse :( That's the problem. – MeNoureddine Jan 06 '16 at 22:31
  • Thank you @brainjam, but this uses RAM too, and my problem is memory usage. As far as I know, such a matrix can't be loaded into RAM. – MeNoureddine Jan 06 '16 at 22:36
  • Doesn't your OS have swap space? Have you tried it? If so then what is the error? – Guy Coder Jan 07 '16 at 01:08
  • @GuyCoder Yes, it has (8 GB), but how can I use it explicitly? There is no specific error, but it consumes more and more memory until the OS freezes. (I'm using Linux.) – MeNoureddine Jan 07 '16 at 20:42
  • Linux is not my primary OS, but the concept should be the same. Swap is there by default and is used automatically. However, in rare cases you need to increase the size of the reserved disk space if you run out; that is OS specific. Start with [Using a swap space](http://www.tldp.org/LDP/sag/html/using-swap.html). I do know that on Linux you can also use the resource monitor, and on my Ubuntu it shows how much RAM and how much swap space are being used in real time. At this point I cannot give more details because it would be guessing and not from actual usage. – Guy Coder Jan 07 '16 at 20:51
  • I just calculated 1M x 1M and realized you will need 1 TB to hold that (at one byte per entry; more for wider types). I have never seen a swap space that large, but it should work in theory. – Guy Coder Jan 07 '16 at 20:54
  • I am getting a MemoryError: `>>> a = np.random.randint(0, 2, (100000,100000)) Traceback (most recent call last): File "", line 1, in File "mtrand.pyx", line 953, in mtrand.RandomState.randint (numpy/random/mtrand/mtrand.c:10897) MemoryError ` even though 90% of the swap is free. – MeNoureddine Jan 07 '16 at 21:09
  • Oops! Yeah, that was a bad example to test with, because it needed more than the RAM + swap that I have. So yes @GuyCoder, I think I should increase the swap size and then try again. Thank you :) – MeNoureddine Jan 07 '16 at 21:26
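
The sizes mentioned in the comments above can be checked with a quick back-of-the-envelope calculation. This is only an illustrative sketch: the helper names `dense_matrix_bytes` and `human` are hypothetical, and it assumes a dense matrix with a fixed number of bytes per entry (8 for float64 or 64-bit integers, 1 for a single-byte type).

```python
def dense_matrix_bytes(rows, cols, bytes_per_entry=8):
    """Return the bytes needed to hold a dense rows x cols matrix."""
    return rows * cols * bytes_per_entry


def human(nbytes):
    """Format a byte count using decimal units (1 GB = 10**9 bytes)."""
    for unit in ("B", "kB", "MB", "GB", "TB", "PB"):
        if nbytes < 1000 or unit == "PB":
            return f"{nbytes:g} {unit}"
        nbytes /= 1000


# The 1M x 1M matrix from the question, at 1 and 8 bytes per entry
print(human(dense_matrix_bytes(10**6, 10**6, 1)))  # 1 TB
print(human(dense_matrix_bytes(10**6, 10**6, 8)))  # 8 TB

# The 100000 x 100000 test array, assuming 8-byte integers
print(human(dense_matrix_bytes(10**5, 10**5, 8)))  # 80 GB
```

Under these assumptions the test array alone would need roughly 80 GB, which is consistent with a MemoryError on a machine with only 8 GB of swap.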

1 Answer


Increase the size of your swap file.

See: What is virtual memory?
Creating a swap space
Using a swap space

Also, most systems report swap usage in real time in the resource monitor.

For example, on Ubuntu:

[Example of resource monitor showing swap space usage]
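
If you would rather check swap usage from a script than from the resource monitor, something like the following might work on Linux. It is a minimal sketch, not a tested monitoring tool: the function names `parse_meminfo` and `swap_usage` are hypothetical, and it assumes the kernel exposes `SwapTotal` and `SwapFree` lines in `/proc/meminfo`.

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        # Keep only "Key: <number> [unit]" lines
        if sep and fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def swap_usage(info):
    """Return (used, total) swap in the kB units reported by the kernel."""
    total = info.get("SwapTotal", 0)
    free = info.get("SwapFree", 0)
    return total - free, total


# Only attempt the read where the file exists (i.e. on Linux)
if os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as f:
        used, total = swap_usage(parse_meminfo(f.read()))
    print(f"swap used: {used} kB of {total} kB")
```

Running this before and during the computation shows whether the swap space is actually being used or whether the allocation fails before it gets that far.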

Guy Coder