
For this purpose I have used the solution from that thread for now, but it gives a memory error, as expected, since my matrix A is 6 million by 40,000. I am therefore looking for any other solution, even one that only approximates the correlation matrix. How can I circumvent that problem? Any help is appreciated.

erogol
First, you need to answer the following question: how many nonzero elements do you have in your matrix? Call this number `nnz`. The memory required to store them is about `16e-9*nnz` gigabytes. How many gigabytes would you need? – pv. Nov 28 '13 at 21:25

1 Answer


Your problem is that you can't hold the result in memory: a 6e6 x 6e6 correlation matrix is about 3.6e13 values, i.e. hundreds of terabytes as float64.

You can drop rows from the original matrix. If, for example, you are searching for highly correlated rows, you may want to cluster the rows first, in order to break the problem into smaller pieces.
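A rough sketch of that clustering idea, assuming `A` is a SciPy sparse matrix whose rows you want to correlate; `MiniBatchKMeans` and the cluster count are my own choices for illustration, not something prescribed here:

    import numpy as np
    from sklearn.cluster import MiniBatchKMeans

    def corr_within_clusters(A, n_clusters=1000):
        """Cluster the rows of sparse matrix A, then correlate rows only inside each cluster."""
        km = MiniBatchKMeans(n_clusters=n_clusters, batch_size=10_000)
        labels = km.fit_predict(A)                # accepts scipy.sparse input
        for c in range(n_clusters):
            idx = np.where(labels == c)[0]
            if len(idx) < 2:
                continue
            block = A[idx].toarray()              # only one cluster is densified at a time
            yield idx, np.corrcoef(block)         # (cluster_size x cluster_size) correlations

Each yielded block stays small as long as the clusters do, so the full rows x rows matrix is never built.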

You can also use scipy.sparse.linalg.svds to shrink the number of columns. But you will still have to handle rows^2 correlations.
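A minimal sketch of that column-reduction step, with made-up dimensions and a made-up `k`; correlating rows in the reduced space only approximates the correlation of the original rows, and the number of row pairs is of course unchanged:

    import numpy as np
    from scipy.sparse import random as sparse_random
    from scipy.sparse.linalg import svds

    A = sparse_random(10_000, 4_000, density=0.01, format='csr')  # stand-in for the real data

    k = 50                                   # number of singular vectors to keep (assumption)
    U, s, Vt = svds(A, k=k)                  # truncated SVD: A ≈ U @ np.diag(s) @ Vt
    X = U * s                                # each row of A becomes a k-dimensional vector

    def approx_row_corr(i, j):
        """Pearson correlation of rows i and j, computed on the reduced vectors."""
        xi, xj = X[i] - X[i].mean(), X[j] - X[j].mean()
        return float(xi @ xj / (np.linalg.norm(xi) * np.linalg.norm(xj)))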

cyborg