
I have already read this question: memory error in numpy svd, this one: Applying SVD throws a Memory Error instantaneously?, and a bunch of other numpy.linalg.svd questions.

I need to run SVD on very large matrices for work. I currently have an 8 GB machine, and on certain matrices the computation takes up all of the system's memory and makes the computer grind to a halt.

I need to analyze the SVD results so I can learn about the cluster model. How can I predict when it will crash and when it will work? Or, better, how much memory is needed for it to run properly?

– AturSams
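A rough lower bound on the memory `numpy.linalg.svd` needs can be read off from the sizes of the input and the factors it has to hold at once (a back-of-envelope sketch; the function name is made up for illustration, and LAPACK's internal workspace comes on top of this):

```python
import numpy as np

def svd_memory_lower_bound_gib(n, m, dtype=np.float64, full_matrices=True):
    """Lower bound (in GiB) for holding A plus U, s and Vt at the same time.

    LAPACK's internal workspace comes on top of this, so peak usage is higher.
    """
    itemsize = np.dtype(dtype).itemsize
    k = min(n, m)
    if full_matrices:
        # A: n*m, U: n*n, s: k, Vt: m*m  (full SVD)
        n_floats = n * m + n * n + k + m * m
    else:
        # A: n*m, U: n*k, s: k, Vt: k*m  (reduced SVD, full_matrices=False)
        n_floats = n * m + n * k + k + k * m
    return n_floats * itemsize / 2**30

# The 9200 x 31000 float32 case discussed in the comments below:
print(svd_memory_lower_bound_gib(9200, 31000, np.float32, full_matrices=True))
print(svd_memory_lower_bound_gib(9200, 31000, np.float32, full_matrices=False))
```

If that estimate is anywhere near the machine's physical RAM, the full decomposition is likely to thrash or die with a MemoryError.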
  • Have you checked the obvious requirements of >= `NxM` + (`NxN` + `MxM` + `N`) float values for a full svd? This can already need a lot of memory for larger matrices. – cel Feb 24 '15 at 08:05
  • Can I avoid this issue by using gensim? Do I not avoid it already by passing `full_matrices=False`? – AturSams Feb 24 '15 at 13:13
  • I don't see why you call that an `issue`. The definition of a full SVD is the decomposition of a matrix into matrices of these sizes. That's what you actually want to compute... If you don't calculate the full SVD, you get smaller matrices, but you can again calculate the amount of memory you need just to keep the input and the output in memory at the same time. This is again a lower bound on the required memory. – cel Feb 24 '15 at 13:31
  • @cel Where did you derive these requirements from (`NxN` + `MxM`)? I read [here](ftp://ftp.nist.gov/pub/mel/michalos/Software/Optimization/dlib-18.9/docs/dlib/matrix/lapack/gesvd.h.html) that the requirements are more along the lines of max(N, M)? – AturSams Feb 24 '15 at 13:35
  • I think my requirements are the requirements of a full `SVD` and the `max(N,M)` are the requirements for a truncated `SVD`. But I may be wrong. – cel Feb 24 '15 at 13:38
  • According to [this](http://fa.bianp.net/blog/2012/singular-value-decomposition-in-scipy/) you are likely right. I think the matrices we're using might actually need 30 GB of memory to compute. We should probably use ARPACK instead of LAPACK. :) – AturSams Feb 24 '15 at 13:43
  • How large is 'very large'? Could you tell us the actual dimensions and dtypes of the arrays you are trying to decompose? Are they sparse? Do you really need the full SVD, or are you only interested in a subset of the singular values/vectors, e.g. the largest ones? – ali_m Feb 24 '15 at 18:36
  • 9200 × 31k, float (32-bit). It is fairly sparse (a lot of epsilons). I am considering gensim. – AturSams Feb 25 '15 at 12:12
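Following up on the ARPACK suggestion in the comments, a minimal sketch of a truncated SVD with `scipy.sparse.linalg.svds`, which keeps the matrix sparse and computes only the `k` leading singular triplets instead of the full `NxN` and `MxM` factors (the random matrix and `k = 100` are placeholders for the real 9200 × 31k data):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import svds

# Placeholder for the real 9200 x 31000 term-document matrix:
# a random sparse float32 matrix with ~0.1% non-zero entries.
A = sp.random(9200, 31000, density=0.001, format='csr', dtype=np.float32)

# Ask ARPACK for only the k largest singular triplets, so the factors
# stay small: U is (9200, k), s is (k,), Vt is (k, 31000).
k = 100
U, s, Vt = svds(A, k=k)

# The ordering of the singular values from svds is not guaranteed to be
# descending; sort largest-first before inspecting the clusters.
order = np.argsort(s)[::-1]
U, s, Vt = U[:, order], s[order], Vt[order, :]
```

gensim's `LsiModel` takes a similar truncated, streamed approach and can be a better fit if even the sparse matrix does not sit comfortably in memory.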
