
I am computing the SVD of a matrix with NumPy:

import numpy as np

U, S, V = np.linalg.svd(A)  # note: the third output is V transposed (V^H), not V

The shape of A is (10000, 10000).

Because of the matrix's size, it fails with a MemoryError:

U, S, V = np.linalg.svd(A, full_matrices=False) # nargout=3
File "/usr/lib/python2.7/dist-packages/numpy/linalg/linalg.py", line 1319, in svd
    work = zeros((lwork,), t)
MemoryError

How, then, can I compute the SVD of my matrix?

sam
  • Get more RAM, or store the arrays on disk. – M4rtini Jan 17 '14 at 07:38
  • I already have 4 GB of RAM. What is another way? – sam Jan 17 '14 at 07:42
  • 4 GB of RAM is not that much really; 48 GB is not that expensive (assuming you have a 64-bit system). – usethedeathstar Jan 17 '14 at 07:56
  • What is your OS / Python version? If you're running your code on 64-bit Windows with 32-bit Python, the [memory limit](http://msdn.microsoft.com/en-us/library/aa366778.aspx) is 2 GB. Also, what is the datatype of `A`? – Vincent Jan 17 '14 at 08:49
  • I am using Ubuntu 13 and 64-bit Python 2.7. The datatype of A is numpy array. – sam Jan 17 '14 at 08:56
  • I was asking about `type(A[0,0])`. Let's say it is `int64`. Then A, U and V require `8*10000**2` bytes each, which means you need at least 2288 MB, plus the Python interpreter and your other variables. The `svd` function may also need to allocate working memory. Try reducing the size of `A` and running some experiments; then you'll see how much swap space you need to add. – Vincent Jan 17 '14 at 09:17
  • Just for what it's worth, `scipy.linalg.svd` is actually a slightly different implementation (different interface to LAPACK functions). With `scipy.linalg.svd` you can slightly reduce the amount of memory required by specifying `scipy.linalg.svd(A, overwrite_a=True)`. However, computing the `svd` is fairly memory-hungry regardless, so this won't help a ton. – Joe Kington Jan 17 '14 at 17:39
  • What is the purpose of computing the SVD of a 10000 by 10000 matrix? Often the problem can be reduced mathematically. For example, if the 10000 by 10000 matrix is the product of a 10000 by x matrix (A) and an x by 10000 matrix (A'), where x << 10000, the SVD problem can be reduced to one involving A' × A, which is only x by x. – emesday Mar 29 '14 at 10:07
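
The factored case in emesday's comment can be made concrete. A minimal sketch, assuming you already hold the two thin factors (called A and A' in the comment, `A` and `B` here): rather than the A' × A route, it uses a standard alternative, a QR factorization of each factor followed by an SVD of the small x-by-x core, so the full matrix is never formed. The function name `svd_of_product` and the demo sizes are hypothetical, chosen so the result can be checked quickly.

```python
import numpy as np


def svd_of_product(A, B):
    """Thin SVD of M = A @ B computed without forming M.

    A has shape (m, x) and B has shape (x, n) with x much smaller than m and n.
    Only m-by-x, n-by-x and x-by-x arrays are ever factorized.
    """
    Q1, R1 = np.linalg.qr(A)      # A   = Q1 @ R1, Q1: (m, x), R1: (x, x)
    Q2, R2 = np.linalg.qr(B.T)    # B.T = Q2 @ R2, Q2: (n, x), R2: (x, x)
    # M = Q1 @ (R1 @ R2.T) @ Q2.T, so only the small x-by-x core needs an SVD.
    Uc, S, Vct = np.linalg.svd(R1 @ R2.T)
    return Q1 @ Uc, S, Vct @ Q2.T  # U: (m, x), S: (x,), Vt: (x, n)


# Small demo sizes so the result can be checked against the full product.
rng = np.random.default_rng(0)
m, n, x = 600, 500, 20
A = rng.standard_normal((m, x))
B = rng.standard_normal((x, n))
U, S, Vt = svd_of_product(A, B)
print(np.allclose((U * S) @ Vt, A @ B))  # reconstruction check
```

For the real problem, m = n = 10000, and the largest arrays are the 10000-by-x factors and their Q matrices.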

2 Answers


Some small tips: close everything else that is running on your computer. Free memory inside your program by dropping references to variables you no longer need; for example, if you used a big dict D for earlier computations, set D = None. Try creating your NumPy arrays with dtype=np.float32 instead of the default float64 to lower memory requirements (integer dtypes such as np.int32 gain nothing for the SVD itself, because np.linalg.svd converts integer input to float64).
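To put numbers on the dtype tip and on Vincent's estimate in the comments, here is a rough sketch (Python 3) that counts the bytes for A plus the U, S and Vt of a full SVD. It deliberately ignores LAPACK's workspace (the `work` array that failed to allocate in the traceback), so real peak usage is higher. The helper `svd_storage_bytes` is hypothetical. One caveat, as far as I know: `np.linalg.svd` returns float32 results for float32 input but does the computation in double precision internally, so float32 mainly saves storage; `scipy.linalg.svd` dispatches to single-precision LAPACK routines for float32 input.

```python
import numpy as np


def svd_storage_bytes(n, dtype):
    """Bytes for A plus U, S and Vt from a full SVD of an n-by-n matrix.

    Only these four arrays are counted; LAPACK's workspace comes on top.
    """
    itemsize = np.dtype(dtype).itemsize   # 8 for float64, 4 for float32
    return itemsize * (3 * n * n + n)     # A, U, Vt are n-by-n; S has n entries


for dt in (np.float64, np.float32):
    mib = svd_storage_bytes(10000, dt) / 2**20
    print(f"{np.dtype(dt).name}: about {mib:.0f} MiB before workspace")

# Output dtypes: float32 input stays float32, integer input becomes float64.
print(np.linalg.svd(np.ones((4, 4), dtype=np.float32), compute_uv=False).dtype)
print(np.linalg.svd(np.ones((4, 4), dtype=np.int32), compute_uv=False).dtype)
```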

Depending on what you need the SVD for, you can also look at the scikit-learn package for Python. It supports many decomposition methods, such as PCA and truncated SVD, and many of them accept sparse matrices.

Ekgren

There is a lighter variant of the SVD, the truncated SVD, which is useful when the matrix is approximately low rank. (It differs from the thin, or economy-size, SVD, which still computes all min(m, n) singular values.) Given the size of your matrix, it may well be approximately low rank: the paper "Why Are Big Data Matrices Approximately Low Rank?" argues that many large data matrices are. A truncated SVD might therefore solve your problem, because it does not compute all singular values and singular vectors; it computes only the largest singular values and their singular vectors.

For a ready-made implementation, look up sklearn.decomposition.TruncatedSVD.
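As far as I know, TruncatedSVD uses a randomized algorithm by default (algorithm="randomized"). The sketch below is my own minimal NumPy-only version of that randomized range-finder idea (after Halko, Martinsson and Tropp), not scikit-learn's code. The function name and its parameters are illustrative; the name echoes sklearn.utils.extmath.randomized_svd, but this is not that function. It assumes A itself fits in memory (about 763 MiB for 10000 × 10000 float64) and that you only need the top k singular triplets.

```python
import numpy as np


def randomized_svd(A, k, n_oversamples=10, n_iter=4, seed=0):
    """Approximate the top-k singular triplets of A with a randomized range finder.

    Only p = k + n_oversamples random directions are used, so the large
    intermediates are m-by-p and p-by-n instead of m-by-m and n-by-n.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    p = k + n_oversamples
    # Sample the range of A with a random Gaussian test matrix.
    Q, _ = np.linalg.qr(A @ rng.standard_normal((n, p)))
    # Power iterations sharpen the estimate when singular values decay slowly;
    # re-orthonormalizing at each step limits round-off error.
    for _ in range(n_iter):
        Z, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ Z)
    # Project A onto the captured subspace and take an SVD of the small matrix.
    Ub, S, Vt = np.linalg.svd(Q.T @ A, full_matrices=False)  # Q.T @ A is p-by-n
    U = Q @ Ub
    return U[:, :k], S[:k], Vt[:k]


# Demo on an exactly rank-30 matrix, where the top 30 triplets reproduce A.
rng = np.random.default_rng(1)
A = rng.standard_normal((1000, 30)) @ rng.standard_normal((30, 800))
U, S, Vt = randomized_svd(A, k=30)
print(np.allclose((U * S) @ Vt, A))
```

With scikit-learn installed, the library route is roughly `svd = TruncatedSVD(n_components=100).fit(A)`, after which `svd.singular_values_` and `svd.components_` hold the top singular values and right singular vectors. For a 10000 × 10000 input with k = 100, the sketch's largest intermediates are 10000 × 110 arrays (about 8.4 MiB each in float64), compared with about 763 MiB for each of U and Vt in a full SVD.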