1

Suppose I create this sparse matrix, where the non-zero elements consist of booleans 'true':

s = sparse([3 2 3 3 3 3 2 34 3 6 3 2 3 3 3 3 2 3 3 6], [10235 11470 21211 33322 49297 88361 91470 127422 152383 158751 166485 171471 181211 193321 205548 244609 251470 283673 312384 318752], true);

which contains 20 elements. Matlab ought to allocates no more than (4+4+1)*20 = 180 bytes of memory (it looks like the indices are 4 bytes long). Yet

whos s

says that the matrix takes up 1275112 bytes in memory, which is a problem as I need to store many thousands of these.

Any idea why this happens?

Cheers!

glep
  • 104
  • 9

2 Answers2

5

The memory storage format of a sparse matrix in MATLAB is a dense array of column pointers. Each column pointer points to a list of nonzero elements, and each element needs an index and a value. So the formula is

(max column num) x P + (num nonzero) x (P + S)

where P is pointer size (8 bytes on a 64-bit system, 4 on a 32-bit system), and S is the size of a single element. 1 for logical. For your problem I get 1275108, or "close enough".

So what to do about it? Note the big memory driver: maximum column number, due to the dense array of column pointers. In your case, if you reverse the index order and store the transpose of the matrix, it takes only 236 bytes (on your 32-bit system).

Peter
  • 14,559
  • 35
  • 55
  • That is spot on. Kudos! – glep Dec 03 '13 at 17:51
  • Do you have a link to some official documentation stating that it is dense in column pointers ? – user2987828 Dec 03 '13 at 21:16
  • http://www.mathworks.com/help/pdf_doc/otherdocs/simax.pdf, linked from http://www.mathworks.com/help/matlab/math/accessing-sparse-matrices.html. And it looks like I was right about the size, but technically the column pointers are not pointers... they're integer indices into two large arrays: one for indices, another for values. The result is the same. On 64-bit systems, index integers are 64-bit. – Peter Dec 03 '13 at 21:45
  • The document also shows where that missing 4 bytes comes from: the column index array ends with one extra value: the value of `nnz` (num nonzero) – Peter Dec 03 '13 at 21:47
  • 1
    In [Octave, sparse matrix storage](http://www.gnu.org/software/octave/doc/interpreter/Storage-of-Sparse-Matrices.html) (empirically) appears to store 4-byte values (presume 32-bit offsets as in [Yale format](http://en.wikipedia.org/wiki/Sparse_matrix#Yale_format)) even on my x86_64 machine (Octave also claims to be configured for x86_64). A quick test suggests that sparse matrix dimensions may be limited to `2^31-1` (max int32). Just an aside for others who land here from Search. Thanks for getting me vectored in the right direction. – hoc_age Jun 23 '14 at 18:43
  • Also Octave let you choose the dense dimensions, whereas with Matlab it has to be columns. – glep Aug 18 '14 at 16:15
1

I do not know why this happens but I do know how to fix it. Here is some code revealing that a sparse matrix is linearly dependent on the number of columns.

for k = 1:6
  n = 10^k; 
  a = sparse(n, 100); % keep number of columns constant
  tmp = whos('a'); 
  fprintf('%1.0f bytes used\n', tmp.bytes); 
end

which produces

416 bytes used
416 bytes used
416 bytes used
416 bytes used
416 bytes used
416 bytes used

while keeping the number of rows constant with a = sparse(n, 100); instead gives

     56 bytes used
    416 bytes used
   4016 bytes used
  40016 bytes used
 400016 bytes used
4000016 bytes used

thus to optimize your code change s from a 34 x 318752 to a 318752 x 34 matrix by swapping the first two inputs of sparse.

Strategy Thinker
  • 343
  • 1
  • 3
  • 15