
I'm looking for a simple sparse vector implementation that can be mapped into memory, similar to numpy.memmap.

Unfortunately, the numpy implementation only handles full (dense) vectors. Example usage:

vec = SparseVector('/tmp/file.dat')  # SparseVector is the class I'm looking for
vec[10] = 10
vec[50] = 21

for key in vec:
    print(vec[key])    # 10, 21

I found a scipy class representing a sparse matrix, but two dimensions are clumsy to use: I'd need to make a matrix with only one row and then use vec[0,i].

Any suggestions?

petrbel
  • Where is `SparseVector` from? Is this something to do with Apache Spark? If so you should tag your question with this information and update the text. – YXD Apr 11 '15 at 09:26
  • No, I mean that's the class I'm looking for (I don't know its name yet). Sorry for the misunderstanding, I'll edit the question ASAP. – petrbel Apr 11 '15 at 09:36

1 Answer


Someone else recently asked about 1-D sparse vectors, though they wanted to take advantage of the way scipy.sparse handles duplicate indices:

is there something like coo_matrix but for sparse vectors?

As shown there, a coo_matrix actually consists of 3 numpy arrays: data, row and col. Other formats arrange the values differently. lil, for example, keeps two arrays of lists, one for the data and another for the column indices. dok is essentially a dictionary with (i,j) tuples as keys.
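To make the three-array layout concrete, here is a small sketch that builds a one-row coo_matrix from the values in the question and inspects its attributes. The shape (1, 100) is an arbitrary choice for illustration.

```python
import numpy as np
from scipy.sparse import coo_matrix

# Values and coordinates taken from the question's example;
# the length of 100 is an assumption made only for this sketch.
values = np.array([10, 21])
rows = np.array([0, 0])      # a single row, so every row index is 0
cols = np.array([10, 50])    # positions of the non-zero entries

m = coo_matrix((values, (rows, cols)), shape=(1, 100))

# The matrix is just these three parallel arrays.
print(m.data)
print(m.row)
print(m.col)
```

For a 1-D vector the row array carries no information, which is why two arrays are enough.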

In theory, then, a sparse vector requires only 2 arrays: one for the indices and one for the values. Or, as your example suggests, it could be a simple dictionary.

So you could implement a memory-mapped sparse vector with two numpy.memmap arrays. As far as I know there isn't a memory-mapped version of the scipy sparse matrices, though it's not something I've looked for.
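A minimal sketch of the two-array idea, not an existing library class: MemmapSparseVector, the file suffixes (.idx, .val, .len) and the fixed capacity are all assumptions made for illustration. Indices are kept sorted so a lookup can use np.searchsorted. The capacity is fixed up front because a numpy.memmap cannot grow in place.

```python
import os
import tempfile
import numpy as np


class MemmapSparseVector:
    """Hypothetical sparse vector backed by two memory-mapped arrays."""

    def __init__(self, prefix, capacity=1024, dtype=np.float64):
        # Reopen existing files, otherwise create zero-filled ones.
        mode = "r+" if os.path.exists(prefix + ".idx") else "w+"
        self._idx = np.memmap(prefix + ".idx", dtype=np.int64, mode=mode, shape=(capacity,))
        self._val = np.memmap(prefix + ".val", dtype=dtype, mode=mode, shape=(capacity,))
        # Number of stored entries, kept in its own one-element file.
        self._n = np.memmap(prefix + ".len", dtype=np.int64, mode=mode, shape=(1,))

    def _find(self, key):
        # Binary search in the sorted, used part of the index array.
        n = int(self._n[0])
        pos = int(np.searchsorted(self._idx[:n], key))
        return pos, (pos < n and self._idx[pos] == key)

    def __setitem__(self, key, value):
        pos, found = self._find(key)
        if not found:
            n = int(self._n[0])
            if n == len(self._idx):
                raise ValueError("capacity exhausted")
            # Shift the tail right by one slot to keep the indices sorted.
            self._idx[pos + 1:n + 1] = self._idx[pos:n].copy()
            self._val[pos + 1:n + 1] = self._val[pos:n].copy()
            self._idx[pos] = key
            self._n[0] = n + 1
        self._val[pos] = value

    def __getitem__(self, key):
        pos, found = self._find(key)
        # Missing keys read as zero, as in a sparse vector.
        return self._val[pos] if found else self._val.dtype.type(0)

    def __iter__(self):
        # Iterate over the stored indices in ascending order.
        return iter(self._idx[:int(self._n[0])].tolist())

    def __len__(self):
        return int(self._n[0])

    def flush(self):
        for arr in (self._idx, self._val, self._n):
            arr.flush()


# Usage, mirroring the example in the question (temporary directory assumed).
prefix = os.path.join(tempfile.mkdtemp(), "vec")
vec = MemmapSparseVector(prefix, capacity=16)
vec[50] = 21
vec[10] = 10
for key in vec:
    print(key, vec[key])
vec.flush()

# Opening the same files again sees the stored entries.
again = MemmapSparseVector(prefix, capacity=16)
```

Insertion is O(n) because of the shift, so this layout suits vectors that are read much more often than they are written. Appending unsorted and scanning linearly on lookup would be the opposite trade-off.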

But what functionality do you want? How large is the vector: so large that a dense version would not fit in regular memory? Are you doing math with it, or just data lookup?

hpaulj