
I'm trying to create a sparse vector from a series of arrays where there are some overlapping indexes. For a matrix there's a very convenient object in scipy that does exactly this:

coo_matrix((data, (i, j)), [shape=(M, N)])

So if data happens to have repeated elements (because their i,j indexes are the same), those are summed up in the final sparse matrix. I was wondering if it would be possible to do something similar but for sparse vectors, or do I just have to use this object and pretend it's a 1-column matrix?
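Here is a small sketch of the matrix behaviour the question describes. The arrays are made up for illustration: the pairs (0, 1) and (2, 0) each appear twice, and the dense result holds their sums.

```python
import numpy as np
from scipy import sparse

# Illustrative data: (0, 1) and (2, 0) are repeated index pairs
data = np.array([1, 2, 3, 4, 5])
i = np.array([0, 0, 2, 2, 1])
j = np.array([1, 1, 0, 0, 2])

A = sparse.coo_matrix((data, (i, j)), shape=(3, 3))

# Converting to a dense array adds up entries that share the same (i, j)
dense = A.toarray()
print(dense)
# [[0 3 0]
#  [0 0 5]
#  [7 0 0]]
```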

aaragon

1 Answer


While you might be able to reproduce a 1d equivalent, it would save a lot of work to just use a 1-row (or 1-column) sparse matrix. I am not aware of any sparse vector package for numpy.
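For comparison, a 1d equivalent in plain NumPy might look like the following minimal sketch. The helper name `sum_sparse_vector` is hypothetical. It uses `np.unique(..., return_inverse=True)` to group repeated indices and `np.add.at` to accumulate their values, and returns a sorted `(indices, values)` pair.

```python
import numpy as np

def sum_sparse_vector(indices, values):
    """Hypothetical helper: collapse repeated indices by summing their values.

    Returns (unique_indices, summed_values), both 1d and sorted by index.
    """
    indices = np.asarray(indices)
    values = np.asarray(values)
    # uniq holds each distinct index once; inverse maps each input to its slot in uniq
    uniq, inverse = np.unique(indices, return_inverse=True)
    summed = np.zeros(len(uniq), dtype=values.dtype)
    # np.add.at is unbuffered, so every repeated slot in `inverse` is added
    np.add.at(summed, inverse, values)
    return uniq, summed

idx, vals = sum_sparse_vector([1, 2, 1, 5, 7, 5], [10, 11, 12, 14, 15, 16])
print(idx)   # [1 2 5 7]
print(vals)  # [22 11 30 15]
```

This produces the same numbers as the summed sparse matrix. However, you then maintain the bookkeeping yourself, which is the work the 1-row matrix approach avoids.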

The coo format stores the input arrays exactly as you give them, without summing. The summing happens when the matrix is converted: either to a dense array (as when it is displayed that way) or to csc or csr format. And since the csr constructor is compiled, it will do that summation faster than anything you could code in Python.
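A minimal sketch of that behaviour, reusing the answer's data. `nnz` counts every stored entry, so it drops from 6 to 4 once the duplicates are summed. That happens either by converting with `tocsr()` or in place with `coo_matrix.sum_duplicates()`.

```python
import numpy as np
from scipy import sparse

data = np.array([10, 11, 12, 14, 15, 16])
col = np.array([1, 2, 1, 5, 7, 5])
row = np.zeros(len(col), dtype=int)  # every entry sits in row 0

M = sparse.coo_matrix((data, (row, col)), shape=(1, 10))
n_stored = M.nnz      # columns 1 and 5 are each stored twice

csr = M.tocsr()       # converting to csr sums the duplicates
print(n_stored, csr.nnz)  # 6 4

M.sum_duplicates()    # sums duplicates in place while staying in coo format
print(M.nnz)          # 4
```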

Construct a '1d' sparse coo matrix (assuming `import numpy as np` and `from scipy import sparse`):

In [67]: data=[10,11,12,14,15,16]    
In [68]: col=[1,2,1,5,7,5]
In [70]: M=sparse.coo_matrix((data, (np.zeros(len(col), dtype=int), col)), shape=(1,10))

Look at its data representation (no summation)

In [71]: M.data
Out[71]: array([10, 11, 12, 14, 15, 16])
In [72]: M.row
Out[72]: array([0, 0, 0, 0, 0, 0])
In [73]: M.col
Out[73]: array([1, 2, 1, 5, 7, 5])

Look at the array representation, of shape (1,10), where the duplicates are summed:

In [74]: M.toarray()
Out[74]: array([[ 0, 22, 11,  0,  0, 30,  0, 15,  0,  0]])

And the csr equivalent, which also has the duplicates summed:

In [75]: M1=M.tocsr()
In [76]: M1.data
Out[76]: array([22, 11, 30, 15])
In [77]: M1.indices
Out[77]: array([1, 2, 5, 7])
In [78]: M1.indptr
Out[78]: array([0, 4])
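Because a 1-row csr matrix has `indptr == [0, nnz]`, its `indices` and `data` arrays already form a compact, summed 1d sparse vector. Here is a sketch wrapping that idea. The helper name `sparse_vector_from_triplets` is hypothetical.

```python
import numpy as np
from scipy import sparse

def sparse_vector_from_triplets(data, col, n):
    """Hypothetical helper: sum repeated positions via a 1-row csr matrix.

    Returns (indices, values) describing a length-n vector.
    """
    col = np.asarray(col)
    row = np.zeros(len(col), dtype=int)  # put every entry in row 0
    row_mat = sparse.coo_matrix((data, (row, col)), shape=(1, n)).tocsr()
    # With a single row, indptr is [0, nnz], so indices/data cover the whole vector
    return row_mat.indices, row_mat.data

indices, values = sparse_vector_from_triplets([10, 11, 12, 14, 15, 16],
                                              [1, 2, 1, 5, 7, 5], 10)
print(indices)  # [1 2 5 7]
print(values)   # [22 11 30 15]
```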


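`nonzero` shows the same pattern: the coo matrix reports every stored entry, duplicates included, while the csr matrix and the dense array report only the summed positions: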

In [80]: M.nonzero()
Out[80]: (array([0, 0, 0, 0, 0, 0]), array([1, 2, 1, 5, 7, 5]))

In [81]: M.tocsr().nonzero()
Out[81]: (array([0, 0, 0, 0]), array([1, 2, 5, 7]))

In [82]: np.nonzero(M.toarray())
Out[82]: (array([0, 0, 0, 0]), array([1, 2, 5, 7]))

M.toarray().flatten() will give you the (10,) 1d array.

hpaulj
  • So I followed your example, and the final array is obtained from M.data, correct? This would be the equivalent to a 1-dimensional np.array. – aaragon Apr 13 '15 at 11:36
  • `M.data` is a 1d array, but it just has the nonzero values (or what you gave it via `coo_matrix`). You need to include `M.col` to know where the zeros are (or aren't). – hpaulj Apr 13 '15 at 16:04
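To rebuild the full 1d vector from the coo arrays, pair `M.data` with `M.col`. Because the coo data may still hold duplicates, the accumulation has to add rather than overwrite. Here is a minimal sketch using `np.add.at`, contrasted with plain fancy assignment.

```python
import numpy as np
from scipy import sparse

data = np.array([10, 11, 12, 14, 15, 16])
col = np.array([1, 2, 1, 5, 7, 5])
M = sparse.coo_matrix((data, (np.zeros(len(col), dtype=int), col)), shape=(1, 10))

# Fancy assignment keeps only one value per repeated column (don't rely on which one)
overwritten = np.zeros(M.shape[1], dtype=M.dtype)
overwritten[M.col] = M.data

# np.add.at accumulates every stored entry, so duplicates are summed like toarray() does
vec = np.zeros(M.shape[1], dtype=M.dtype)
np.add.at(vec, M.col, M.data)
print(vec)  # [ 0 22 11  0  0 30  0 15  0  0]
```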