Looping over large sparse array

Question

Let's say I have a (sparse) matrix M size (N*N, N*N). I want to select elements from this matrix where the outer product of grid (a (n,m) array, where n*m=N) is True (it is a boolean 2D array, and na=grid.sum()). This can be done as follows

result = M[np.outer( grid.flatten(),grid.flatten() )].reshape (( N, N ) )

result is an (na,na) sparse array (and na < N). The previous line is what I want to achieve: get the elements of M that are true from the product of grid, and squeeze the ones that aren't true out of the array.

As n and m (and hence N) grow, and M and result are sparse matrices, I am not able to do this efficiently in terms of memory or speed. Closest I have tried is:

result = sp.lil_matrix ( (1, N*N), dtype=np.float32 )
# Calculate outer product
A = np.einsum("i,j", grid.flatten(), grid.flatten())  
cntr = 0
it = np.nditer ( A, flags=['multi_index'] )
while not it.finished:
    if it[0]:
        result[0,cntr] = M[it.multi_index[0], it.multi_index[1]]
        cntr += 1
# reshape result to be a N*N sparse matrix

The last reshape could be done by this approach, but I haven't got there yet, as the while loop is taking forever.

I have also tried selecting nonzero elements of A too, and looping over but this eats up all of the memory:

A=np.einsum("i,j", grid.flatten(), grid.flatten())  
nzero = A.nonzero() # This eats lots of memory
cntr = 0
for (i,j) in zip (*nzero):
    temp_mat[0,cntr] = M[i,j]
    cnt += 1

'n' and 'm' in the example above are around 300.

I added `sparse` to the title because indexing speeds are slower than for regular `numpy` arrays. — hpaulj, Jul 22 '16 at 22:30
While `lil` is best for indexed assignment and lookup, it is usually better to manipulate the `coo` style of input arrays directly, and build the sparse matrix after. Look at `sparse.bmat` to see how `sparse` ifself uses the `coo` format to construct a complex matrix. — hpaulj, Jul 22 '16 at 23:20
Even with `N=4`, your `it` iteration takes for ever - literally. You are missing a `next(it)`. — hpaulj, Jul 23 '16 at 01:15

score 1 · Answer 1 · answered Jul 23 '16 at 05:45

I don't know if it was a typo, or code error, but your example is missing an iternext:

R=[]
it = np.nditer ( A, flags=['multi_index'] )
while not it.finished:
    if it[0]:
        R.append(M[it.multi_index])
    it.iternext()

I think appending to a list is simpler and faster than R[ctnr]=.... It's competitive if R is a regular array, and sparse indexing is slower (even the fastest lil format).

ndindex wraps this use of a nditer as:

R=[]
for index in np.ndindex(A.shape):
    if A[index]:
        R.append(M[index])

ndenumerate also works:

R = []
for index,a in np.ndenumerate(A):
   if a:
       R.append(M[index])

But I wonder if you really want to advance the cntr each it step, not just the True cases. Otherwise reshaping result to (N,N) doesn't make much sense. But in that case, isn't your problem just

M[:N, :N].multiply(A)

or if M was a dense array:

M[:N, :N]*A

In fact if both M and A are sparse, then the .data attribute of that multiply will be the same as the R list.

In [76]: N=4
In [77]: M=np.arange(N*N*N*N).reshape(N*N,N*N)
In [80]: a=np.array([0,1,0,1])
In [81]: A=np.einsum('i,j',a,a)
In [82]: A
Out[82]: 
array([[0, 0, 0, 0],
       [0, 1, 0, 1],
       [0, 0, 0, 0],
       [0, 1, 0, 1]])

In [83]: M[:N, :N]*A
Out[83]: 
array([[ 0,  0,  0,  0],
       [ 0, 17,  0, 19],
       [ 0,  0,  0,  0],
       [ 0, 49,  0, 51]])

In [84]: c=sparse.csr_matrix(M)[:N,:N].multiply(sparse.csr_matrix(A))
In [85]: c.data
Out[85]: array([17, 19, 49, 51], dtype=int32)

In [89]: [M[index] for index, a in np.ndenumerate(A) if a]
Out[89]: [17, 19, 49, 51]

Thanks for this, I think I didn't quite explain properly what I needed, but it's replicating the top line of code. I have changed the question to reflect this. I'm pretty sure my attempts don't solve my problem very well... — Jose, Jul 23 '16 at 20:15

Looping over large sparse array

1 Answers1