13

Suppose I have a matrix in the CSR format, what is the most efficient way to set a row (or rows) to zeros?

The following code runs quite slowly:

A = A.tolil()
A[indices, :] = 0
A = A.tocsr()

I had to convert to scipy.sparse.lil_matrix because the CSR format seems to support neither fancy indexing nor setting values to slices.

Ashwin Srinath
  • 178
  • 1
  • 6
  • Well, I just tried a `[A.__setitem__((i, j), 0) for i in indices for j in range(A.shape[1])]` and `SciPy` told me that `SparseEfficiencyWarning: changing the sparsity structure of a csr_matrix is expensive. lil_matrix is more efficient.`... – Pierre GM Aug 26 '12 at 12:36
  • no idea if scipy has any support for it, but as it is a CSR matrix, this can be efficiently handled (by hand at least). One question is, do you want to change the sparsity pattern, or should those 0s be just numerically 0? – seberg Aug 26 '12 at 12:38
  • I'm not sure what is meant by the sparsity pattern. I proceed to solve a system of equations by using the scipy.sparse.linalg.spsolve function. I hope this establishes the need to change the sparsity pattern, or lack thereof. – Ashwin Srinath Aug 26 '12 at 12:52
  • @AshwinSrinath I posted an answer, I guess you probably do not care about it. It could _maybe_ be intersting if you interface to a solver relatively low level, and this is like a Jacobian returned for every iteration that is just modified, then the solver might expect the sparsity pattern not to change. Read the wikipedia article, but I think you _should_ change it (to save space and calculation). – seberg Aug 26 '12 at 12:57

4 Answers4

11

I guess scipy just does not implement it, but the CSR format would support this quite well, please read the wikipedia article on "Sparse matrix" about what indptr, etc. are:

# A.indptr is an array, one for each row (+1 for the nnz):

def csr_row_set_nz_to_val(csr, row, value=0):
    """Set all nonzero elements (elements currently in the sparsity pattern)
    to the given value. Useful to set to 0 mostly.
    """
    if not isinstance(csr, scipy.sparse.csr_matrix):
        raise ValueError('Matrix given must be of CSR format.')
    csr.data[csr.indptr[row]:csr.indptr[row+1]] = value

# Now you can just do:
for row in indices:
    csr_row_set_nz_to_val(A, row, 0)

# And to remove zeros from the sparsity pattern:
A.eliminate_zeros()

Of course this removes 0s that were set from another place with eliminate_zeros from the sparsity pattern. If you want to do that (at this point) depends on what you are doing really, ie. elimination might make sense to delay until all other calculations that might add new zero's are done as well, or in some cases you may have 0 values, that you want to change again later, so it would be very bad to eliminate them!

You could in principle of course short-circuit the eliminate_zeros and prune, but that should be a lot of hassle, and might be even slower (because you won't do it in C).


Details about eliminiate_zeros (and prune)

The sparse matrix, does generally not save zero elements, but just stores where the nonzero elements are (roughly and with various methods). eliminate_zeros removes all zeros in your matrix from the sparsity pattern (ie. there is no value stored for that position, when before there was a vlaue stored, but it was 0). Eliminate is bad if you want to change a 0 to a different value lateron, otherwise, it saves space.

Prune would just shrink the data arrays stored when they are longer then necessary. Note that while I first had A.prune() in there, A.eliminiate_zeros() already includes prune.

seberg
  • 8,785
  • 2
  • 31
  • 30
  • Thanks! That sped things up considerably! I'd just like to know what the eliminate_zeros and prune statements are doing there? – Ashwin Srinath Aug 26 '12 at 13:07
  • Added a (hopefully understandable) sentence. Note that `prune()` was unnecessary, `eliminate_zeros` already calls `prune` – seberg Aug 26 '12 at 13:22
2

You can use matrix dot product to achieve that zeroing. Since the matrix we'll use is very sparse (diagonal with zeros for the rows/columns we which to zero out), multiplication should be efficient.

You'll need one of the following functions:

import scipy.sparse

def zero_rows(M, rows):
    diag = scipy.sparse.eye(M.shape[0]).tolil()
    for r in rows:
        diag[r, r] = 0
    return diag.dot(M)

def zero_columns(M, columns):
    diag = scipy.sparse.eye(M.shape[1]).tolil()
    for c in columns:
        diag[c, c] = 0
    return M.dot(diag)

Usage example:

>>> A = scipy.sparse.csr_matrix([[1,0,3,4], [5,6,0,8], [9,10,11,0]])
>>> A
<3x4 sparse matrix of type '<class 'numpy.int64'>'
        with 9 stored elements in Compressed Sparse Row format>
>>> A.toarray()
array([[ 1,  0,  3,  4],
       [ 5,  6,  0,  8],
       [ 9, 10, 11,  0]], dtype=int64)

>>> B = zero_rows(A, [1])
>>> B
<3x4 sparse matrix of type '<class 'numpy.float64'>'
        with 6 stored elements in Compressed Sparse Row format>
>>> B.toarray()
array([[  1.,   0.,   3.,   4.],
       [  0.,   0.,   0.,   0.],
       [  9.,  10.,  11.,   0.]])

>>> C = zero_columns(A, [1, 3])
>>> C
<3x4 sparse matrix of type '<class 'numpy.float64'>'
        with 5 stored elements in Compressed Sparse Row format>
>>> C.toarray()
array([[  1.,   0.,   3.,   0.],
       [  5.,   0.,   0.,   0.],
       [  9.,   0.,  11.,   0.]])
dubek
  • 11,447
  • 5
  • 30
  • 23
0

Update to the latest version of scipy. It supports fancy indexing.

Rose Perrone
  • 61,572
  • 58
  • 208
  • 243
0

I would like to complete the answer given by @seberg. If one wants to set nnz values to zero, then one should modify the structure of the CSR matrix instead of just modifying .data attribute.

Current behavior of this code is,

>>> import scipy.sparse
>>> import numpy as np
>>> A = scipy.sparse.csr_matrix([[0,1,0], [2,0,3], [0,0,0], [4,0,0]])
>>> A.toarray()
array([[0, 1, 0],
       [2, 0, 3],
       [0, 0, 0],
       [4, 0, 0]], dtype=int64)
>>> csr_row_set_nz_to_val(A, 1)
>>> A.toarray()
array([[0, 1, 0],
       [0, 0, 0],
       [0, 0, 0],
       [4, 0, 0]], dtype=int64)
>>> A.data
array([1, 0, 0, 4], dtype=int64)
>>> A.indices
array([1, 0, 2, 0], dtype=int32)
>>> A.indptr
array([0, 1, 3, 3, 4], dtype=int32)

Because we are dealing with sparse matrices, we do not want zeros in the A.data array. I think one should modify the csr_row_set_nz_to_val as follows

def csr_row_set_nz_to_val(csr, row, value=0):
    """Set all nonzero elements of a CSR matrix M (elements currently in the sparsity pattern)
    to the given value. Useful to set to 0 mostly.
    """
    if not isinstance(csr, scipy.sparse.csr_matrix):
        raise ValueError("Matrix given must be of CSR format.")
    if value == 0:
        csr.data = np.delete(csr.data, range(csr.indptr[row], csr.indptr[row+1])) # drop nnz values
        csr.indices = np.delete(csr.indices, range(csr.indptr[row], csr.indptr[row+1])) # drop nnz column indices
        csr.indptr[(row+1):] = csr.indptr[(row+1):] - (csr.indptr[row+1] - csr.indptr[row])
    else:
        csr.data[csr.indptr[row]:csr.indptr[row+1]] = value # replace nnz values by another nnz value

Finally, we would get instead

>>> A = scipy.sparse.csr_matrix([[0,1,0], [2,0,3], [0,0,0], [4,0,0]])
>>> csr_row_set_nz_to_val(A, 1)
>>> A.toarray()
array([[0, 1, 0],
       [0, 0, 0],
       [0, 0, 0],
       [4, 0, 0]], dtype=int64)
>>> A.data
array([1, 4], dtype=int64)
>>> A.indices
array([1, 0], dtype=int32)
>>> A.indptr
array([0, 1, 1, 1, 2], dtype=int32)
ngazagna
  • 61
  • 1
  • 6