
I want to divide each row of a csr_matrix by the number of non-zero entries in that row.

For example, consider a csr_matrix A:

A = [[6, 0, 0, 4, 0], [3, 18, 0, 9, 0]]
Result = [[3, 0, 0, 2, 0], [1, 6, 0, 3, 0]]

What's the shortest and most efficient way to do it?

user3787291

2 Answers


Get the per-row counts with the getnnz method, then repeat each count once per stored entry and divide in place into the flattened array of non-zero values exposed by the data attribute:

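import numpy as np

s = A.getnnz(axis=1)          # number of stored entries in each row
A.data /= np.repeat(s, s)     # repeat row i's count s[i] times to line up with A.data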

Inspired by Approach #2 in the solution post to Row Division in Scipy Sparse Matrix.
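Why np.repeat(s, s) lines up with A.data: a CSR matrix stores its non-zero values row by row, so row i contributes exactly s[i] consecutive entries to A.data, and repeating each count s[i] times yields one divisor per stored value. A small sketch on the question's matrix (it assumes a canonical CSR matrix with no explicitly stored zeros, since getnnz counts stored entries, not values that happen to be non-zero):

```python
import numpy as np
from scipy.sparse import csr_matrix

A = csr_matrix([[6, 0, 0, 4, 0], [3, 18, 0, 9, 0]])

s = A.getnnz(axis=1)            # stored entries per row: 2 and 3
divisors = np.repeat(s, s)      # row i's count repeated s[i] times: 2, 2, 3, 3, 3

print(A.data)                   # stored values row by row: 6, 4, 3, 18, 9
print(divisors)                 # one divisor per stored value

# The row pointer array encodes the same counts:
# row i occupies A.data[A.indptr[i]:A.indptr[i + 1]]
print(np.diff(A.indptr))
```

Each value in A.data is paired with the divisor at the same position, which is why a single elementwise in-place division is enough.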

Sample run -

In [15]: import numpy as np
    ...: from scipy.sparse import csr_matrix

In [16]: A = csr_matrix([[6, 0, 0, 4, 0], [3, 18, 0, 9, 0]])

In [18]: s = A.getnnz(axis=1)
    ...: A.data //= np.repeat(s, s)

In [19]: A.toarray()
Out[19]: 
array([[3, 0, 0, 2, 0],
       [1, 6, 0, 3, 0]])

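Note: For integer matrices, use floor division // so the code behaves the same in Python 2 and 3. In Python 3, / is true division, and NumPy raises a casting error when an in-place /= would store float results back into an integer data array:

A.data //= ...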
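Floor division only gives the expected result when every row divides evenly, as in the question's example. If fractional results are wanted (an assumption about the desired output, not part of the answer above), one option is to convert the matrix to floating point first and then use true division. The input matrix here is made up for illustration:

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical input whose rows do not divide evenly by their non-zero counts
A = csr_matrix([[1, 0, 2, 0], [5, 0, 0, 0]])

A = A.astype(np.float64)        # new matrix with float data, so /= is allowed
s = A.getnnz(axis=1)            # 2 and 1
A.data /= np.repeat(s, s)       # true division on the stored values

print(A.toarray())              # row 0 becomes 0.5 and 1.0; row 1 stays 5.0
```

With //= on the original integer matrix, row 0 would instead become 0 and 1, and that 0 would remain as an explicitly stored zero in A.data.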
Divakar

Divakar's answer modifies A in place. My attempt creates a new matrix instead:

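from scipy import sparse

A = sparse.csr_matrix([[6, 0, 0, 4, 0], [3, 18, 0, 9, 0]])
# (A != 0).sum(axis=1) is a column of per-row non-zero counts;
# multiply() broadcasts the reciprocal of that column across each row
B = A.multiply(1.0/(A != 0).sum(axis=1))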

This multiplies each row by the reciprocal of its number of non-zero entries. Note that a row with no non-zero entries has a count of zero, so you may want to guard against division by zero.
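One way to guard against empty rows, sketched here as an alternative rather than as part of the answer above, is to compute reciprocals only where the count is positive and then scale the rows with a sparse diagonal matrix. Diagonal scaling is a swapped-in technique, and the empty middle row is made-up test data:

```python
import numpy as np
from scipy import sparse

A = sparse.csr_matrix([[6, 0, 0, 4, 0],
                       [0, 0, 0, 0, 0],   # empty row: its count is 0
                       [3, 18, 0, 9, 0]])

counts = A.getnnz(axis=1)
inv = np.zeros(counts.shape, dtype=np.float64)
# Divide only where the row has entries; empty rows keep a factor of 0
np.divide(1.0, counts, out=inv, where=counts > 0)

# Left-multiplying by diag(inv) scales row i of A by inv[i]
B = (sparse.diags(inv) @ A).tocsr()
print(B.toarray())
```

The empty row simply stays empty, and no division-by-zero warning is raised because the division is skipped for those rows.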

As Divakar pointed out, 1.0 rather than 1 is needed in A.multiply(1.0/...) for compatibility with Python 2, where dividing 1 by an integer array performs floor division.

Tai