Normalize scipy sparse matrix with number of nonzero elements

Question

I want to divide each row of a csr_matrix by the number of nonzero entries in that row.

For example, consider a csr_matrix A:

A = [[6, 0, 0, 4, 0], [3, 18, 0, 9, 0]]
Result = [[3, 0, 0, 2, 0], [1, 6, 0, 3, 0]]

What's the shortest and most efficient way to do it?

Divakar · Accepted Answer · 2018-03-13T19:43:48.040

6

Get the per-row counts with the getnnz method, then replicate them and divide in place into the flattened array of stored values exposed by the data attribute -

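import numpy as np

s = A.getnnz(axis=1)          # number of stored entries in each row
A.data /= np.repeat(s, s)     # one divisor per stored entry, in row order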

Inspired by Approach #2 in the solution post for "Row Division in Scipy Sparse Matrix".
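To see why replicating the counts lines them up with the stored values, here is a minimal sketch that prints the intermediate arrays. It assumes a float dtype (my choice, not part of the original snippet) so that the in-place division is valid on Python 3; the name divisors is only for illustration.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Float dtype so the in-place true division below is valid on Python 3.
A = csr_matrix([[6, 0, 0, 4, 0], [3, 18, 0, 9, 0]], dtype=np.float64)

s = A.getnnz(axis=1)        # stored entries per row
divisors = np.repeat(s, s)  # each count repeated once per stored entry in its row

print(A.data)               # CSR keeps the stored values row by row
print(divisors)             # same length as A.data, aligned entry by entry

A.data /= divisors          # modifies A in place
result = A.toarray()
print(result)
```

Because CSR stores each row's values contiguously, the i-th count repeated s[i] times covers exactly the entries of row i.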

Sample run -

In [15]: from scipy.sparse import csr_matrix

In [16]: A = csr_matrix([[6, 0, 0, 4, 0], [3, 18, 0, 9, 0]])

In [18]: s = A.getnnz(axis=1)
    ...: A.data /= np.repeat(s, s)

In [19]: A.toarray()
Out[19]: 
array([[3, 0, 0, 2, 0],
       [1, 6, 0, 3, 0]])

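Note: To be compatible between Python 2 and 3, we might want to use // (floor division); on Python 3, / is true division, and its float result cannot be written back in place into an integer array -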

A.data //=  ...
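The two options behave differently when the counts do not divide the values evenly. The following sketch, using a made-up matrix that is not from the question, contrasts floor division on an integer matrix with true division on a float matrix; the names A_int and A_float are illustrative.

```python
import numpy as np
from scipy.sparse import csr_matrix

rows = [[5, 0, 2], [1, 1, 1]]

# Integer matrix: floor division keeps the integer dtype but rounds down.
A_int = csr_matrix(rows)
s = A_int.getnnz(axis=1)
A_int.data //= np.repeat(s, s)

# Float matrix: true division keeps the fractional parts.
A_float = csr_matrix(rows, dtype=np.float64)
s = A_float.getnnz(axis=1)
A_float.data /= np.repeat(s, s)

print(A_int.toarray())
print(A_float.toarray())
```

If exact averages matter, building the matrix with a float dtype up front is probably the safer choice; floor division can also turn stored values into explicit zeros.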

edited Mar 13 '18 at 19:43

answered Mar 13 '18 at 19:19


I ran into an error because `A` is of type `int64` and `A.data /= np.repeat(s, s)` tries to cast `A`. – Tai Mar 13 '18 at 19:34
@Tai Are you on Python3? – Divakar Mar 13 '18 at 19:34
Yes, on Python 3. – Tai Mar 13 '18 at 19:35

Tai · Answer 2 · 2018-03-13T19:59:00.577

2

Divakar gives an in-place method. My attempt creates a new matrix instead.

from scipy import sparse
A = sparse.csr_matrix([[6, 0, 0, 4, 0], [3, 18, 0, 9, 0]])
A.multiply(1.0/(A != 0).sum(axis=1))

We multiply each row by the reciprocal of its number of nonzero entries. Note that a row with no nonzero entries has a count of zero, so one may want to guard against division by zero.

As Divakar pointed out, 1.0 rather than 1 is needed in A.multiply(1.0/...) to be compatible with Python 2, where integer division could make the results all zeros.
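One way to handle the division-by-zero concern is sketched below. It is an assumption-laden variant, not the code from this answer: the function name normalize_rows_by_nnz is hypothetical, and it scales rows with a diagonal matrix from sparse.diags instead of A.multiply. All-zero rows are simply left as zeros.

```python
import numpy as np
from scipy import sparse

def normalize_rows_by_nnz(A):
    """Return a new float CSR matrix with each row divided by its nonzero count.

    Hypothetical helper for illustration; all-zero rows stay all zero.
    """
    A = sparse.csr_matrix(A, dtype=np.float64, copy=True)  # leave the input untouched
    A.eliminate_zeros()                 # explicitly stored zeros should not be counted
    counts = A.getnnz(axis=1)
    scale = np.zeros(A.shape[0])
    nonempty = counts > 0
    scale[nonempty] = 1.0 / counts[nonempty]  # reciprocal only where the count is positive
    # Left-multiplying by a diagonal matrix scales row i by scale[i].
    return sparse.diags(scale).dot(A).tocsr()

A = sparse.csr_matrix([[6, 0, 0, 4, 0], [3, 18, 0, 9, 0], [0, 0, 0, 0, 0]])
out = normalize_rows_by_nnz(A)
print(out.toarray())
```

The copy keeps this in the spirit of a method that returns a new matrix; the original A keeps its integer dtype and values.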

edited Mar 13 '18 at 19:59

answered Mar 13 '18 at 19:18


1

For Python2, the results might be all zeros. So, I guess we need `1.0` division there. – Divakar Mar 13 '18 at 19:44
@Divakar got you! – Tai Mar 13 '18 at 19:45
