
I need an efficient way to row standardize a sparse matrix.

Given

W = matrix([[0, 1, 0, 1, 0, 0, 0, 0, 0],
            [1, 0, 1, 0, 1, 0, 0, 0, 0],
            [0, 1, 0, 0, 0, 1, 0, 0, 0],
            [1, 0, 0, 0, 1, 0, 1, 0, 0],
            [0, 1, 0, 1, 0, 1, 0, 1, 0],
            [0, 0, 1, 0, 1, 0, 0, 0, 1],
            [0, 0, 0, 1, 0, 0, 0, 1, 0],
            [0, 0, 0, 0, 1, 0, 1, 0, 1],
            [0, 0, 0, 0, 0, 1, 0, 1, 0]])
row_sums = W.sum(1)

I need to produce...

W2 = matrix([[0.  , 0.5 , 0.  , 0.5 , 0.  , 0.  , 0.  , 0.  , 0.  ],
             [0.33, 0.  , 0.33, 0.  , 0.33, 0.  , 0.  , 0.  , 0.  ],
             [0.  , 0.5 , 0.  , 0.  , 0.  , 0.5 , 0.  , 0.  , 0.  ],
             [0.33, 0.  , 0.  , 0.  , 0.33, 0.  , 0.33, 0.  , 0.  ],
             [0.  , 0.25, 0.  , 0.25, 0.  , 0.25, 0.  , 0.25, 0.  ],
             [0.  , 0.  , 0.33, 0.  , 0.33, 0.  , 0.  , 0.  , 0.33],
             [0.  , 0.  , 0.  , 0.5 , 0.  , 0.  , 0.  , 0.5 , 0.  ],
             [0.  , 0.  , 0.  , 0.  , 0.33, 0.  , 0.33, 0.  , 0.33],
             [0.  , 0.  , 0.  , 0.  , 0.  , 0.5 , 0.  , 0.5 , 0.  ]]) 

Where,

for i in range(9):
    W2[i] = W[i]/row_sums[i]

I'd like to find a way to do this without loops (i.e. vectorized) and using scipy.sparse matrices. W could be as large as 10 million x 10 million.

Charles
  • I just realized that if W is dense (a regular numpy matrix), W2 = W/W.sum(1) works fine. But scipy's sparse matrices don't appear to support division. – Charles Dec 02 '11 at 16:28
  • I don't see a way of doing that other than implementing this division in C code and calling it from Python. Does W.sum work OK for sparse matrices? – jsbueno Dec 02 '11 at 16:40
  • Yes, W.sum(1) on the sparse returns a vector of row sums. – Charles Dec 02 '11 at 16:43
  • The values of W2 are always (1./row_sum). Maybe there is an easy way to replace the 1's in W with values from a column vector? – Charles Dec 02 '11 at 16:44
  • Here's a simple way to do this with sklearn: https://stackoverflow.com/questions/12305021/efficient-way-to-normalize-a-scipy-sparse-matrix – Sergey Zakharov Jul 13 '17 at 13:01
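The idea raised in the comments, overwriting the stored values of W with values taken from a column vector, can be sketched for a CSR matrix as below. This is only an illustration under stated assumptions: the function name `row_standardize_data` is hypothetical, the input is converted to CSR with float data, and rows whose sum is zero are left unchanged.

```python
import numpy as np
from scipy import sparse

def row_standardize_data(W):
    """Divide each stored entry by the sum of its row (hypothetical helper)."""
    # Work on a float CSR copy so the caller's matrix is not modified.
    W = sparse.csr_matrix(W, dtype=np.float64, copy=True)
    # Number of stored entries in each row, read off the CSR row pointer.
    counts = np.diff(W.indptr)
    row_sums = np.asarray(W.sum(axis=1)).ravel()
    # Assumption: rows that sum to zero are left as they are.
    row_sums[row_sums == 0] = 1.0
    # Expand the row sums so there is one divisor per stored entry.
    W.data /= np.repeat(row_sums, counts)
    return W
```

Because only the `data` array is touched, the sparsity pattern of the result is the same as that of the input.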

1 Answer


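With a bit of matrix algebra: left-multiplying by a sparse diagonal matrix that holds the reciprocal row sums scales each row by 1/row_sum.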

>>> import numpy as np
>>> from scipy import sparse
>>> cc
<9x9 sparse matrix of type '<type 'numpy.int32'>'
    with 24 stored elements in Compressed Sparse Row format>
>>> ccd = sparse.spdiags(1./cc.sum(1).T, 0, *cc.shape)
>>> ccn = ccd * cc
>>> np.round(ccn.todense(), 2)
array([[ 0.  ,  0.5 ,  0.  ,  0.5 ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ],
       [ 0.33,  0.  ,  0.33,  0.  ,  0.33,  0.  ,  0.  ,  0.  ,  0.  ],
       [ 0.  ,  0.5 ,  0.  ,  0.  ,  0.  ,  0.5 ,  0.  ,  0.  ,  0.  ],
       [ 0.33,  0.  ,  0.  ,  0.  ,  0.33,  0.  ,  0.33,  0.  ,  0.  ],
       [ 0.  ,  0.25,  0.  ,  0.25,  0.  ,  0.25,  0.  ,  0.25,  0.  ],
       [ 0.  ,  0.  ,  0.33,  0.  ,  0.33,  0.  ,  0.  ,  0.  ,  0.33],
       [ 0.  ,  0.  ,  0.  ,  0.5 ,  0.  ,  0.  ,  0.  ,  0.5 ,  0.  ],
       [ 0.  ,  0.  ,  0.  ,  0.  ,  0.33,  0.  ,  0.33,  0.  ,  0.33],
       [ 0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.5 ,  0.  ,  0.5 ,  0.  ]])
>>> ccn
<9x9 sparse matrix of type '<type 'numpy.float64'>'
    with 24 stored elements in Compressed Sparse Row format>
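
The same diagonal-scaling idea can be written as a self-contained function. This is a minimal sketch, not the answer's exact code: the name `row_standardize` is hypothetical, it uses `sparse.diags` rather than `spdiags`, and it assumes that rows summing to zero should simply stay zero (the 9x9 example has no such rows).

```python
import numpy as np
from scipy import sparse

def row_standardize(W):
    """Return a sparse matrix whose nonzero rows each sum to 1 (hypothetical helper)."""
    W = sparse.csr_matrix(W, dtype=np.float64)
    row_sums = np.asarray(W.sum(axis=1)).ravel()
    # Assumption: avoid dividing by zero by giving empty rows a scale of 0.
    inv = np.zeros_like(row_sums)
    nonzero = row_sums != 0
    inv[nonzero] = 1.0 / row_sums[nonzero]
    # Left-multiplying by diag(inv) scales row i of W by inv[i].
    return sparse.diags(inv) @ W
```

Calling `row_standardize(cc)` on the matrix above should give the same values as `ccn`, without ever building a dense array.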
Josef