
I have a NumPy array V with shape (1, 1000) and a csr_matrix M with shape (100000, 1000). For every row m in M, I want to compute the element-wise minimum of V and m and store all the results in a new matrix, and I want to do it efficiently. The result should also have shape (100000, 1000).

Some approaches that I considered/tried:

  • Iterate over each row of M with a for loop. This works, but it is quite slow.
  • Convert M to a dense array: numpy.minimum(V, M.toarray()). This takes a huge amount of memory.
  • numpy.minimum(V, M) does not work. I get an error which says: Comparing a sparse matrix with a scalar less than zero using >= is inefficient.

What would be a good and efficient way to do this without taking too much memory or time?

Warren Weckesser
happyhuman

1 Answer


If the values in v are nonnegative, this is a concise method that should be much faster than looping over each row:

import numpy as np
from scipy.sparse import csr_matrix

def rowmin(M, v):
    # M must be a csr_matrix, and v must be a 1-d numpy array with
    # length M.shape[1].  The values in v must be nonnegative.
    if np.any(v < 0):
        raise ValueError('v must not contain negative values.')

    # B is a CSR matrix with the same sparsity pattern as M, but its
    # data values are from v:
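    # data values are taken from v. The shape is passed explicitly so
    # that B matches M even if the last columns of M are empty.
    B = csr_matrix((v[M.indices], M.indices, M.indptr), shape=M.shape)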
    return M.minimum(B)
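
As a quick sanity check, here is a minimal sketch that compares this approach against the dense computation on a tiny matrix. The sample data is made up for illustration. The sketch repeats rowmin (without the input check) and passes shape=M.shape explicitly, so that B has the same shape as M even when the trailing columns of M are empty. V has shape (1, n) as in the question, so it is flattened with ravel() before being passed in.

```python
import numpy as np
from scipy.sparse import csr_matrix

def rowmin(M, v):
    # Same idea as above: B shares M's sparsity pattern, with values from v.
    B = csr_matrix((v[M.indices], M.indices, M.indptr), shape=M.shape)
    return M.minimum(B)

# Small illustrative inputs; the last column of M is deliberately empty.
M = csr_matrix(np.array([[0.0, 5.0, 0.0, 0.0],
                         [3.0, 0.0, 0.0, 0.0],
                         [0.0, 1.0, 4.0, 0.0]]))
V = np.array([[1.0, 2.0, 3.0, 4.0]])   # shape (1, 4), as in the question

result = rowmin(M, V.ravel())          # rowmin expects a 1-d array

# Dense reference, feasible only because the example is tiny.
expected = np.minimum(V, M.toarray())
print(np.array_equal(result.toarray(), expected))
```

The result stays sparse, so memory use scales with the number of stored entries in M rather than with the full (rows, columns) shape.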

To allow negative values in v, this modification works, but a warning is generated when v has negative values, because the sparsity pattern in B has to be changed when the negative values are copied into it. (The warning could be silenced with a couple more lines of code.) Many negative values in v will probably degrade the performance significantly.

def rowmin(M, v):
    # M must be a csr_matrix, and v must be a 1-d numpy array with
    # length M.shape[1].

    # B is a CSR matrix with the same sparsity pattern as M, but its
    # data values are from v:
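    # data values are taken from v. The shape is passed explicitly so
    # that B matches M even if the last columns of M are empty.
    B = csr_matrix((v[M.indices], M.indices, M.indptr), shape=M.shape)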

    # If there are negative values in v, include them in B.
    negmask = v < 0
    if np.any(negmask):
        negindices = negmask.nonzero()[0]
        B[:, negindices] = v[negindices]

    return M.minimum(B)
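
One possible way to silence the warning is sketched below. It assumes the warning in question is scipy.sparse.SparseEfficiencyWarning, which SciPy emits when the sparsity structure of a CSR matrix is changed by assignment. The name rowmin_quiet is made up for this example, and the sketch passes shape=M.shape explicitly.

```python
import warnings

import numpy as np
from scipy.sparse import csr_matrix, SparseEfficiencyWarning

def rowmin_quiet(M, v):
    # B has M's sparsity pattern, with data values taken from v.
    B = csr_matrix((v[M.indices], M.indices, M.indptr), shape=M.shape)

    # Columns where v is negative must be fully populated in B,
    # because min(0, v[j]) is v[j] for those columns.
    negindices = np.flatnonzero(v < 0)
    if negindices.size:
        with warnings.catch_warnings():
            # Suppress only the sparse-efficiency warning, only here.
            warnings.simplefilter("ignore", SparseEfficiencyWarning)
            B[:, negindices] = v[negindices]

    return M.minimum(B)

# Small illustrative inputs with two negative entries in v.
M = csr_matrix(np.array([[0.0, 5.0, 0.0, 0.0],
                         [3.0, 0.0, 0.0, 0.0],
                         [0.0, 1.0, 4.0, 0.0]]))
v = np.array([1.0, -2.0, 3.0, -4.0])

result = rowmin_quiet(M, v)
expected = np.minimum(v, M.toarray())
print(np.array_equal(result.toarray(), expected))
```

Silencing the warning does not remove the cost it points at: every negative entry in v still turns a whole column of the result dense.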
Warren Weckesser