Repeat a csr_matrix row over axis 0 to create a matrix

Question

I have a CSR-formatted sparse matrix (scipy.sparse.csr_matrix) with around 100,000 rows and 10,000 columns. The rows represent users, the columns represent items, and each value in the matrix is the rating that user gave that item.

I am trying to calculate correlation between two users. So I am looping over each user (call it user_a), and doing matrix operations to get the correlation of user_a against all other users.

The first step is to generate the current user matrix. This matrix repeats user_a's ratings on every row, with each row masked so that it keeps only the items the corresponding other user has also rated.
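As a small illustration of that masking, here is a dense NumPy sketch on toy data (the 3x3 matrix and the choice of user_a = 0 are made up for the example, not taken from the real ratings):

```python
import numpy as np

# Toy ratings: 3 users x 3 items, 0 means "not rated" (hypothetical data).
ratings = np.array([[5, 0, 3],
                    [0, 4, 1],
                    [2, 0, 0]])
user_a = 0

# Repeat user_a's row on every row, keeping only items each other user rated.
user_matrix = (ratings != 0) * ratings[user_a]
print(user_matrix)
# [[5 0 3]
#  [0 0 3]
#  [5 0 0]]
```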

My code at the moment is:

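import numpy
from scipy import sparse
# ratings is the big original matrix
R = ratings.getrow(user_id)
user_matrix = sparse.csr_matrix(R)
# repeat the single row once per user
user_matrix = user_matrix[numpy.array([0]).repeat(ratings.shape[0]),:]
# keep only the items each other user has also rated
user_matrix = user_matrix.multiply(ratings.astype(bool))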

(https://stackoverflow.com/a/25342156/947194)

But these lines take 4 seconds for a user with just 500 items. And I need to run it for each user (100,000 times). So it is a bit slow.

I tried generating user_matrix using vstack, but it took 7 seconds.

Is there a way to reduce the running time of these lines further?

IDK if this will help, just to give you some food for thought: [broadcasting](http://www.scipy-lectures.org/intro/numpy/operations.html#broadcasting) — iled, Dec 11 '15 at 14:28

score 1 · Accepted Answer · answered Dec 11 '15 at 15:19


For a csr_matrix ratings and an integer user_id, this gives the same result as your code:

valid_ratings = ratings.astype(bool)
user_matrix = valid_ratings.multiply(ratings[user_id])

But it won't work if your version of scipy is too old. I don't recall which version of scipy extended the broadcasting behavior of sparse matrices to make this work.
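A minimal self-contained check, assuming a scipy version recent enough to support sparse broadcasting, that the broadcasting form matches the repeat-and-mask code from the question. The toy matrix, the random seed, and the names slow, fast, and expected are assumptions for illustration only:

```python
import numpy as np
from scipy import sparse

# Small random ratings matrix (hypothetical data): 6 users x 5 items,
# with roughly half of the entries left unrated (zero).
rng = np.random.default_rng(0)
dense = rng.integers(0, 6, size=(6, 5)) * (rng.random((6, 5)) < 0.5)
ratings = sparse.csr_matrix(dense.astype(float))
user_id = 2

# Approach from the question: materialise the repeated row, then mask it.
row = ratings.getrow(user_id)
repeated = row[np.zeros(ratings.shape[0], dtype=int), :]
slow = repeated.multiply(ratings.astype(bool))

# Broadcasting approach: multiply the boolean mask by the single row.
fast = ratings.astype(bool).multiply(ratings[user_id])

# Dense reference computed with plain NumPy broadcasting.
expected = (dense != 0) * dense[user_id]

assert np.array_equal(slow.toarray(), expected)
assert np.array_equal(fast.toarray(), expected)
```

The second form never builds the intermediate matrix of repeated rows, which is the step the question's code spends its time on.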


Warren Weckesser


That's a lot better! It now takes 0.3 seconds. I was just trying to solve the problem the wrong way. Thank you! – markmb Dec 11 '15 at 15:44
