I am trying to understand the solutions to this question here, and while I could just reuse the code, I would prefer to know what is happening before I do.
The question is about how to tile a scipy.sparse.csr_matrix object, and the top answer (by @user3357359) at the time of writing shows how to tile a single row of a matrix across multiple rows as:
import numpy as np
from scipy.sparse import csr_matrix
sparse_row = csr_matrix([[0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0]])
repeat_number = 3
repeated_row_matrix = csr_matrix(np.ones([repeat_number,1])) * sparse_row
(I have added the sparse_row and repeat_number initialisation to help make things concrete.)
If I now convert this to a dense matrix and print it like so:
print(f"repeated_row_matrix.todense() =\n{repeated_row_matrix.todense()}")
This gives output:
repeated_row_matrix.todense() =
[[0 0 0 0 0 1 0 1 1 0 0 0]
[0 0 0 0 0 1 0 1 1 0 0 0]
[0 0 0 0 0 1 0 1 1 0 0 0]]
The operation on the right of the repeated_row_matrix assignment seems to me to be performing broadcasting. The original sparse_row has shape (1,12), the temporary matrix is a (3,1) matrix of ones, and the result is a (3,12) matrix. So far, this is similar to the behaviour you would expect from numpy.array. However, if I try the same thing with the subtraction operator:
sparse_row = csr_matrix([[0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0]])
repeat_number = 3
repeated_row_matrix = csr_matrix(np.ones([repeat_number,1])) - sparse_row
print(f"repeated_row_matrix.todense() =\n{repeated_row_matrix.todense()}")
I get an error in the third line:
3 repeated_row_matrix = csr_matrix(np.ones([repeat_number,1])) - sparse_row
...
ValueError: inconsistent shapes
Is this intended behaviour? And if so, why?
I guess that a multiplication between two sparse K-vectors with n1 and n2 non-zeros respectively would always have at most min(n1,n2) non-zeros. A subtraction would have, in the worst case, n1+n2 non-zeros, but does this really explain why one behaviour is allowed and the other is not?
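To make the non-zero counting argument concrete, here is a small sketch with two made-up sparse vectors of the same shape. It assumes that "multiplication" in the paragraph above means the element-wise product, which is computed here with the .multiply method; the vectors u and v are hypothetical and not taken from the original answer.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Two hypothetical sparse row vectors of equal shape (1, 6).
u = csr_matrix(np.array([[1, 0, 2, 0, 0, 3]]))  # n1 = 3 non-zeros
v = csr_matrix(np.array([[0, 4, 5, 0, 0, 0]]))  # n2 = 2 non-zeros

# Element-wise product: an entry can be non-zero only where both inputs are.
product = u.multiply(v)

# Difference: an entry can be non-zero wherever either input is.
difference = u - v

print("n1, n2:", u.nnz, v.nnz)
print("product non-zeros (bounded by min(n1, n2)):", product.nnz)
print("difference non-zeros (bounded by n1 + n2):", difference.nnz)
```

Both operands have the same shape here, so this only illustrates the sparsity bounds and says nothing about shape handling.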
I wish to subtract a single row vector from a matrix (for a sparse implementation of K-medoids I am playing with). To do this, I am currently creating a temporary sparse matrix that tiles the original row using the multiplication trick above, and then subtracting one matrix from the other. I am sure there should be a better way, but I don't see it.
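For concreteness, the workaround described above might look like the following sketch. The matrix data and its values are made up for illustration, and the variable names are hypothetical; only the tiling expression comes from the answer quoted earlier.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical data matrix; the values are made up for illustration.
data = csr_matrix(np.array([[1, 0, 2, 0],
                            [0, 3, 0, 0],
                            [4, 0, 0, 5]]))
row = data[1, :]  # the single row to subtract, shape (1, 4)

# Tile the row with the multiplication trick from the quoted answer:
# a (3, 1) column of ones times the (1, 4) row gives a (3, 4) matrix.
ones_column = csr_matrix(np.ones([data.shape[0], 1]))
tiled = ones_column * row

# Both operands now have the same shape, so the subtraction is accepted.
diff = data - tiled
print(diff.toarray())
```

The temporary tiled matrix stores one copy of the row per row of data, which is the overhead I would like to avoid.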
Also, @"C.J. Jackson" replies in the comments that a better way to construct the tiling is:
sparse_row[np.zeros(repeat_number),:]
This works, but I have no idea why, or what functionality is being employed. Can someone point me to the documentation? If sparse_row were a numpy.array, this would not cause tiling.
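To make the comparison concrete, here is a small sketch that applies the same row-index pattern to both a csr_matrix and a dense 2-D numpy.array. Note the assumption: it uses an integer index array (dtype=int), whereas the snippet from the comment uses the default float dtype of np.zeros, so it is not an exact reproduction of that snippet.

```python
import numpy as np
from scipy.sparse import csr_matrix

repeat_number = 3
dense_row = np.array([[0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0]])  # 2-D, shape (1, 12)
sparse_row = csr_matrix(dense_row)

# Assumption: integer row indices (np.zeros without dtype yields floats).
row_index = np.zeros(repeat_number, dtype=int)  # array([0, 0, 0])

tiled_sparse = sparse_row[row_index, :]  # row 0 selected three times
tiled_dense = dense_row[row_index, :]    # same index pattern on the dense array

print(tiled_sparse.shape)
print(tiled_dense.shape)
```

Under these assumptions both results have shape (3, 12). With a float index array, NumPy rejects the index with an IndexError, which may be related to the difference described above.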
Thanks in advance.