Given a NumPy array of shape (n, m) and a NumPy array of shape (p,) describing a partition of the range 0 to n, what is the fastest way to return an array of shape (p, m) in which the rows belonging to each part are summed together? The partition array holds the start index of each part; for example, for n=6 the partition [0, 2, 4] splits the indices into (0, 1), (2, 3), (4, 5).
For example,
[[0,1,1,1],
[2,0,1,1],
[0,0,0,1],
[5,1,0,0]]
given the partition [0,1]
should return
[[0,1,1,1],
[7,1,1,2]]
I already have a solution which is constructing the matrix
[[1,0,0,0],
[0,1,1,1]]
and left-multiplying the initial matrix by it to get the desired result. I think this should be pretty fast, but there might be something faster along the lines of numpy.ufunc.reduceat (https://docs.scipy.org/doc/numpy/reference/generated/numpy.ufunc.reduceat.html) using the partition array. Any help?
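For reference, here is a minimal sketch of the indicator-matrix approach described above. The helper name partition_indicator is just for illustration, and it assumes the partition is sorted, strictly increasing, and starts at 0.

```python
import numpy as np

def partition_indicator(partition, n):
    """Build the (p, n) 0/1 matrix whose row i marks the indices in part i.

    Assumes `partition` is sorted, strictly increasing, and starts at 0.
    """
    partition = np.asarray(partition)
    rows = np.arange(n)
    # Part label of each row index: count of partition starts <= index, minus 1.
    labels = np.searchsorted(partition, rows, side="right") - 1
    indicator = np.zeros((len(partition), n), dtype=int)
    indicator[labels, rows] = 1
    return indicator

matrix = np.array([[0, 1, 1, 1],
                   [2, 0, 1, 1],
                   [0, 0, 0, 1],
                   [5, 1, 0, 0]])
partition = [0, 1]

indicator = partition_indicator(partition, matrix.shape[0])
summed = indicator @ matrix  # shape (p, m): one summed row per part
```

With the example above, indicator is the 2x4 matrix shown and summed matches the expected output.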
Wait... I just read the reduceat documentation and you can literally just do np.add.reduceat(matrix, partition, axis=0). I remember thinking you couldn't do this, probably because in my application I need to do it for a sparse matrix. So could anyone advise on how to do this when the input 2D array is stored in a sparse format?
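One possible direction, sketched under the assumption that the sparse input is a scipy.sparse matrix (that library choice and the helper name sum_row_groups_sparse are my own, not something fixed by the application): build the indicator matrix itself in sparse form and multiply, so nothing is densified. It again assumes the partition is sorted, strictly increasing, and starts at 0. I have not benchmarked it.

```python
import numpy as np
from scipy import sparse

def sum_row_groups_sparse(matrix, partition):
    """Sum the rows of a sparse matrix within each part; returns a sparse (p, m) matrix.

    Assumes `partition` is sorted, strictly increasing, and starts at 0.
    """
    matrix = sparse.csr_matrix(matrix)
    n = matrix.shape[0]
    partition = np.asarray(partition)
    cols = np.arange(n)
    # Part label of each row index: count of partition starts <= index, minus 1.
    labels = np.searchsorted(partition, cols, side="right") - 1
    # Sparse (p, n) indicator with a single 1 per column.
    ones = np.ones(n, dtype=matrix.dtype)
    indicator = sparse.csr_matrix((ones, (labels, cols)),
                                  shape=(len(partition), n))
    return indicator @ matrix

dense = np.array([[0, 1, 1, 1],
                  [2, 0, 1, 1],
                  [0, 0, 0, 1],
                  [5, 1, 0, 0]])
partition = [0, 1]

result = sum_row_groups_sparse(sparse.csr_matrix(dense), partition)
# Dense reference computed with reduceat, for comparison only.
reference = np.add.reduceat(dense, partition, axis=0)
```

Here result.toarray() should agree with reference on the example from the question.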