I want to parallelize the following problem. Given an array w with shape (dim1,) and a matrix A with shape (dim1, dim2), I want each row of A to be multiplied by the corresponding element of w. That's quite trivial.
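For reference, here is a minimal sketch of the single-array case. It assumes small toy sizes and random data chosen only for illustration. Adding a trailing axis to w lets NumPy broadcasting scale each row of A by the matching element of w.

```python
import numpy as np

# Assumed toy sizes, chosen only for illustration.
dim1, dim2 = 3, 4
rng = np.random.default_rng(0)
w = rng.standard_normal(dim1)          # shape (dim1,)
A = rng.standard_normal((dim1, dim2))  # shape (dim1, dim2)

# Scale row i of A by w[i]: w[:, None] has shape (dim1, 1) and broadcasts across columns.
scaled = w[:, None] * A                # shape (dim1, dim2)

# Same result as an explicit diagonal-matrix product (slower, used here only as a check).
assert np.allclose(scaled, np.diag(w) @ A)
```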
However, I want to do that for a bunch of arrays w and then sum the results. To avoid a for loop, I stacked them into a matrix W with shape (n_samples, dim1) and used the np.einsum function as follows:
import numpy as np
x = np.einsum('ji,ik->jik', W, A)
r = x.sum(axis=0)
where x has shape (n_samples, dim1, dim2) and the final sum r has shape (dim1, dim2).
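To make the setup concrete, the sketch below compares an explicit for-loop version with the einsum approach above. The sizes and random data are assumptions for illustration only. It also shows that the einsum call first materializes the full (n_samples, dim1, dim2) array x before summing it.

```python
import numpy as np

# Assumed toy sizes, for illustration only.
n_samples, dim1, dim2 = 5, 3, 4
rng = np.random.default_rng(1)
W = rng.standard_normal((n_samples, dim1))  # one w per row
A = rng.standard_normal((dim1, dim2))

# For-loop version: scale the rows of A by each w and accumulate the results.
r_loop = np.zeros((dim1, dim2))
for w in W:
    r_loop += w[:, None] * A

# einsum version: builds the full (n_samples, dim1, dim2) intermediate, then sums axis 0.
x = np.einsum('ji,ik->jik', W, A)
r = x.sum(axis=0)

assert np.allclose(r, r_loop)
```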
I noticed that np.einsum is quite slow when A is large. Is there a more efficient way to solve this problem? I also thought about trying np.tensordot, but I'm not sure it applies here.
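One possible direction is sketched below under the same toy-size assumptions. It relies on the identity r[i, k] = sum_j W[j, i] * A[i, k] = (sum_j W[j, i]) * A[i, k], because the sum runs over j only. Two equivalent forms follow from this. The first drops j from the einsum output so einsum performs the reduction itself. The second sums W first and scales A once. Neither form needs to hold the (n_samples, dim1, dim2) intermediate. np.tensordot contracts over shared axes, but here i is kept in the output, so it does not map directly onto a single tensordot call.

```python
import numpy as np

# Assumed toy sizes, for illustration only.
n_samples, dim1, dim2 = 5, 3, 4
rng = np.random.default_rng(2)
W = rng.standard_normal((n_samples, dim1))
A = rng.standard_normal((dim1, dim2))

# Reference: the original two-step computation.
r_ref = np.einsum('ji,ik->jik', W, A).sum(axis=0)

# 1) Omit j from the output subscripts so einsum sums over it directly.
r_einsum = np.einsum('ji,ik->ik', W, A)

# 2) Sum the weights first (shape (dim1,)), then scale the rows of A once.
r_factored = W.sum(axis=0)[:, None] * A

assert np.allclose(r_einsum, r_ref)
assert np.allclose(r_factored, r_ref)
```

The factored form does O(n_samples * dim1 + dim1 * dim2) work instead of O(n_samples * dim1 * dim2), which is where most of the savings for a large A would come from.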
Thank you :-)