
Suppose I have two arrays:

import numpy as np
a=np.array([[1,2],
            [3,4]])

b=np.array([[1,2],
            [3,4]])

If I want to element-wise multiply the arrays and then sum the elements, i.e. 1*1 + 2*2 + 3*3 + 4*4 = 30, I can use:

np.tensordot(a, b, axes=((-2,-1),(-2,-1)))
>>> array(30)

Now, suppose arrays a and b are 2-by-2-by-2 arrays:

a=np.array([[[1, 2],
             [3, 4]],

            [[5, 6],
             [7, 8]]])

b=np.array([[[1, 2],
             [3, 4]],

            [[5, 6],
             [7, 8]]])

and I want to do the same operation for each group, i.e. element-wise multiply [[1,2],[3,4]] with [[1,2],[3,4]] and sum the elements, then do the same with [[5,6],[7,8]]. The result should be array([ 30, 174]), where 30 = 1*1 + 2*2 + 3*3 + 4*4 and 174 = 5*5 + 6*6 + 7*7 + 8*8. Is there a way to do that using tensordot?

P.S.
I understand in this case you can simply use sum or einsum:

np.sum(a*b,axis=(-2,-1))
>>> array([ 30, 174])

np.einsum('ijk,ijk->i',a,b)
>>> array([ 30, 174])

but this is merely a simplified example; I need to use tensordot because it's faster.

Thanks for any help!!

Sam-gege
  • use `matmul` with some argument reshaping – hpaulj Jan 27 '22 at 12:21
  • I don't see it as a matrix multiplication? It's just an element-wise multiplication, then summing all the elements. – Sam-gege Jan 27 '22 at 12:54
  • What do you think the 'dot' in `tensordot` implies? – hpaulj Jan 27 '22 at 14:13
  • What have you tried? Do the results match the documentation? Have you read `matmul` docs yet? – hpaulj Jan 27 '22 at 16:36
  • `tensordot` does an 'outer' product on the leading dimensions; `matmul` uses the broadcasting rules, and treats them as a 'batch': `(a.reshape(2,1,4)@b.reshape(2,4,1)).squeeze()` (see the sketch after these comments). – hpaulj Jan 27 '22 at 19:41
  • Hi @hpaulj, thanks for your answer. Yes, matmul works in this case; sorry, I simplified my question too much. I've opened another post that better describes it, please have a look: https://stackoverflow.com/questions/70907083/numpy-einsum-tensordot-with-shared-non-contracted-axis – Sam-gege Jan 29 '22 at 15:54
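A minimal, runnable sketch of the reshape-and-matmul batch approach from hpaulj's comment above; the reshape shapes assume the 2-by-2-by-2 arrays of the question, and the array values are copied from it.

import numpy as np

a = np.array([[[1, 2],
               [3, 4]],
              [[5, 6],
               [7, 8]]])
b = a.copy()

# matmul broadcasts over the leading axis and treats it as a batch:
# each 2x2 block is flattened to a (1, 4) row and a (4, 1) column,
# so every batched (1, 4) @ (4, 1) product is the element-wise sum.
out = (a.reshape(2, 1, 4) @ b.reshape(2, 4, 1)).squeeze()
print(out)  # [ 30 174]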

1 Answer


You can use np.diag(np.tensordot(a, b, axes=((1, 2), (1, 2)))) to get the result you want. However, using np.tensordot or a matrix multiplication is not a good idea in your case, as they do much more work than needed: only the diagonal of the result is useful here, so the fact that they are efficiently implemented does not make up for the extra computation. np.einsum('ijk,ijk->i',a,b) does not compute more than is needed in your case. You can try optimize=True or even optimize='optimal', since the optimize parameter is set to False by default. If this is not fast enough, you can try NumExpr to compute np.sum(a*b,axis=(1, 2)) more efficiently (probably in parallel). Alternatively, you can use Numba or Cython; both support fast parallel loops.
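For reference, here is a minimal sketch of the approaches mentioned in the answer, run on the arrays from the question. The Numba helper name batched_sum_product is hypothetical (not from the answer), and the last part assumes Numba is installed.

import numpy as np
import numba as nb

a = np.array([[[1, 2],
               [3, 4]],
              [[5, 6],
               [7, 8]]])
b = a.copy()

# tensordot contracts the last two axes of both arrays and returns a 2x2
# matrix of all pairwise group products; only its diagonal is needed.
print(np.diag(np.tensordot(a, b, axes=((1, 2), (1, 2)))))  # [ 30 174]

# einsum computes only the diagonal terms directly.
print(np.einsum('ijk,ijk->i', a, b, optimize=True))        # [ 30 174]

# A hand-written parallel loop along the lines of the Numba suggestion:
# one accumulator per group, no temporary arrays.
@nb.njit(parallel=True)
def batched_sum_product(a, b):
    out = np.zeros(a.shape[0])
    for i in nb.prange(a.shape[0]):
        acc = 0.0
        for j in range(a.shape[1]):
            for k in range(a.shape[2]):
                acc += a[i, j, k] * b[i, j, k]
        out[i] = acc
    return out

print(batched_sum_product(a, b))                           # [ 30. 174.]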

Jérôme Richard
  • Thank you for your answer. I've tried with `optimize` but it's still very slow. I've opened another post that better describes my question, please have a look, thanks! https://stackoverflow.com/questions/70907083/numpy-einsum-tensordot-with-shared-non-contracted-axis – Sam-gege Jan 29 '22 at 15:52