What is the best way to do array operations when some repeated indices are summed over and others are not? It seems like I may have to use einsum for these operations, but it would be better if there were a tensordot alternative with a flag for dimensions that are aligned but not summed. Does anybody know of a fast numerical routine (in LAPACK, perhaps?) that behaves like tensordot, except that some axes can be aligned without being summed?
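In index notation, the operation I want is (this is exactly the einsum signature used below):

c[i, j, k] = sum_l a[i, j, l] * b[j, k, l]

The l-axis is aligned and summed; the j-axis is aligned but kept in the output.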
Here is example code showing the type of array operation that is needed. The operation I need is done by method_sum, method_einsum, and method_matmul. A similar operation that also sums over the matching j-axis is done by method2_einsum and method2_tensordot.

Comparing times, it seems that tensordot should be able to beat einsum on the first problem; however, it does not have the functionality to align axes without summing over them.
import numpy as np

# Shapes of the arrays
I = 200
J = 50
K = 200
L = 100

a = np.ones((I, J, L))
b = np.ones((J, K, L))
# The desired product has a sum over the l-axis

## Use broadcasting to multiply, then sum over the last dimension
def method_sum(a, b):
    "Multiply arrays and sum over the last dimension."
    # NB: the broadcast intermediate has shape (I, J, K, L),
    # i.e. 2e8 elements, before the sum reduces it to (I, J, K)
    c = (a[:, :, None, :] * b[None, :, :, :]).sum(-1)
    return c
## Use einsum to multiply the arrays and sum over the l-axis
def method_einsum(a, b):
    "Multiply arrays and sum over the last dimension."
    c = np.einsum('ijl,jkl->ijk', a, b)
    return c
## Use matmul to multiply the arrays and sum over one of the axes
def method_matmul(a, b):
    "Multiply arrays using the new matmul operation."
    # Insert singleton dimensions so that matmul performs a batched
    # (1, L) @ (L, 1) product for every (i, j, k) triple
    c = np.matmul(a[:, :, None, None, :],
                  b[None, :, :, :, None])[:, :, :, 0, 0]
    return c
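As an aside (not part of the original code), matmul can also treat the aligned j-axis as a true batch dimension, so that each slice is a genuine (I, L) @ (L, K) matrix product instead of many (1, L) @ (L, 1) ones. A minimal sketch; the name method_matmul_batched is mine:

## Hypothetical variant: use j as the matmul batch axis
def method_matmul_batched(a, b):
    "Batch over j: one (I, L) @ (L, K) product per j slice."
    am = a.transpose(1, 0, 2)   # (J, I, L)
    bm = b.transpose(0, 2, 1)   # (J, L, K)
    # matmul broadcasts over the leading J axis; result is (J, I, K)
    return np.matmul(am, bm).transpose(1, 0, 2)  # -> (I, J, K)

Since each j slice is then a real matrix product, this should be able to dispatch to the same BLAS machinery that makes tensordot fast; whether it actually beats method_einsum here would need measuring.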
# Compare einsum vs tensordot on the summation over both j and l

## einsum takes about the same amount of time as when j is not summed over
def method2_einsum(a, b):
    "Multiply arrays and sum over the j- and l-axes."
    c = np.einsum('ijl,jkl->ik', a, b)
    return c
## tensordot can do this faster, but it always sums over the aligned axes
def method2_tensordot(a, b):
    "Multiply and sum over all overlapping dimensions."
    c = np.tensordot(a, b, axes=[(1, 2), (0, 2)])
    return c
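A quick sanity check (not part of the timings) that the variants agree:

# All method* variants should produce the same (I, J, K) array,
# and both method2* variants the same (I, K) array
assert np.allclose(method_sum(a, b), method_einsum(a, b))
assert np.allclose(method_einsum(a, b), method_matmul(a, b))
assert np.allclose(method2_einsum(a, b), method2_tensordot(a, b))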
Here are some timings for the various routines on my computer. tensordot can beat einsum on the method2 problem because it uses multiple cores. I would like to achieve tensordot-like performance for the calculation where both the J- and L-axes are aligned but only the L-axis is summed.
Time for method_sum:
1 loops, best of 3: 744 ms per loop
Time for method_einsum:
10 loops, best of 3: 95.1 ms per loop
Time for method_matmul:
10 loops, best of 3: 93.8 ms per loop
Time for method2_einsum:
10 loops, best of 3: 90.4 ms per loop
Time for method2_tensordot:
100 loops, best of 3: 10.9 ms per loop
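One more note, added for context rather than taken from the timings above: newer NumPy releases (1.12+) give einsum an optimize flag that lets it dispatch to BLAS-backed contractions where possible; whether it helps for this particular signature would need measuring.

c = np.einsum('ijl,jkl->ijk', a, b, optimize=True)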