I noticed that np.einsum is faster when it reduces one dimension:
import numpy as np
a = np.random.random((100,100,100))
b = np.random.random((100,100,100))
%timeit np.einsum('ijk,ijk->ijk',a,b)
# 100 loops, best of 3: 3.83 ms per loop
%timeit np.einsum('ijk,ijk->ij',a,b)
# 1000 loops, best of 3: 937 µs per loop
%timeit np.einsum('ijk,ijk->i',a,b)
# 1000 loops, best of 3: 921 µs per loop
%timeit np.einsum('ijk,ijk->',a,b)
# 1000 loops, best of 3: 928 µs per loop
This seems very weird to me, as I would expect it to first generate the full product array and then sum over it, which is obviously not happening. What is going on here? Why does it get faster when one dimension is dropped, but not any faster when further dimensions are dropped?
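For reference, a quick sanity check (with the sizes shrunk to 10×10×10 — my own addition, not part of the timings above) confirming that the reduced signatures really do compute "multiply elementwise, then sum over the dropped axes", which is why one might naively expect them to cost at least as much as the full 'ijk' product:

```python
import numpy as np

a = np.random.random((10, 10, 10))
b = np.random.random((10, 10, 10))

# The full signature is just the elementwise product.
full = np.einsum('ijk,ijk->ijk', a, b)

# Each reduced signature equals summing that product over the dropped axes.
assert np.allclose(np.einsum('ijk,ijk->ij', a, b), full.sum(axis=2))
assert np.allclose(np.einsum('ijk,ijk->i', a, b), full.sum(axis=(1, 2)))
assert np.allclose(np.einsum('ijk,ijk->', a, b), full.sum())
```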
Side note: I first thought it might have to do with creating a large output array when it has many dimensions, but I don't think that is the case:
%timeit np.ones(a.shape)
# 1000 loops, best of 3: 1.79 ms per loop
%timeit np.empty(a.shape)
# 100000 loops, best of 3: 3.05 µs per loop
So creating a new array is far faster than the einsum call itself.
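(My reading of the gap between those two allocators, for what it's worth: np.empty only reserves the buffer, while np.ones must also write a value into every element, so only np.ones touches all 10**6 entries. A minimal illustration:)

```python
import numpy as np

shape = (100, 100, 100)

e = np.empty(shape)  # uninitialized buffer; contents are arbitrary
o = np.ones(shape)   # allocated *and* filled with 1.0

assert e.shape == o.shape == shape
assert (o == 1.0).all()  # every element of the ones array was written
```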