Simplifying double einsum

Question

I'm trying to use numpy.einsum to simplify a loop I have in my code.

Currently, my code looks something like this:

k = 100 
m = 50
n = 10
A = np.arange(k*m*n).reshape(k, m, n)
B = np.arange(m*m).reshape(m, m)

T = np.zeros((n, n)) 
for ind in xrange(k):
    T += np.dot(A[ind,:,:].T, np.dot(B, A[ind,:,:]))

I'm trying to use numpy.einsum as an alternative to this loop:

Tp = np.einsum('nij,njk->ik', np.einsum('nij,jk->nik', A.transpose(0,2,1), B), A)
print np.allclose(T, Tp)

Is it possible to use a single numpy.einsum instead of two?

on my computer, `print (np.allclose(T, Tp))` is false ??. – B. M. Oct 29 '15 at 10:56 — B. M., Oct 29 '15 at 10:56

Saullo G. P. Castro · Answer 1 · 2015-10-28T20:25:58.503

On my PC your timing would be:

np.einsum('nij,njk->ik', np.einsum('nij,jk->nik', A.transpose(0,2,1), B), A)
# 100 loops, best of 3: 4.55 ms per loop

You can achieve that with:

T2 = np.einsum('nij, il, kln ->jk', A, B, A.T)
#  10 loops, best of 3: 51.9 ms per loop

or using a double np.tensordot():

T3 = np.tensordot(A, np.tensordot(A, B, axes=(1, 1)), axes=((0, 1), (0, 2)))
# 100 loops, best of 3: 2.73 ms per loop

My conclusion is that you are getting a better performance doing this operation in two steps. This is probably due to the bigger strides that happen when the operation is performed at once, possibly causing more cache losses.

score 3 · Accepted Answer · edited May 23 '17 at 11:59

Really smart approaches you got in the other answer. I would like to add numpy.dot based approach into the mix that also uses some reshaping. Here's one way to do so -

k,m,n = A.shape
((A.transpose(2,0,1).reshape(-1,m).dot(B.T)).reshape(n,-1)).dot(A.reshape(-1,n)).T

Runtime tests -

This section compares all the approaches listed in the other answer and numpy.dot based one listed earlier in this post.

In [130]: k = 100 
     ...: m = 50
     ...: n = 10
     ...: A = np.arange(k*m*n).reshape(k, m, n)
     ...: B = np.arange(m*m).reshape(m, m)
     ...: 

In [131]: %timeit np.einsum('nij,njk->ik', np.einsum('nij,jk->nik', A.transpose(0,2,1), B), A)
100 loops, best of 3: 10.7 ms per loop

In [132]: %timeit np.einsum('nij, il, kln ->jk', A, B, A.T)
10 loops, best of 3: 105 ms per loop

In [133]: %timeit np.tensordot(A, np.tensordot(A, B, axes=(1, 1)), axes=((0, 1), (0, 2)))
100 loops, best of 3: 6.22 ms per loop

In [134]: %timeit ((A.transpose(2,0,1).reshape(-1,m).dot(B.T)).reshape(n,-1)).dot(A.reshape(-1,n)).T
100 loops, best of 3: 5.3 ms per loop

Internally, I believe, `tensordot` uses same sort of `transpose` and `reshape` as you do. Timings are close enough. — hpaulj, Oct 28 '15 at 22:55
@hpaulj Hmm it might. From my benchmarks, I always found `np.dot` to be tad-faster than `np.tensordot`, so that might be the function call overhead. In this case, there are numerous ways to reshape and transpose and after many trials I found out the listed one to be the most optimized. Also, `np.dot` gives me more control, so I thought this could be tried here :) — Divakar, Oct 28 '15 at 23:01

Simplifying double einsum

2 Answers2