
The correct way of writing a given summation in terms of Einstein summation is still a puzzle to me, so I have been trying it out in my code. I have succeeded in a few cases, but mostly by trial and error.

Now there is a case that I cannot figure out. First, a basic question. For two matrices A and B that are Nx1 and 1xN, respectively, AB is NxN but BA is 1x1. When I want to calculate the NxN case with np.einsum I can do:

import numpy as np

a = np.asarray([[1,2]])
b = np.asarray([[2,3]])
print(np.einsum('ij,ji->ij', a, b))

and the final array is 2x2. However

a = np.asarray([[1,2]])
b = np.asarray([[2,3]])
print(np.einsum('ij,ij->ij', a, b))

returns a 1x2 array. I don't quite understand why this does not give the NxN result. Numpy's guide says that the arrow can be used to force summation or to stop it from taking place, but that seems quite vague to me: in the above case I don't understand how numpy decides on the final size of the output array based on the order of the indices on the left of the arrow (which apparently changes the result).

Formally I know the following: when there is nothing on the right side of the arrow, np.einsum('ij,ij',A,B) can be written mathematically as $\sum\limits_{i=0}^{N}\sum\limits_{j=0}^{M} A_{ij}B_{ij}$, but when there is an arrow I am clueless as to how to interpret it as a formal mathematical expression.
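
For instance, with small example arrays I can check that the no-arrow form really is that double sum (a minimal check, not part of my actual code):

import numpy as np

A = np.asarray([[1,2]])
B = np.asarray([[2,3]])
# no arrow: every index is summed away, so this should equal sum_ij A_ij*B_ij
print(np.einsum('ij,ij', A, B))   # 8
print((A * B).sum())              # 8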

    Are you sure you pasted the code you intended to? The result of `np.einsum('ij,ji->ij',a,b)` should be 2x2, not 2x2x2. – DSM Sep 26 '14 at 16:58
  • Oops! Sorry the first one is 1x2 the second 2x2. My mistake. – Cupitor Sep 26 '14 at 16:59
  • http://docs.scipy.org/doc/numpy/reference/generated/numpy.einsum.html – Avinash Babu Sep 26 '14 at 17:06
  • @AvinashBabu, thank you.... I already said that I find the guide unclear. It doesn't say how the forcing takes place when one puts an index on the right side of the arrow, or how the sizes should change when the order of the indices changes on the left side of the arrow... – Cupitor Sep 26 '14 at 17:12

1 Answer

In [22]: a
Out[22]: array([[1, 2]])
In [23]: b
Out[23]: array([[2, 3]])
In [24]: np.einsum('ij,ij->ij',a,b)
Out[24]: array([[2, 6]])
In [29]: a*b
Out[29]: array([[2, 6]])

Here the repetition of the indices in all parts, including the output, is interpreted as element-by-element multiplication. Nothing is summed: a[i,j]*b[i,j] = c[i,j] for all i,j.
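
A minimal loop sketch of what 'ij,ij->ij' computes (just to spell out the rule; this is not how einsum is implemented internally):

import numpy as np

a = np.asarray([[1,2]])
b = np.asarray([[2,3]])
# every index appears in the output, so nothing is summed
c = np.empty_like(a)
for i in range(a.shape[0]):
    for j in range(a.shape[1]):
        c[i, j] = a[i, j] * b[i, j]
print(c)   # [[2 6]]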

In [25]: np.einsum('ij,ji->ij',a,b)
Out[25]: 
array([[2, 4],
       [3, 6]])
In [28]: np.dot(a.T,b).T
Out[28]: 
array([[2, 4],
       [3, 6]])
In [38]: np.outer(a,b)
Out[38]: 
array([[2, 3],
       [4, 6]])

Again no summation, because the same indices appear on the left and right sides: a[i,j]*b[j,i] = c[i,j], in other words:

[[1*2, 2*2],
 [1*3, 2*3]]

In effect an outer product. A look at how a is broadcast against b.T might help:

In [69]: np.broadcast_arrays(a,b.T)
Out[69]: 
[array([[1, 2],
        [1, 2]]), 
 array([[2, 2],
        [3, 3]])]

On the left side of the statement, repeated indices indicate which dimensions are multiplied. Matching left and right sides determines whether they are summed or not.

np.einsum('ij,ji->j',a,b) # array([ 5, 10]) sum on i only
np.einsum('ij,ji->i',a,b) # array([6, 9]) sum on j only
np.einsum('ij,ji',a,b) # 15 sum on i and j
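
Spelled out with explicit loops (a sketch of the rule only, using the broadcast arrays from above, not einsum's actual implementation):

import numpy as np

a = np.asarray([[1,2]])
b = np.asarray([[2,3]])
A, Bt = np.broadcast_arrays(a, b.T)   # both (2,2), as shown earlier

# 'ij,ji->j': j survives, i is summed away
cj = np.array([sum(A[i, j] * Bt[i, j] for i in range(2)) for j in range(2)])
print(cj)                                                            # [ 5 10]
# 'ij,ji' with no arrow: every index is summed
print(sum(A[i, j] * Bt[i, j] for i in range(2) for j in range(2)))   # 15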

A while back I worked out a pure Python equivalent to einsum, with most of the focus on how it parses the subscript string. The goal is to create an nditer that performs a sum-of-products calculation. But it's not a trivial script to follow, even in Python:

https://github.com/hpaulj/numpy-einsum/blob/master/einsum_py.py


A simpler sequence showing these summation rules:

In [53]: c=np.array([[1,2],[3,4]])
In [55]: np.einsum('ij',c)
Out[55]: 
array([[1, 2],
       [3, 4]])
In [56]: np.einsum('ij->i',c)
Out[56]: array([3, 7])
In [57]: np.einsum('ij->j',c)
Out[57]: array([4, 6])
In [58]: np.einsum('ij->',c)
Out[58]: 10

Using arrays that don't have a size-1 dimension removes the broadcasting complication:

In [70]: a2=np.arange(6).reshape(2,3)   # any (2,3) array would do here
In [71]: b2=np.arange(1,7).reshape(2,3)
In [72]: np.einsum('ij,ji',a2,b2)
...
ValueError: operands could not be broadcast together with remapped shapes [original->remapped]: (2,3)->(2,3) (2,3)->(3,2)

Or should I say, it exposes the attempted broadcasting.
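
To get an actual product out of these (2,3) arrays, the labels have to line up with the axis sizes. A minimal sketch, reusing the same a2 and b2:

import numpy as np

a2 = np.arange(6).reshape(2,3)
b2 = np.arange(1,7).reshape(2,3)

# same label on same-sized axes: element by element product, summed over both
print(np.einsum('ij,ij', a2, b2))      # equals (a2*b2).sum()
# 'ij,ji' works once the second operand is actually transposed
print(np.einsum('ij,ji', a2, b2.T))    # same value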

Ellipsis adds a level of complexity to the einsum interpretation. I developed the above-mentioned github code when I worked on a bug in the handling of the ellipsis. But I didn't put much effort into refining its documentation.

Ellipsis broadcasting in numpy.einsum


The ellipsis is most useful when you want an expression that can handle arrays of various sizes. If your arrays are always 2D, it doesn't add anything.

By way of example, consider a generalization of dot, one that multiplies the last dimension of A with the 2nd-to-last dimension of B. With an ellipsis we can write an expression that handles a mix of 2D, 3D and larger arrays:

np.einsum('...ij,...jk',np.ones((2,3)),np.ones((3,4)))  # (2,4)
np.einsum('...ij,...jk',np.ones((5,2,3)),np.ones((3,4)))  # (5,2,4)
np.einsum('...ij,...jk',np.ones((5,2,3)),np.ones((5,3,4))) # (5,2,4)
np.einsum('...ij,...jk',np.ones((5,2,3)),np.ones((7,5,3,4)))  # (7,5,2,4)
np.einsum('...ij,...jk->...ik',np.ones((5,2,3)),np.ones((7,5,3,4)))  # (7,5,2,4)

The last expression spells out the default right-hand-side indexing, ...ik: the ellipsis plus the non-summed indices.

Your original example could be written as

np.einsum('...j,j...->...j',a,b)

Effectively the ellipsis fills in the i (or more) dimensions to match the dimensions of the arrays. It would also work if a or b were 1d:

np.einsum('...j,j...->...j',a,b[0,:])
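
A quick check (reusing the 1x2 a and b from the question) that this ellipsis form reproduces the earlier 'ij,ji->ij' result, and still runs with a 1d second operand:

import numpy as np

a = np.asarray([[1,2]])
b = np.asarray([[2,3]])
print(np.einsum('...j,j...->...j', a, b))
# [[2 4]
#  [3 6]]
print(np.einsum('...j,j...->...j', a, b[0,:]))   # 1d b: just the element-wise product
# [[2 6]]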

np.dot's way of generalizing to higher dimensions is different:

dot(a, b)[i,j,k,m] = sum(a[i,j,:] * b[k,:,m])

is expressed in einsum as:

np.einsum('ijo,kom->ijkm',np.ones((2,3,4)),np.ones((3,4,2)))

which can be generalized with

np.einsum('...o,kom->...km',np.ones((4,)),np.ones((3,4,2)))

or

np.einsum('ijo,...om->ij...m',np.ones((2,3,4)),np.ones((3,4,2)))
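
A quick sanity check of the first of these forms against np.dot (using random arrays rather than np.ones, so the match isn't trivial):

import numpy as np

A = np.random.rand(2,3,4)
B = np.random.rand(3,4,2)
# dot(A, B)[i,j,k,m] = sum_o A[i,j,o] * B[k,o,m]
print(np.allclose(np.einsum('ijo,kom->ijkm', A, B), np.dot(A, B)))   # True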

But I don't think I can completely replicate np.dot's behavior in einsum. That is, I can't tell it to fill in one set of indices for A, followed by a different set for B.

  • Thank you for this detailed reply, vote up. But you see... `b[j,i]` is meaningless since when `j=1` and `i=0` `b` is not defined. – Cupitor Sep 26 '14 at 17:54
  • With `'ij,ji'`, `i` is the 1st dimension of `a` and the 2nd of `b`, and `j` the other way around. `ji` means, in effect, use the transpose of `b`. `einsum` also uses broadcasting. Look at: `np.broadcast_arrays(a,b.T)`. – hpaulj Sep 26 '14 at 18:12
  • But then how come in `'ij,jk'` for `np.einsum('ij,jk',A,B)`, `j` is interpreted as the column in the first but as the row in the second? I think broadcasting is not relevant here, because for that in `einsum` you really need to specify what you are broadcasting with respect to, using `...`. – Cupitor Sep 26 '14 at 18:31
  • If your inputs were (2,3) you'd get a broadcasting error. The documentation isn't quite right about the function of '...'. It's more of a placeholder for unspecified dimensions than a broadcasting enabler. – hpaulj Sep 26 '14 at 21:01
  • 1
    `ij` are the row and col indices of `A`. `jk` are the row and col indices of `B`. So the cols of `A` multiply the rows of `B`. This is the `np.dot` product. The result is `(A.shape[0], B.shape[1])` shaped, i.e. `ik`. `np.einsum('ij,jk->ijk',A,B).sum(axis=1)` produces the same result. – hpaulj Sep 26 '14 at 23:17
  • Oh, I see. Thank you! Is there any possibility that you could guide me to an elaboration of `...`, i.e. how exactly it works? – Cupitor Sep 29 '14 at 10:23
  • 1
    I've added some examples of using `...`. It's easier to demonstrate than explain in a few simple rules. – hpaulj Sep 30 '14 at 19:42