
I have read through the einsum manual and ajcr's basic introduction.

I have zero experience with Einstein summation in a non-coding context, although I have tried to remedy that with some internet research (I would provide links, but I don't have the reputation for more than two yet). I've also tried experimenting with einsum in Python to see if I could get a better handle on things.

And yet I'm still unclear on whether it is both possible and efficient to do the following:

Given two arrays of rows (a and b), each with rows of length 3 and n rows in total: for each row i, produce the outer product of a[i] with b[i] plus the outer product of b[i] with a[i], and then sum all of those outer-product matrices into one final matrix.

I know that 'i,j->ij' produces the outer product of one vector with another; it's the next steps that have lost me ('ijk,jik->ij' is definitely not it).
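(For reference, a quick check with made-up 3-vectors that 'i,j->ij' matches np.outer:)

import numpy as np

a = np.array([1., 2., 3.])
b = np.array([4., 5., 6.])

# 'i,j->ij' keeps both indices, so every product a[i]*b[j] survives: the outer product
print(np.allclose(np.einsum('i,j->ij', a, b), np.outer(a, b)))  # True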

My other option is to loop through the arrays and call the basic operations (the double outer product and the matrix addition) from functions I've written in Cython (using the built-in NumPy outer and sum functions is not an option; it is far too slow). I'd likely end up moving the loop itself into Cython as well.

So:

  1. How can I express the procedure described above with einsum?

  2. Would it offer real gains over doing everything in Cython? Or are there other alternatives I'm not aware of? (Including the possibility that I've been using NumPy less efficiently than I could be...)

Thanks.


Edit with example:

import numpy as np

A = np.zeros((3, 3))
arrays_1 = np.array([[1,0,0],[1,2,3],[0,1,0],[3,2,1]])
arrays_2 = np.array([[1,2,3],[0,1,0],[1,0,0],[3,2,1]])
for i in range(len(arrays_1)):
    A = A + (np.outer(arrays_1[i], arrays_2[i]) + np.outer(arrays_2[i], arrays_1[i]))

(Note, however, that in practice we're dealing with much longer arrays: each row still has length 3, but there can be up to a few thousand rows, and this section of code gets called many times, unavoidably.)

In case it's at all helpful, here's the Cython for summing the two outer products:

import numpy as np
cimport numpy as np

ctypedef np.float64_t DTYPE_t  # assumed typedef, matching the double* casts below

def outer_product_sum(np.ndarray[DTYPE_t, ndim=1] a_in, np.ndarray[DTYPE_t, ndim=1] b_in):
    cdef double *a = <double *>a_in.data
    cdef double *b = <double *>b_in.data
    # element-by-element outer(a, b) + outer(b, a)
    return np.array([
        [a[0]*b[0] + a[0]*b[0], a[0]*b[1] + a[1]*b[0], a[0]*b[2] + a[2]*b[0]],
        [a[1]*b[0] + a[0]*b[1], a[1]*b[1] + a[1]*b[1], a[1]*b[2] + a[2]*b[1]],
        [a[2]*b[0] + a[0]*b[2], a[2]*b[1] + a[1]*b[2], a[2]*b[2] + a[2]*b[2]]])

which, right now, I call from within an 'i in range(len(array))' loop as shown above.

  • Could you show how you would perform your sum-product using a `for` loop (preferably with some representative fake inputs)? – ali_m Feb 22 '16 at 10:18

2 Answers


Einstein summation can only be used for the multiplicative part of the question (i.e. the outer products). Luckily, the remaining addition does not have to be performed element-wise: you can do it once on the already-reduced matrices. Using the arrays from your example:

arrays_1 = np.array([[1,0,0],[1,2,3],[0,1,0],[3,2,1]])
arrays_2 = np.array([[1,2,3],[0,1,0],[1,0,0],[3,2,1]])
A = np.einsum('ki,kj->ij', arrays_1, arrays_2) + np.einsum('ki,kj->ij', arrays_2, arrays_1)

The input arrays have shape (4, 3); the summation takes place over the first index (named 'k'). If the summation should take place over the second index instead, change the subscript string to 'ik,jk->ij'.
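As a quick sanity check (a minimal sketch reusing the loop from the question), the einsum expression produces the same matrix as the explicit row-by-row loop:

import numpy as np

arrays_1 = np.array([[1,0,0],[1,2,3],[0,1,0],[3,2,1]], dtype=float)
arrays_2 = np.array([[1,2,3],[0,1,0],[1,0,0],[3,2,1]], dtype=float)

# reference: the explicit loop over rows from the question
A_loop = np.zeros((3, 3))
for i in range(len(arrays_1)):
    A_loop += np.outer(arrays_1[i], arrays_2[i]) + np.outer(arrays_2[i], arrays_1[i])

# einsum version: 'k' indexes rows and is summed over, 'i' and 'j' are kept
A_einsum = (np.einsum('ki,kj->ij', arrays_1, arrays_2) +
            np.einsum('ki,kj->ij', arrays_2, arrays_1))

print(np.allclose(A_loop, A_einsum))  # True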

Rob
  • Thank you! I still need to run more speed tests, but this definitely beat the loop with calls to Cython, so I guess that also answers my efficiency question. – dWitty Feb 22 '16 at 11:20

Whatever you can do with np.einsum, you can usually do it faster using np.dot. In this case, A is the sum of two dot products:

arrays_1 = np.array([[1,0,0],[1,2,3],[0,1,0],[3,2,1]])
arrays_2 = np.array([[1,2,3],[0,1,0],[1,0,0],[3,2,1]])

A1 = (np.einsum('ki,kj->ij', arrays_1, arrays_2) +
      np.einsum('ki,kj->ij', arrays_2, arrays_1))

A2 = arrays_1.T.dot(arrays_2) + arrays_2.T.dot(arrays_1)

print(np.allclose(A1, A2))
# True

%timeit (np.einsum('ki,kj->ij', arrays_1, arrays_2) +
         np.einsum('ki,kj->ij', arrays_2, arrays_1))
# 100000 loops, best of 3: 7.51 µs per loop

%timeit arrays_1.T.dot(arrays_2) + arrays_2.T.dot(arrays_1)
# 100000 loops, best of 3: 4.51 µs per loop
ali_m
  • In my actual code, calling np.dot 347,766 times costs 1230 ms vs einsum's 1982 ms. The [numpy documentation](http://docs.scipy.org/doc/numpy-1.10.0/reference/generated/numpy.dot.html) said dot does the inner product for two vectors, but I guess I either misunderstood or am reading the wrong version. Thanks! – dWitty Feb 22 '16 at 12:31
  • If dot on an array of arrays treats them as 2D and applies matrix multiplication, is there a way to make it instead do a collapsed inner product (the equivalent of `einsum('ij,ij', arrays1, arrays2)`)? Based on the suggestion that most einsum applications can be dots instead. – dWitty Feb 22 '16 at 12:48
  • np.dot on two 2D arrays is treated as matrix multiplication, which is the same as 'ik,kj->ij' in the Einstein summation convention. Transpose the first matrix and you see that both einsum and dot give the same answer. – Rob Feb 22 '16 at 12:50
  • @dWitty *"**For 2-D arrays it is equivalent to matrix multiplication**, and for 1-D arrays to inner product of vectors"*. If you wanted to do the equivalent of `einsum('ij,ij', arrays1, arrays2)` then you could treat each 2D array as a 1D vector, e.g. `np.dot(arrays_1.flat, arrays_2.flat)` (you could also use `np.inner` here in place of `np.dot`, or use `np.tensordot(arrays_1, arrays_2)`). However that will do a single sum-product that collapses over *all* axes, yielding a scalar (17) instead of a 3x3 array as in your question. – ali_m Feb 22 '16 at 13:46
  • @ali_m: indeed, and it was really a separate question (that maybe I should move out of the comments here). I have a variety of matrix operations in my code, and was already using einsum for the inner product elsewhere; a quick check suggests that neither dot nor inner provides real improvements in performance (they both perform slightly slower). – dWitty Feb 22 '16 at 14:11
  • For one thing, `np.dot` and `np.inner` usually call routines from an external BLAS library, so you should make sure that your version of numpy is linked against a fast BLAS implementation (e.g. MKL or OpenBLAS). Beyond that, you should probably ask a separate question with more details of the code you are trying to optimize (the comments aren't really the right place to try to answer this). – ali_m Feb 22 '16 at 14:30
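
To illustrate the equivalences discussed in the comments above, here is a minimal sketch (reusing the example arrays from the question; the specific calls are drawn from the comments, not from the original post). All three expressions collapse over both axes and give the same scalar:

import numpy as np

arrays_1 = np.array([[1,0,0],[1,2,3],[0,1,0],[3,2,1]])
arrays_2 = np.array([[1,2,3],[0,1,0],[1,0,0],[3,2,1]])

# full collapse over both axes: a single scalar sum-product
s1 = np.einsum('ij,ij', arrays_1, arrays_2)
s2 = np.dot(arrays_1.ravel(), arrays_2.ravel())  # flatten, then take the 1-D inner product
s3 = np.tensordot(arrays_1, arrays_2)            # default axes=2 contracts both axes

print(s1, s2, s3)  # 17 17 17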