
Disclaimer: I did think about posting this on Math Stack Exchange or somewhere similar, but I heard from my math-major friends that they don't really use Einstein summation much, whereas machine learning uses it a lot. So I'm posting the problem here (as a question about optimizing algorithm performance).

When doing research about matrix computation (e.g. how many element wise multiplications are at least needed), I was trying to compute the following gradient:

∇_A ‖ ∇_A ‖ABC‖² ‖²    (up to constant factors, with ‖·‖ the Frobenius norm)

where ABC means contracting the three matrices over their shared first axis (e.g. 2x3, 2x4, and 2x5 matrices become a 3x4x5 tensor with the 2-axis summed over). Basically, it takes the gradient of the norm of the 3-matrix contraction ABC with respect to A, and then takes the gradient of the norm of that gradient with respect to A again.
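For example, here is a minimal sketch of that contraction with einsum, using the example shapes above (the variable names are just for illustration):

import numpy as np

A = np.random.rand(2, 3)
B = np.random.rand(2, 4)
C = np.random.rand(2, 5)

# T[a, b, c] = sum_i A[i, a] * B[i, b] * C[i, c], so the shared 2-axis is summed over
T = np.einsum('ia,ib,ic->abc', A, B, C)
print(T.shape)  # (3, 4, 5)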

This is equivalent to:

g1[j,a] = Σ_{i,b,c} A[i,a] B[i,b] C[i,c] B[j,b] C[j,c]
g2[k,a] = Σ_{j,d,e} g1[j,a] B[j,d] C[j,e] B[k,d] C[k,e]

or, simplifying a little (verified with autograd):

g2[k,a] = Σ_{i,j,b,c,d,e} A[i,a] B[i,b] C[i,c] B[j,b] C[j,c] B[j,d] C[j,e] B[k,d] C[k,e]

We can write this in Einstein summation form (as used by the einsum function of many packages such as numpy, tensorflow, etc.):

np.einsum('ia,ib,ic,jb,jc,jd,je,kd,ke->ka', A, B, C, B, C, B, C, B, C)
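To see where all the repeated Bs and Cs come from, the same result can also be built in two stages, one per gradient as I understand it (g1 and g2 are just illustrative names):

g1 = np.einsum('ia,ib,ic,jb,jc->ja', A, B, C, B, C)   # gradient of the norm of ABC w.r.t. A (up to a constant factor)
g2 = np.einsum('ja,jd,je,kd,ke->ka', g1, B, C, B, C)  # same result as the long einsum above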

When writing this, I noticed that the matrices B and C repeat again and again in the summation. I'm wondering whether I can simplify those repeated Bs and Cs into some kind of matrix power, which should make things logarithmically faster. I tried simplifying by hand but couldn't figure it out.

Thanks a lot! Please correct me if something I'm saying is not correct.

ZisIsNotZis
  • An optimal solution to your proposed problem is incredible difficult to figure out and something we talk about [here](https://github.com/dgasmith/opt_einsum/issues/65). In the meantime you might want the ability to automatically stash intermediates which likely will help you a great deal, see the docs [here](https://optimized-einsum.readthedocs.io/en/latest/sharing_intermediates.html). – Daniel Jan 20 '19 at 23:46

1 Answer


I found that I can first broadcast (i.e. take the outer product of) B (of shape ib) and C (of shape ic) to get a BC tensor of shape ibc:

BC = np.einsum('ib,ic->ibc', B, C)

I can then tensor contract it with its transpose to get a square matrix:

JUMP = np.einsum('ibc,cbj->ij', BC, BC.T)
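If I'm reading the indices right, the sums over b and c factor apart, so the same matrix can equivalently be formed as an element-wise (Hadamard) product of two Gram matrices, without materializing the ibc tensor (JUMP_hadamard is just an illustrative name):

JUMP_hadamard = (B @ B.T) * (C @ C.T)   # equals JUMP above: JUMP[i, j] = sum over b, c of B[i,b] B[j,b] C[i,c] C[j,c]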

This matrix JUMP is square and acts like a "gradient jump" operator that raises the order of the gradient. For example, if I want the second-order gradient mentioned in the question, I can simply multiply JUMP with the first-order gradient:

g2 = JUMP @ g1
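As a quick sanity check (here I take g1 to be the first-order gradient, computed as a smaller einsum), this reproduces the long einsum from the question:

g1 = np.einsum('ia,ib,ic,jb,jc->ja', A, B, C, B, C)
g2 = JUMP @ g1
full = np.einsum('ia,ib,ic,jb,jc,jd,je,kd,ke->ka', A, B, C, B, C, B, C, B, C)
print(np.allclose(g2, full))  # True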

To get the third order, multiply by the matrix square of it:

g3 = JUMP @ JUMP @ g1

To get the fourth order, multiply by its fourth matrix power (not its cube):

g4 = np.linalg.matrix_power(JUMP, 4) @ g1
ZisIsNotZis
  • the point of einsum is that you shouldn't need to order your indices before argument insertion since the specification is explicit: `np.einsum('ibc,cbj->ij', BC, BC.T)` is the same as `np.einsum('ibc,jbc->ij', BC, BC)`, fyi. – Attack68 Jan 22 '19 at 18:37