Could someone explain why numpy.einsum('ij,ji', A, B) is more than twice as slow as numpy.einsum('ij,ij', A, B), as shown below?
In [1]: import numpy as np
In [2]: a = np.random.rand(1000,1000)
In [3]: b = np.random.rand(1000,1000)
In [4]: timeit np.einsum('ij,ij', a, b)
532 µs ± 5.36 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [5]: timeit np.einsum('ij,ji', a, b)
1.28 ms ± 20.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
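For reference, the two subscript strings do not compute the same quantity. 'ij,ij' sums a[i, j] * b[i, j] (the elementwise product), while 'ij,ji' sums a[i, j] * b[j, i], which equals trace(a @ b). In the second case b is effectively read through a transposed view, so its elements are not visited in memory order. My assumption is that this strided access is at least part of the slowdown, but I have not confirmed it from the einsum source. The sketch below only checks the equivalences and the strides. The array size, the seed, and the `np.ascontiguousarray` workaround are my own choices for illustration, not something from the timings above.

```python
import numpy as np

# Arbitrary size and seed, chosen only for this illustration
rng = np.random.default_rng(0)
a = rng.random((200, 200))
b = rng.random((200, 200))

# 'ij,ij': sum over i, j of a[i, j] * b[i, j]
s_same = np.einsum('ij,ij', a, b)
# 'ij,ji': sum over i, j of a[i, j] * b[j, i], i.e. trace(a @ b)
s_transposed = np.einsum('ij,ji', a, b)

print(np.isclose(s_same, np.sum(a * b)))          # True
print(np.isclose(s_transposed, np.sum(a * b.T)))  # True
print(np.isclose(s_transposed, np.trace(a @ b)))  # True

# b.T is a view with swapped strides: walking it in a's row-major order
# jumps a whole row of b (1600 bytes here) between consecutive elements.
print(b.strides, b.T.strides)     # (1600, 8) (8, 1600)
print(b.T.flags['C_CONTIGUOUS'])  # False

# Hypothetical workaround (assumption, needs timing on real data):
# copy the transposed operand into contiguous memory first, then use 'ij,ij'.
bt = np.ascontiguousarray(b.T)
s_copied = np.einsum('ij,ij', a, bt)
print(np.isclose(s_copied, s_transposed))  # True
```

Whether the extra copy pays off depends on how often the same transposed operand is reused, so it would need to be timed the same way as above.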
Regards, Marek