I have an odd case where I can see numpy.einsum speeding up a computation, but I can't see the same speed-up in the output of numpy.einsum_path. I'd like to quantify/explain this speed-up, but I'm missing something somewhere...
In short, I have a matrix multiplication where only the diagonal of the final product is needed.
import numpy as np

a = np.arange(9).reshape(3, 3)
print('input array')
print(a)
print('normal method')
print(np.diag(a.dot(a)))
print('einsum method')
print(np.einsum('ij,ji->i', a, a))
which produces the output:
input array
[[0 1 2]
 [3 4 5]
 [6 7 8]]
normal method
[ 15  54 111]
einsum method
[ 15  54 111]
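As a sanity check on what 'ij,ji->i' actually computes (this is my own reformulation, not anything from the einsum internals), the same result comes from an elementwise product followed by a row sum:
# Element i of the result is sum_j a[i, j] * a[j, i], i.e. the dot product
# of row i of the first operand with column i of the second. Only these
# 3*3 products are ever needed; the full matrix product is not formed.
print((a * a.T).sum(axis=1))  # [ 15  54 111]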
When run on large matrices, numpy.einsum is substantially faster.
A = np.random.randn(2000, 300)
B = np.random.randn(300, 2000)
print('normal method')
%timeit np.diag(A.dot(B))
print('einsum method')
%timeit np.einsum('ij,ji->i', A, B)
which produces:
normal method
17.2 ms ± 131 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
einsum method
1.02 ms ± 7.82 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
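For comparison, the elementwise reformulation above can be timed the same way; since it also avoids forming the full 2000x2000 product, I'd expect it to land much closer to the einsum timing than to the diag(A.dot(B)) one:
print('elementwise method')
%timeit (A * B.T).sum(axis=1)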
My intuition is that this speed-up is possible because numpy.einsum can skip computing the off-diagonal elements that np.diag would throw away anyway - but, if I'm reading it correctly, the output of numpy.einsum_path shows no speed-up at all.
print(np.einsum_path('ij,ji->i', A, B, optimize=True)[1])
  Complete contraction:  ij,ji->i
         Naive scaling:  2
     Optimized scaling:  2
      Naive FLOP count:  1.200e+06
  Optimized FLOP count:  1.200e+06
   Theoretical speedup:  1.000
  Largest intermediate:  2.000e+03 elements
--------------------------------------------------------------------------
scaling                  current                                remaining
--------------------------------------------------------------------------
   2                     ji,ij->i                                    i->i
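One thing I notice (my own back-of-the-envelope arithmetic, not anything numpy reports): the 1.200e+06 naive FLOP count above already corresponds to computing just the diagonal, whereas the full A.dot(B) followed by np.diag would cost far more:
n, k = 2000, 300

# Full product: n*n output elements, each a length-k dot product
# (k multiplies + k adds per element), before np.diag discards all but n.
full_product_flops = 2 * n * n * k     # 2.4e9

# einsum('ij,ji->i', A, B): only the n diagonal elements are computed.
diagonal_only_flops = 2 * n * k        # 1.2e6, matching einsum_path's count

print(full_product_flops / diagonal_only_flops)  # 2000.0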
Questions:
- Why can I see a practical speed-up that isn't reflected in the computational path?
- Is there a way to quantify the speed-up of the ij,ji->i path in numpy.einsum?