I have some code that uses the following einsum:
y = np.einsum('wxyijk,ijkd->wxyd', x, f)
where (for example) x has shape (64, 26, 26, 3, 3, 3) and f has shape (3, 3, 3, 1), both with dtype=float
%timeit np.einsum('wxyijk,ijkd->wxyd', x, f)
# 2.01 ms ± 55.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
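For clarity, the einsum just contracts the last three axes of x against the first three of f. Assuming the example shapes above, an equivalent reshape + matmul (or tensordot) formulation would look roughly like this sketch:

import numpy as np

x = np.random.rand(64, 26, 26, 3, 3, 3)
f = np.random.rand(3, 3, 3, 1)

# Flatten the three contracted axes (3*3*3 = 27) and use a batched matmul
y_matmul = x.reshape(64, 26, 26, 27) @ f.reshape(27, 1)   # shape (64, 26, 26, 1)

# Same contraction via tensordot over the last/first three axes
y_tensordot = np.tensordot(x, f, axes=3)                  # shape (64, 26, 26, 1)

assert np.allclose(y_matmul, np.einsum('wxyijk,ijkd->wxyd', x, f))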
This is too slow for my application, which is time critical. Neither running it on the GPU (via CuPy) nor using optimized contraction paths (via opt-einsum) seems to make it any faster. Is there any way to make it faster natively in NumPy, or is this about as fast as it's going to get?
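For reference, the opt-einsum attempt was along these lines (a sketch; the exact keyword arguments are a paraphrase of what I ran):

import numpy as np
import opt_einsum as oe

x = np.random.rand(64, 26, 26, 3, 3, 3)
f = np.random.rand(3, 3, 3, 1)

# Drop-in replacement for np.einsum with contraction-path optimization
y = oe.contract('wxyijk,ijkd->wxyd', x, f)

# Precompute the contraction expression to avoid per-call path-finding overhead
expr = oe.contract_expression('wxyijk,ijkd->wxyd', x.shape, f.shape)
y = expr(x, f)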