
Is there any simple method to parallelize einsum in NumPy? I found some related discussions: Numpy np.einsum array multiplication using multiple cores and Any chance of making this faster? (numpy.einsum)

numpy.tensordot() works only for binary contractions, and Numba requires writing out explicit loops. Is there any simple and robust approach to parallelizing einsum (possibly including opt-einsum, tf-einsum, etc.) with arbitrary contractions?
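For context, one route that is often suggested is letting einsum itself dispatch to BLAS via its `optimize` keyword; whether this actually runs on multiple cores depends on the BLAS build NumPy is linked against. A minimal sketch (the shapes here are just illustrative):

```python
import numpy as np

A = np.random.random((200, 300))
B = np.random.random((300, 100))

# With optimize=True, einsum may rewrite the contraction into
# BLAS-backed calls (tensordot/matmul), which are typically
# multithreaded by the underlying BLAS library.
C_opt = np.einsum('ij,jk->ik', A, B, optimize=True)

# Result is numerically the same as a plain matrix product
print(np.allclose(C_opt, A @ B))
```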

Sample code follows (if necessary, I can use a more complicated contraction as the example):

import numpy as np
import time

na = nc = 1000
nb = 1000
n_iter = 10

A = np.random.random((na,nb))
B = np.random.random((nb,nc))


t_total = 0.
for i in range(n_iter):
    start = time.time()
    C = np.einsum('ij,jk->ik', A, B)
    end = time.time()
    t_total += end - start
print('AB->C', t_total / n_iter)
Geositta
  • `tensordot` is just a frontend to `dot`. – hpaulj Jun 29 '22 at 05:00
  • By now `einsum` has become a complex tool. In some cases it just uses `matmul` with all of its BLAS power. Other cases are more like indexing without any matrix multiplication. – hpaulj Jun 29 '22 at 05:04
  •
    In your case you can use the option `optimize='optimal'`. You can check if there is a BLAS call (a tensordot call) with `einsum_pathinfo = np.einsum_path('ij,jk->ik', A, B,optimize='optimal',einsum_call=True)`. If you do an operation in a loop, it also makes sense to calculate the optimal path only once. Of course this depends on the contraction: it is often not beneficial to copy the inputs to the right shape to call tensordot. Another issue is nested tensordot calls within the contraction, which are also not supported. – max9111 Jun 29 '22 at 07:32
  • Make it simple: just use `A @ B`. It is simpler, shorter and faster (even than `optimize='optimal'`). If this takes more than dozens of milliseconds, consider using a better BLAS implementation or checking its configuration. – Jérôme Richard Jun 29 '22 at 19:21
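The path-reuse idea from the comments above can be sketched as follows: compute the contraction path once with `np.einsum_path`, then pass it back to `einsum` inside the loop so the path search is not repeated (shapes here are illustrative):

```python
import numpy as np

A = np.random.random((100, 100))
B = np.random.random((100, 100))

# Compute the optimal contraction path once, outside any loop.
# einsum_path returns the path list plus a human-readable summary.
path, info = np.einsum_path('ij,jk->ik', A, B, optimize='optimal')

# Reuse the precomputed path on every call
C = np.einsum('ij,jk->ik', A, B, optimize=path)

print(np.allclose(C, A @ B))
```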

0 Answers