Is there a simple way to parallelize einsum in NumPy?
I found some related discussions:
Numpy np.einsum array multiplication using multiple cores
Any chance of making this faster? (numpy.einsum)
However, numpy.tensordot() only handles binary (two-operand) contractions, and Numba requires writing out the specific loops by hand. Is there a simple and robust approach to parallelizing einsum (possibly using opt-einsum, tf-einsum, etc.) for arbitrary contractions?
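To make the tensordot limitation concrete, here is a minimal sketch on small random arrays (the array names and shapes are just for illustration). A two-operand contraction maps directly onto numpy.tensordot(), but a three-operand contraction has to be chained by hand, whereas einsum expresses it in one call:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((4, 5))
B = rng.random((5, 6))
D = rng.random((6, 3))  # illustrative third operand

# Binary contraction: einsum and tensordot give the same result.
C_einsum = np.einsum('ij,jk->ik', A, B)
C_tdot = np.tensordot(A, B, axes=([1], [0]))

# Three-operand contraction: one einsum call,
# but tensordot has to be chained pair by pair.
E_einsum = np.einsum('ij,jk,kl->il', A, B, D)
AB = np.tensordot(A, B, axes=([1], [0]))
E_tdot = np.tensordot(AB, D, axes=([1], [0]))

same_binary = np.allclose(C_einsum, C_tdot)
same_chain = np.allclose(E_einsum, E_tdot)
print(same_binary, same_chain)
```

For anything beyond a simple chain, working out the pairwise order and axes manually is exactly the part I would like to avoid.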
Sample code is as follows (I can use a more complicated contraction as the example if necessary):
import numpy as np
import timeit
import time
na = nc = 1000
nb = 1000
n_iter = 10
A = np.random.random((na,nb))
B = np.random.random((nb,nc))
t_total = 0.
for i in range(n_iter):
    start = time.time()
    C = np.einsum('ij,jk->ik', A, B)
    end = time.time()
    t_total += end - start
print('AB->C',(t_total)/n_iter)
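For comparison, here is a sketch of the same benchmark with the `optimize` argument of np.einsum switched on. `time_einsum` is a hypothetical helper written for this post, and the array size is reduced so it runs quickly. As far as I understand, `optimize=True` lets NumPy hand suitable pairwise contractions to BLAS-backed routines, which may or may not be multithreaded depending on how NumPy was built; I have not checked how well this carries over to arbitrary contractions.

```python
import time
import numpy as np

def time_einsum(subscripts, *operands, n_iter=10, **kwargs):
    """Hypothetical helper: average wall-clock seconds per np.einsum call."""
    start = time.perf_counter()
    for _ in range(n_iter):
        out = np.einsum(subscripts, *operands, **kwargs)
    elapsed = (time.perf_counter() - start) / n_iter
    return elapsed, out

n = 200  # smaller than in the sample above, to keep the run short
rng = np.random.default_rng(0)
A = rng.random((n, n))
B = rng.random((n, n))

# Default einsum versus einsum with contraction optimization enabled.
t_plain, C_plain = time_einsum('ij,jk->ik', A, B)
t_opt, C_opt = time_einsum('ij,jk->ik', A, B, optimize=True)

print('einsum               ', t_plain)
print('einsum optimize=True ', t_opt)
```

Both calls should return the same array; only the timings are expected to differ, and by how much will depend on the machine and the BLAS library in use.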