Related question BLAS with symmetry in higher order tensor in Fortran
I tried to use python code to exploit the symmetry in tensor contraction, A[a,b] B[b,c,d] = C[a,c,d]
when B[b,c,d] = B[b,d,c]
hence C[a,c,d] = C[a,d,c]
. (Einstein summation convention assumed, i.e., repeated b
means summation over it)
By the following code
import numpy as np
import time
# A[a,b] * B[b,c,d] = C[a,c,d]
na = nb = nc = nd = 100
A = np.random.random((na,nb))
B = np.random.random((nb,nc,nd))
C = np.zeros((na,nc,nd))
C2= np.zeros((na,nc,nd))
C3= np.zeros((na,nc,nd))
# symmetrize B
for c in range(nc):
for d in range(c):
B[:,c,d] = B[:,d,c]
start_time = time.time()
C2 = np.einsum('ab,bcd->acd', A, B)
finish_time = time.time()
print('time einsum', finish_time - start_time )
start_time = time.time()
for c in range(nc):
# c+1 is needed, since range(0) will be skipped
for d in range(c+1):
#C3[:,c,d] = np.einsum('ab,b->a', A[:,:],B[:,c,d] )
C3[:,c,d] = np.matmul(A[:,:],B[:,c,d] )
for c in range(nc):
for d in range(c+1,nd):
C3[:,c,d] = C3[:,d,c]
finish_time = time.time()
print( 'time partial einsum', finish_time - start_time )
for a in range(int(na/10)):
for c in range(int(nc/10)):
for d in range(int(nd/10)):
if abs((C3-C2)[a,c,d])> 1.0e-12:
print('warning', a,c,d, (C3-C2)[a,c,d])
it seems to me that np.matmul
is faster than np.einsum
, e.g., by using np.matmul
, I got
time einsum 0.07406115531921387
time partial einsum 0.0553278923034668
by using np.einsum
, I got
time einsum 0.0751657485961914
time partial einsum 0.11624622344970703
Is the above performance difference general? I often took einsum
for granted.