I have to assign values to a 3-dimensional array. After some benchmarking, I found that assignment of the value is much slower than computing the value to assign.
Which means
for i in range(n):
for j in range(g):
for k in range(t):
Mu_view[i,j,k] = exp(Xbeta_view[i,j]+Gamma[j,k]) * S[i]
is slower than
for i in range(n):
for j in range(g):
for k in range(t):
exp(Xbeta_view[i,j]+Gamma[j,k]) * S[i]
The benchmark : order of magnitude difference in runtime
# no assignment
25.5 ms ± 260 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
# with assignment
1.7 s ± 23.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
The full code of the function
@cython.boundscheck(False)
@cython.wraparound(False)
def get_Mu(double[::1,:] N,
double[::1,:] X,
double[::1,:] Beta,
double[::1,:] Gamma,
double[::1] S):
cdef int i, j, k
cdef int n, g, p, t
n, g, p, t = N.shape[0], Beta.shape[0], Beta.shape[1], Gamma.shape[1]
cdef double x
# declare Mu : shape= n * g * t and its memoryview
cdef np.ndarray[double, ndim=3, mode='fortran'] Mu = np.empty(shape=(n,g,t), dtype=DTYPE, order='F')
cdef double[::1,:,:] Mu_view = Mu
# obtain XBeta and compute Mu on mem-view
cdef np.ndarray[double, ndim=2] Xbeta = matmul_A_B_t(X, Beta)
cdef double[::1,:] Xbeta_view = Xbeta
for i in range(n):
for j in range(g):
for k in range(t):
Mu_view[i,j,k] = exp(Xbeta_view[i,j]+Gamma[j,k]) * S[i]
return Mu
I think that each time I call Mu_view[i,j,k]
, it is creating a new slice that creates a overhead but I can't come up with a solution.
ADD: Input variables
n, g, p, t = 10000, 500, 10, 20
N = np.ones(shape=(n,g), order='F', dtype=float)
X = np.ones(shape=(n,p), order='F', dtype=float)
Beta = np.ones(shape=(g,p), order='F', dtype=float)
Gamma = np.ones(shape=(g,t), order='F', dtype=float)
S = np.ones(n, dtype=float)