I want to calculate the mean square displacement for several particles, defined as:
where i
is the index for the particle, Dt
is the time interval, t
is the time, and vec(x)
is the position of the particles in two dimensions. We do an average for all possible times t
.
I have managed to implement it with numpy. Note that pos
is a np.array
with three axis: (particles, time, coordinate)
.
import numpy as np
import matplotlib.pyplot as plt
import time
#Initialize data
np.random.seed(1)
nTime = 10**4
nParticles = 3
pos = np.zeros((nParticles, nTime, 2)) #Axis: particles, times, coordinates
for t in range(1, nTime):
pos[:, t, :] = pos[:, t-1, :] + ( np.random.random((nParticles, 2)) - 0.5)
#MSD calculation
def MSD_direct(pos):
Dt_r = np.arange(1, pos.shape[1]-1)
MSD = np.empty((nParticles, len(Dt_r)))
dMSD = np.empty((nParticles,len(Dt_r)))
for k, Dt in enumerate(Dt_r):
SD = np.sum((pos[:, Dt:,:] - pos[:, 0:-Dt,:])**2, axis = -1)
MSD[:,k] = np.mean( SD , axis = 1)
dMSD[:,k] = np.std( SD, axis = 1 ) / np.sqrt(SD.shape[1])
return Dt_r, MSD, dMSD
start_time = time.time()
Dt_r, MSD_d, dMSD_d = MSD_direct(pos)
print("MSD_direct -- Time: %s s\n" % (time.time() - start_time))
#Plots
plt.figure()
for i in range(nParticles):
plt.plot(pos[i,:,0])
plt.xlabel('t')
plt.ylabel('x')
plt.savefig('pos_x.png', dpi = 300)
plt.figure()
for i in range(nParticles):
plt.plot(pos[i,:,1])
plt.xlabel('t')
plt.ylabel('y')
plt.savefig('pos_y.png', dpi = 300)
plt.figure()
for i in range(nParticles):
plt.fill_between(Dt_r, MSD_d[i,:]+dMSD_d[i,:], MSD_d[i,:] - dMSD_d[i,:], alpha = 0.5)
plt.plot(Dt_r, MSD_d[i,:])
plt.xlabel('Dt')
plt.ylabel('MSD')
plt.savefig('MSD.png', dpi = 300)
Output:
MSD_direct -- Time: 7.793087720870972 s
However, I would like to optimize this code if possible. There is still a loop for Dt
, I do not know how could I remove it and vectorize the program fully using numpy.
I also rewrote the calculation using numba, managing around a factor two of improvement from the previous code. I wonder if it is still possible to further improve it.
import numba as nb
@nb.jit(fastmath=True,parallel=True)
def MSD_numba(pos):
Dt_r = np.arange(1, pos.shape[1]-1)
MSD = np.empty((nParticles, len(Dt_r)))
dMSD = np.empty((nParticles,len(Dt_r)))
for i in nb.prange(nParticles):
for Dt in Dt_r:
SD = (pos[i, Dt:, 0] - pos[i, 0:-Dt, 0])**2 + (pos[i, Dt:, 1] - pos[i, 0:-Dt, 1])**2
MSD[i, Dt-1] = np.mean( SD )
dMSD[i, Dt-1] = np.std( SD ) / np.sqrt(len(SD))
return Dt_r, MSD, dMSD
start_time = time.time()
Dt_r, MSD_n, dMSD_n = MSD_numba(pos)
print("MSD_numba -- Time: %s s" % (time.time() - start_time))
print("MSD_numba -- All close to MSD_direct: %r\n" %(np.allclose(MSD_n, MSD_d) ) )
Output:
MSD_numba -- Time: 4.520232915878296 s
MSD_numba -- All close to MSD_direct: True
Note: this type of question has been asked in several posts already, but they use different definitions (Mean square displacement python, Mean squared displacement, Mean square displacement for n-dim matrix python), they do not have an answer (Mean square displacement in Python), they just use one particle (Computing mean square displacement using python and FFT, Mean square displacement of a 1d random walk in python), they use pandas (Vectorized calculation of Mean Square Displacement in Python, Speedup MSD calculation in Python), etc.