In my current project, I wish to vectorize the calculation of the exponent of a multivariate normal Gaussian exponent:
f(X) = exp( - x.T * cov * x)
This is easy to implement in Python if x
is just a D-dimensional vector. In my project, however, I wish to repeat this operation N times. Consequently, x
is not a D-vector but a N-by-D matrix.
I know of two ways to calculate this:
- Case I: Create a for loop and solve the matrix operations one at a time
- Case II: Implement this as a (vectorized) single operation using the N-by-D matrix instead of the D-vector.
The second approach is actually faster for moderate N, but computes a lot of unnecessary information: it returns a full N-by-N matrix, but the results we seek are only the ones along its diagonal.
Is there a way to simplify the vectorized calculations (case II), computing and returning only the diagonal terms?
import numpy as np
import scipy.stats
import time
# Number of samples
N = 1000
# Draw samples
X = scipy.stats.norm.rvs(size = (N,2))
# Define a covariance matrix
cov = np.asarray([[4.,3.],[3.,9.]])
# Case I: Solve this with a for loop
start = time.time()
z = np.zeros(N)
for n in range(N):
x = X[n,:]
z[n] = np.exp(-np.linalg.multi_dot((
x[np.newaxis,:],
np.linalg.inv(cov),
x[:,np.newaxis]))[0,0])
print('For loop operation took '+str(time.time()-start)+' seconds.')
# For loop operation took 0.017468929290771484 seconds.
# Case II: A single, vectorized calculation with unnecessary operations
start = time.time()
z2 = np.exp(-np.linalg.multi_dot((X,np.linalg.inv(cov),X.T)))
# We only need the diagonal of this
z2 = np.diag(z2)
print('Vectorized operation took '+str(time.time()-start)+' seconds.')
# Vectorized operation took 0.011204957962036133 seconds.