I have a dataset of 60,000 examples of the form:
mu1 mu2 std1 std2
0 -0.745 0.729 0.0127 0.0149
1 -0.711 0.332 0.1240 0.0433
...
They are essentially the parameters of 60,000 two-dimensional normal distributions. What I want to do is build an (N x N) matrix P such that P_ij = Normal(mu_i | mean=mu_j, cov=diag(std_j)), where mu_i = (mu1, mu2) of row i and std_j = (std1, std2) of row j.
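For concreteness, a single entry can be written out as a product of two 1-D normals, since the covariance is diagonal. This is just a sketch (the helper name p_ij is only for illustration), and it assumes, as cov=diag(std_j) implies, that the std columns go directly on the diagonal of the covariance, i.e. they act as variances:

from scipy import stats
import numpy as np

def p_ij(mu_i, mu_j, std_j):
    # diagonal covariance: the 2-D pdf factorizes into two 1-D normal pdfs,
    # each with variance std_j[k] (so scale = sqrt(std_j[k]))
    return (stats.norm.pdf(mu_i[0], loc=mu_j[0], scale=np.sqrt(std_j[0])) *
            stats.norm.pdf(mu_i[1], loc=mu_j[1], scale=np.sqrt(std_j[1])))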
For example, I can compute the full matrix with the following loop:
from scipy import stats
import numpy as np

# data is the pandas DataFrame shown above
mu_all = data[['mu1', 'mu2']].to_numpy()     # shape (N, 2)
std_all = data[['std1', 'std2']].to_numpy()  # shape (N, 2)

P = []
for i in range(len(data)):
    mu_i = mu_all[i, :]
    std_i = std_all[i, :]
    # pdf of every row's mu under distribution i; the std values sit directly
    # on the diagonal of the covariance, matching the formula above
    prob_i = stats.multivariate_normal.pdf(mu_all, mean=mu_i, cov=np.diag(std_i))
    P.append(prob_i)
# each prob_i becomes a column, so P[i, j] = Normal(mu_i | mean=mu_j, cov=diag(std_j))
P = np.array(P).T
But this is far too expensive (my machine freezes). How can I compute P more efficiently? My guess is that scipy cannot handle evaluating the pdf at 60,000 points at once. Is there an alternative?
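For scale, even just storing the result is heavy: assuming float64 entries, the full N x N matrix alone needs roughly 28.8 GB, so memory is probably part of the problem.

n = 60000
print(n * n * 8 / 1e9)  # ~28.8 GB just for the N x N result in float64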