I would like to do something that is likely very simple, but is giving me difficulty. Trying to draw N
samples from a multivariate normal distribution and calculate the probability of each of those randomly drawn samples. Here I attempt to use scipy
, but am open to using np.random.multivariate_normal
as well. Whichever is easiest.
>>> import numpy as np
>>> from scipy.stats import multivariate_normal
>>> num_samples = 10
>>> num_features = 6
>>> std = np.random.rand(num_features)
# define distribution
>>> mvn = multivariate_normal(mean = np.zeros(num_features), cov = np.diag(std), allow_singular = False, seed = 42)
# draw samples
>>> sample = mvn.rvs(size = num_samples); sample
# determine probability of each drawn sample
>>> prob = mvn.pdf(x = sample)
# print samples
>>> print(sample)
[[ 0.04816243 -0.00740458 -0.00740406 0.04967142 -0.01382643 0.06476885]
...
[-0.00977815 0.01047547 0.03084945 0.10309995 0.09312801 -0.08392175]]
# print probability all samples
[26861.56848337 17002.29353025 2182.26793265 3749.65049331
42004.63147989 3700.70037411 5569.30332186 16103.44975393
14760.64667235 19148.40325233]
This is confusing for me for a number of reasons:
- For the
rvs
sampling function: I don't use the keyword argumentsmean
andcov
per the docs because it seems odd to define a distribution with amean
andcov
inmvn = multivariate_normal(mean = np.zeros(num_features), cov = np.diag(std), allow_singular = False, seed = 42)
and then repeat that definition in thervs
call. Am I missing something? - For the
mvn.pdf
call, the probability density is obviously >>>1 which isn't impossible for a continuous multivariate normal, but I would like to convert these numbers to approximate probabilities at that particular point. How can I do this?
Thanks!