
Suppose I draw randomly from a normal distribution with mean zero and standard deviation given by a vector of, say, dimension 3:

import numpy as np

scale_rng = np.array([1, 2, 3])       # one standard deviation per component
eps = np.random.normal(0, scale_rng)  # one draw per standard deviation

I need to compute a weighted average based on some simulations for which I draw the above-mentioned eps. The weights of this average are "the probability of eps" (hence I will have a vector with 3 weights). By weighted average I simply mean an arithmetic sum where each component is multiplied by a weight, i.e. a number between 0 and 1, where all the weights sum up to one.

The weighted average is calculated as follows: I have a time series of observations for one variable, x. I calculate an expanding rolling standard deviation of x (say these are the values in scale). Then, for each time-observation in x, I draw a random variable eps from a normal distribution as explained above and add it to the observation, obtaining y = x + eps. Finally, I need to compute the weighted average of y where each value of y is weighted by the "probability of drawing each value of eps from a normal distribution with mean zero and standard deviation equal to scale" (a sketch of this pipeline follows the EDIT below).

Now, I know that I cannot take these weights to be the values of the pdf at the points drawn, because a normal random variable is continuous and the probability of any single point is zero. Hence, the only solution I found is to discretize the normal distribution into a certain number of bins and then find the probability that a value extracted with the code above falls into a given bin. How could I do this in Python?
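
Here is a minimal sketch of the kind of discretization I mean, for a single component; the number of bins and the range of ±6 standard deviations are arbitrary choices:

import numpy as np
from scipy.stats import norm

scale = 2.0                                     # one entry of scale_rng, for illustration
edges = np.linspace(-6 * scale, 6 * scale, 25)  # 24 equal-width bins (assumption)

# probability mass of each bin: difference of the CDF at its edges
bin_prob = norm.cdf(edges[1:], scale=scale) - norm.cdf(edges[:-1], scale=scale)

eps = np.random.normal(0, scale)
idx = np.digitize(eps, edges) - 1               # index of the bin eps landed in
p_eps = bin_prob[idx]                           # "probability of drawing eps"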

EDIT: the solution I found is to use

from scipy.stats import norm

norm.cdf(eps_it + 0.5, loc=0, scale=scale_rng) - norm.cdf(eps_it - 0.5, loc=0, scale=scale_rng)

which is not really based on discretization, but at least it seems sound to me "probability-wise".
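
Putting the pieces together, a minimal sketch of the full pipeline with these CDF-difference weights (the toy series x, the use of pandas for the expanding standard deviation, and the unit bin width are assumptions for illustration):

import numpy as np
import pandas as pd
from scipy.stats import norm

# toy time series (assumed for illustration)
x = pd.Series(np.random.normal(10, 2, size=100))

# expanding standard deviation; the first value is NaN, so drop it and align x
scale = x.expanding().std().dropna()
x = x.loc[scale.index]

# one eps per time step, drawn with the matching scale
eps = np.random.normal(0, scale.values)
y = x.values + eps

# "probability" of each eps: mass of a unit-width interval centred on it
w = norm.cdf(eps + 0.5, loc=0, scale=scale.values) - norm.cdf(eps - 0.5, loc=0, scale=scale.values)
w /= w.sum()  # normalise so the weights sum to 1

weighted_avg = np.sum(y * w)

Note that the fixed half-width of 0.5 ties the weights to the measurement units of x; a width proportional to scale would avoid that.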

user9875321__
  • the normal distribution has a well defined probability density function, and appears in scipy as [stats.norm](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.norm.html). why not use that? – Sam Mason Dec 01 '20 at 17:58
  • @SamMason the pdf does not give me the probability of a continuous random variable being equal to a specific value, but rather of it falling within an interval. Moreover, the pdf can well be above 1 on a small interval (see the snippet after these comments), and this makes it unsuitable for producing weights for a weighted average – user9875321__ Dec 01 '20 at 18:22
  • that's not quite right, but am not sure how to best explain. can post an answer showing how to use the PDF to get a weighted average. note that drawing `eps` from a normal and then further weighting by a normal PDF might be doing unexpected things to your estimates – Sam Mason Dec 01 '20 at 18:51
  • Unfortunately, using the pdf of ```eps``` to build a weighted average is not my idea but someone else's, and I have to reproduce their results. I could not find any further help online – user9875321__ Dec 01 '20 at 18:55
  • Do you know the limits for your bins? You might be looking at unequal bin widths to achieve equal probabilities, or fixed bin widths (except for the lowest and highest, since the range of a normal is infinite) yielding unequal probabilities, or maybe something else entirely. I think you need to supply some more information in your question. – pjs Dec 01 '20 at 20:33
  • @SamMason continuous PDFs do not (directly) yield probabilities. Probabilities are found as the area under the PDF over some range, which is why the probability of getting any particular point is zero. Zero width => zero area => zero probability. That seems to be Matteo's motivation to go to bins, which have non-zero width. – pjs Dec 01 '20 at 20:39
  • @pjs You got my point exactly. I thought it could be a good idea to discretize so that somehow I can get "real" probabilities. Sadly, I have no other information, but this also means I have some degrees of freedom in choosing what I want to achieve. The only restriction I'd like to respect is that, since I need to build a weighted average with weights equal to probabilities of ```eps```, I'd like them to be between 0 and 1 and to sum up to 1, exactly as probabilities do. – user9875321__ Dec 01 '20 at 20:47
  • @pjs yup, that's why I referred to the probability *density* rather than just probability. the OP's question would be greatly improved with some more details, e.g. why do they actually want probabilities rather than a density which integrates to 1. also what do they mean by "weighted average" (i.e. just a weighted arithmetic mean, or something else) and some example of what is actually being calculated – Sam Mason Dec 02 '20 at 16:58
  • I edited the question providing more details – user9875321__ Dec 02 '20 at 17:26
  • it would be much more helpful to have that explanation in a few short lines of code, but AFAICT the lines involving `w` in my answer are basically what you want to do. I can't tell from your question if you're dealing with a single vector (i.e. one `scale` for each sample) or a vector per sample. If it's just a vector then you don't need the call to `prod`. – Sam Mason Dec 03 '20 at 14:53
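
A quick illustration of the point raised in the comments above, that a density is not a probability and can exceed 1 (the scale value here is an arbitrary assumption):

from scipy.stats import norm

# for a small enough scale the density at the mean exceeds 1
norm.pdf(0, loc=0, scale=0.1)  # ~= 3.989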

1 Answer


Here's an example leaving everything continuous; in effect this is self-normalised importance sampling with a uniform proposal.

import numpy as np
from scipy import stats

# some function we want a Monte Carlo estimate of
def fn(eps):
    return np.sum(np.abs(eps), axis=1)

# define the distribution of eps
sd = np.array([1, 2, 3])
d_eps = stats.norm(0, sd)

# draw uniform samples so we don't double-apply the normal density
eps = np.random.uniform(-6 * sd, 6 * sd, size=(10000, 3))

# calculate weights (working with the log-likelihood is better for numerical stability)
w = np.prod(d_eps.pdf(eps), axis=1)
# normalise so the weights sum to 1
w /= np.sum(w)

# get the estimate
np.sum(fn(eps) * w)

which gives me 4.71, 4.74, 4.70, 4.78 if I run it a few times. We can verify this is correct by just taking a plain mean when eps is drawn from the normal directly:

np.mean(fn(d_eps.rvs(size=(10000, 3))))

which gives me essentially the same values, but with the expected lower variance, e.g. 4.79, 4.76, 4.77, 4.82, 4.80.
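
As the comment in the code above hints, the weights can also be computed in log space for numerical stability. A minimal sketch using scipy.special.logsumexp, reusing the names from the code above:

from scipy.special import logsumexp

# sum of per-dimension log-densities instead of a product of densities
log_w = np.sum(d_eps.logpdf(eps), axis=1)
log_w -= logsumexp(log_w)  # normalise in log space
w = np.exp(log_w)          # weights now sum to 1

np.sum(fn(eps) * w)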

Sam Mason
  • Are you using an inverse transformation given that you make use of the uniform distribution? Furthermore, what does ```d_eps.rvs()``` achieve in this case? – user9875321__ Dec 01 '20 at 19:04
  • `rvs` just draws random sample values from a distribution, in this case Gaussians with 0 mean and SDs specified by `sd`. I'm sampling from a uniform so we get the same answer from the weighted average as from sampling values directly. If you really want to sample from a Gaussian and then weight by a Gaussian you can do that, but you'll be doing something different and I'm not sure how/when that would be useful – Sam Mason Dec 02 '20 at 17:03