1

I have a list of about 100 000 probabilities on an event stored in a vector.

I want to know if it is possible to calculate the probability of n occuring events (e.g. what is the probability that exactly 1000 events occur).

I managed to calculate several probabilities in R :

  • p is the vector containing all the probabilities
  • probability of none : prod(1-p)
  • probability of at least one : 1 - prod(1-p)

I found how to calculate the probability of exactly one event :

sum(p * (prod(1-p) / (1-p)))

But I don't know how to generate a formula for n events.

Heikki
  • 2,214
  • 19
  • 34
MrLoedus
  • 11
  • 2
  • Why do you have many probabilities? Where did these come from? What is your event distributed like? – user2974951 Feb 13 '19 at 14:47
  • Not sure this is a good fit here, doesn't seem like a programming question (especially without any sample data). Probably better at https://math.stackexchange.com/ – Gregor Thomas Feb 13 '19 at 14:49
  • We run a simulation of a pollution in a zone, and we have at each mesh of the zone a probability of been above a treshold. That's why we have a lot of probabilities. – MrLoedus Feb 13 '19 at 14:54
  • This is about the "Poisson binomial distribution" if you want exact answers (https://en.wikipedia.org/wiki/Poisson_binomial_distribution). There are approximations in terms of the normal distribution that would be faster to compute. I agree this is really a math question. – Michael Lugo Feb 13 '19 at 14:54
  • @MichaelLugo You don't know that, OP hasn't stated what the underlying distribution is. – user2974951 Feb 13 '19 at 14:56
  • Let's change your notation to something more common: call `n = 100000` the number of possible events and `k` (1000 in this case) the number of events occurring. There are `n choose k` ways that exactly `k` events could occur, and the complexity of the calculation (if the probabilities are unique) will be roughly linear with `n choose k`. `n choose k` is small when k is close to 0 or close to n, but is astronomical in between. [Wolfram alpha](https://www.wolframalpha.com/input/?i=100000+choose+1000) show `100000 choose 1000` as about 1.6*10^2430... – Gregor Thomas Feb 13 '19 at 14:57
  • 2
    Belongs on math.stackexchange (or possibly stats.stackexchange) – Gregor Thomas Feb 13 '19 at 14:58
  • @user2974951 the Poisson binomial distribution is a general case that allows arbitrary underlying probabilities. (The name is a bit confusing - it's a generalization of the binomial distribution, due to Poisson, and has nothing to do with the distribution usually called "Poisson".) – Michael Lugo Feb 13 '19 at 14:58
  • But if your probabilities are following a distribution, (or better yet, are equal), there may be better approaches than coming up with "a formula". – Gregor Thomas Feb 13 '19 at 14:59
  • This will help: https://cran.r-project.org/web/packages/IPSUR/vignettes/IPSUR.pdf , it's a pretty comprehensive guid to probability and statistics with R – DS_UNI Feb 13 '19 at 15:02

1 Answers1

0

I do not know R, but I know how I would solve this with programming.

This is a straightforward dynamic programming problem. We start with a vector v = [1.0] of probabilities. Then in untested Python:

for p_i in probabilities:
    next_v = [p_i * v[0]]
    v.append(0.0)
    for j in range(len(v) - 1):
        next_v.append(v[j]*p_i + v[j+1]*(1-p_i)
    # For roundoff errors
    total = sum(next_v)
    for j in range(len(next_v)):
        next_v[j] /= total
    v = next_v

And now your answers can be just read off of the right entry in the vector.

This approach is equivalent to calculating Pascal's triangle row by row, throwing away the old row when you're done.

btilly
  • 43,296
  • 3
  • 59
  • 88