
I have the following function implemented in Python:

```python
from math import factorial

# Upper tail of the binomial distribution: P(X >= m) for X ~ Binomial(N, p)
def PDP(p, N, m):
    pdp = 0
    for k in range(m, N + 1):
        pdp += (float(factorial(N)) / (factorial(k) * factorial(N - k))) * (p**k) * ((1 - p)**(N - k))
    return pdp
```

However, for large N (I need N up to 255) the factorials become too large to convert to a float, so the calculation breaks down. I have searched for approximations to these values, but to no avail. How would you go about this?
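For N up to 255 the exact sum is still cheap; the problem is only that `float(factorial(N))` overflows (it already does for N = 171). A minimal sketch of one way around it, assuming 0 < p < 1: build each term from logarithms with `math.lgamma` (since lgamma(x + 1) = log(x!)), so no factorial is ever converted to a float. The function name is my own.

```python
import math

def binom_upper_tail_exact(p, n, m):
    """Exact P(X >= m) for X ~ Binomial(n, p), assuming 0 < p < 1.

    Each term C(n, k) * p**k * (1 - p)**(n - k) is computed in log space:
    math.lgamma(x + 1) == log(x!), so no huge integer is turned into a float.
    """
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for k in range(m, n + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                    + k * log_p + (n - k) * log_q)
        total += math.exp(log_term)
    return total

# float(math.factorial(255)) raises OverflowError; this call does not.
print(binom_upper_tail_exact(0.5, 255, 128))
```

On Python 3.8+, `math.comb(N, k)` returns the binomial coefficient as an exact integer, and for N = 255 it stays well within float range, so replacing the factorial ratio with `math.comb(N, k)` would also avoid the overflow. The log-space version keeps working for much larger N, where even `math.comb(N, k)` no longer fits in a float.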

Helder Esteves

2 Answers


Suppose $X$ follows a binomial distribution with parameters $n$ and $p$, and you want to compute $P(X \ge m)$. I would first apply a continuity correction, approximating it by $P(X \ge m - 0.5)$, and then use the normal approximation:

$$P\left(\frac{X - np}{\sqrt{np(1-p)}} \ge \frac{m - 0.5 - np}{\sqrt{np(1-p)}}\right),$$

which is approximately

$$P\left(Z \ge \frac{m - 0.5 - np}{\sqrt{np(1-p)}}\right),$$

where $Z$ follows the standard normal distribution.
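As a worked illustration (the numbers are my own, chosen for easy arithmetic), take $n = 100$, $p = 1/2$ and $m = 60$:

$$np = 50,\qquad \sqrt{np(1-p)} = \sqrt{25} = 5,\qquad z = \frac{60 - 0.5 - 50}{5} = 1.9,$$

$$P(X \ge 60) \approx P(Z \ge 1.9) = 1 - \Phi(1.9) \approx 0.0287.$$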

Reference for this approximation: the Wikipedia article on "Continuity correction".

Siong Thye Goh
  • Broken link, could you write the name of the site or paper? – borgr Jan 27 '20 at 13:33
  • Hi, I changed the link to the Wikipedia page on "continuity correction". Thanks for alerting me about the broken link. – Siong Thye Goh Jan 27 '20 at 13:55
  • Why is that better than approximating it using the central limit theorem? E.g. https://www.dummies.com/education/math/statistics/how-to-find-the-normal-approximation-to-the-binomial-with-a-large-sample-n/ – borgr Jan 27 '20 at 14:07
  • We are using the central limit theorem; we are just introducing a correction to account for using a continuous distribution to approximate a discrete one. [Here](https://documentation.statsoft.com/STATISTICAHelp.aspx?path=Glossary/GlossaryTwo/C/ContinuityCorrection)'s an example where using the correction improves the result. – Siong Thye Goh Jan 27 '20 at 14:16
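To make the point in the comments concrete, here is a small comparison of the exact tail, the normal approximation with the continuity correction, and the plain CLT approximation without it. This is a sketch: the function names and the test case n = 100, p = 0.5, m = 60 are my own choices, and `math.comb` requires Python 3.8+.

```python
import math

def upper_tail_exact(p, n, m):
    # Exact P(X >= m); math.comb returns the binomial coefficient as an exact integer
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(m, n + 1))

def upper_tail_normal(p, n, m, correction=True):
    # Normal approximation to P(X >= m), optionally with the continuity correction
    mu, sd = n * p, math.sqrt(n * p * (1 - p))
    x = m - 0.5 if correction else m
    # 0.5 * erfc(t / sqrt(2)) is P(Z >= t) for a standard normal Z
    return 0.5 * math.erfc((x - mu) / (sd * math.sqrt(2.0)))

n, p, m = 100, 0.5, 60
exact = upper_tail_exact(p, n, m)
with_cc = upper_tail_normal(p, n, m)
without_cc = upper_tail_normal(p, n, m, correction=False)
print(f"exact={exact:.4f}  with correction={with_cc:.4f}  without={without_cc:.4f}")
```

For this case the corrected approximation (z = 1.9) lands much closer to the exact value than the uncorrected one (z = 2.0).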

Based on Siong's answer, I have come up with the following solution:

```python
import math

# Standard normal cumulative distribution function, Phi(x)
def CDF(x):
    return (1.0 + math.erf(x / math.sqrt(2.0))) / 2.0

# Normal approximation (with continuity correction) to the binomial upper tail P(X >= m), for large n
# n: trials, p: success probability (0 < p < 1), m: minimum number of successes
def BCDF(p, n, m):
    return 1 - CDF((m - 0.5 - (n * p)) / math.sqrt(n * p * (1 - p)))
```
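One possible refinement (my own suggestion, not part of the answer above): `1 - CDF(z)` subtracts two nearly equal numbers when `z` is large, so far in the upper tail it can round to exactly 0. Computing the same quantity as `0.5 * math.erfc(z / math.sqrt(2))` avoids that cancellation. A sketch, with a hypothetical function name:

```python
import math

def binom_upper_tail_normal(p, n, m):
    """Normal approximation, with continuity correction, to P(X >= m) for X ~ Binomial(n, p).

    Assumes 0 < p < 1; otherwise the standard deviation is 0 and this divides by zero.
    """
    z = (m - 0.5 - n * p) / math.sqrt(n * p * (1 - p))
    # 0.5 * erfc(z / sqrt(2)) equals 1 - Phi(z) without computing 1 - CDF(z)
    return 0.5 * math.erfc(z / math.sqrt(2.0))

print(binom_upper_tail_normal(0.5, 255, 128))  # z = 0 here, so this prints 0.5
print(binom_upper_tail_normal(0.5, 100, 100))  # tiny but nonzero
```

With n = 100, p = 0.5 and m = 100 (so z = 9.9), `BCDF` as written returns 0.0 because `math.erf` rounds to 1.0, while this version returns a small positive value.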
Helder Esteves