Cumulative binomial distribution for large numbers

Question

I have the following function implemented in Python:

from math import factorial

# Calculation of cumulative binomial distribution: P(X >= min) for X ~ Binomial(N, p)
def PDP(p, N, min):
    pdp=0
    for k in range(min, N+1):
        pdp += (float(factorial(N))/(factorial(k)*factorial(N-k)))*(p**k)*((1-p)**(N-k))
    return pdp

However, the intermediate values in this calculation become too large for a high N (up to 255). I have searched for approximations to these values, but to no avail. How would you go about this?

"numbers get very large with high n " is not a question, more of a fact. — Mitch Wheat, Jan 24 '18 at 02:02

Siong Thye Goh · Accepted Answer · 2020-01-27T13:54:22.913

Suppose X follows a binomial distribution with n trials and success probability p, and you want to compute P(X >= m). I would first apply a continuity correction, replacing it with P(X >= m - 0.5), and then use the normal approximation:

P((X - np)/sqrt(np(1-p)) >= (m - 0.5 - np)/sqrt(np(1-p)))

which is approximately

P(Z >= (m - 0.5 - np)/sqrt(np(1-p)))

where Z is a standard normal random variable.
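As a worked illustration (the values n = 100, p = 0.5, m = 60 are chosen here purely as an example and do not come from the question), the approximation works out as follows:

```latex
\begin{aligned}
np &= 100 \cdot 0.5 = 50, \qquad \sqrt{np(1-p)} = \sqrt{25} = 5,\\
P(X \ge 60) &\approx P\!\left(Z \ge \frac{60 - 0.5 - 50}{5}\right) = P(Z \ge 1.9) \approx 0.0287.
\end{aligned}
```

For comparison, the exact binomial tail for these values is about 0.0284.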

Reference for this approximation: the Wikipedia article on continuity correction.

edited Jan 27 '20 at 13:54

answered Jan 24 '18 at 02:12

Siong Thye Goh


Broken link, could you write the name of the site or paper? – borgr Jan 27 '20 at 13:33
Hi, I changed the link to the Wikipedia page on "continuity correction". Thanks for alerting me about the broken link. – Siong Thye Goh Jan 27 '20 at 13:55
Why is that better than approximating it using the central limit theorem? E.g. https://www.dummies.com/education/math/statistics/how-to-find-the-normal-approximation-to-the-binomial-with-a-large-sample-n/ – borgr Jan 27 '20 at 14:07
We are using the central limit theorem; we are just introducing a correction to account for the fact that we are using a continuous distribution to approximate a discrete one. [Here](https://documentation.statsoft.com/STATISTICAHelp.aspx?path=Glossary/GlossaryTwo/C/ContinuityCorrection)'s an example where using the correction improves the result. – Siong Thye Goh Jan 27 '20 at 14:16

score 1 · Answer 2 · answered Jan 24 '18 at 12:53

Based on Siong's answer, I have come up with the following solution:

import math

# Cumulative distribution function
def CDF(x):
    return (1.0 + math.erf(x/math.sqrt(2.0)))/2.0

# Normal approximation of the upper-tail binomial probability P(X >= m),
# with continuity correction, for large n
# n: trials, p: success prob, m: starting successes
def BCDF(p, n, m):
    return 1-CDF((m-0.5-(n*p))/math.sqrt(n*p*(1-p)))
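As a rough sanity check of the approximation, one could compare it against a direct sum of the binomial terms computed in log space, which avoids forming the huge factorials. The sketch below is only an illustration and not part of the answers above; the names `binom_tail_exact` and `binom_tail_normal` are made up here, and the `correction` flag is an assumption added to show the effect of the 0.5 shift.

```python
import math

def normal_cdf(x):
    # Standard normal CDF expressed through the error function
    return (1.0 + math.erf(x / math.sqrt(2.0))) / 2.0

def binom_tail_exact(p, n, m):
    """P(X >= m) for X ~ Binomial(n, p), summed term by term in log space."""
    # Degenerate probabilities would make log(p) or log(1 - p) undefined
    if p <= 0.0:
        return 1.0 if m <= 0 else 0.0
    if p >= 1.0:
        return 1.0 if m <= n else 0.0
    log_n_fact = math.lgamma(n + 1)  # log(n!)
    total = 0.0
    for k in range(max(m, 0), n + 1):
        # log of C(n, k) * p^k * (1 - p)^(n - k)
        log_term = (log_n_fact - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                    + k * math.log(p) + (n - k) * math.log1p(-p))
        total += math.exp(log_term)
    return total

def binom_tail_normal(p, n, m, correction=True):
    """Normal approximation of P(X >= m), optionally with continuity correction."""
    shift = 0.5 if correction else 0.0
    z = (m - shift - n * p) / math.sqrt(n * p * (1 - p))
    return 1.0 - normal_cdf(z)
```

For example, with n = 255, p = 0.5 and m = 128 the exact tail is 0.5 by symmetry; calling `binom_tail_normal(0.5, 255, 128)` and `binom_tail_normal(0.5, 255, 128, correction=False)` next to `binom_tail_exact(0.5, 255, 128)` shows how much the correction matters in that case.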
