0

I am using scipy.stats.binom to work with the binomial distribution. Given n and p, the probability function is

this formula

A sum over k ranging over 0 to n should (and indeed does) give 1. Fixing a point x_0, we can add the probabilities in both directions and the two sums ought to add to 1. However the code below yields two different answers when x_0 is close to n.

from scipy.stats import binom

n = 9
p = 0.006985
b = binom(n=n, p=p)
x_0 = 8

# Method 1
cprob = 0
for k in range(x_0, n+1):
    cprob += b.pmf(k)
print('cumulative prob with method 1:', cprob)

# Method 2
cprob = 1
for k in range(0, x_0):
    cprob -= b.pmf(k)
print('cumulative prob with method 2:', cprob)

I expect the outputs from both methods to agree. For x_0 < 7 it agrees but for x_0 >= 8 as above I get

>> cumulative prob with method 1: 5.0683768775504006e-17
>> cumulative prob with method 2: 1.635963929799698e-16

The precision error in the two methods propagates through my code (later) and gives vastly different answers. Any help is appreciated.

Abhishek Parab
  • 215
  • 2
  • 11
  • Possible duplicate of [How best to sum up lots of floating point numbers?](https://stackoverflow.com/questions/394174/how-best-to-sum-up-lots-of-floating-point-numbers) – Severin Pappadeux Jun 10 '19 at 12:37
  • You could look at math.fsum as suggested by @ev-br, as well as https://pypi.org/project/accupy/ – Severin Pappadeux Jun 10 '19 at 12:38
  • "The precision error in the two methods propagates through my code (later) and gives vastly different answers." -- if this is the case, your problem is ill-conditioned; you should probably look for a formulation which is not so strongly dependent on a result of 1 minus the sum of terms which is approximately 1. – Robert Dodier Jun 11 '19 at 17:29

1 Answers1

1

Roundoff errors of the order of the machine epsilon are expected and are inevitable. That these propagate and later blow up means that your problem is very poorly conditioned. You'd need to rethink the algorithm or an implementation, depending on where the bad conditioning comes from.

In your specific example you can get by using either np.sum (which tries to be careful with roundoff), or even math.fsum from the standard library.

ev-br
  • 24,968
  • 9
  • 65
  • 78