0

Using stats.binom.pmf(np.arange(0,11),10,0.5), I can get an array of numbers representing the probability of an event happening n times(0 to 10) in 10 times given the probability of a single occurrence 0.5, like below:

array([0.00097656, 0.00976563, 0.04394531, 0.1171875 , 0.20507813,
       0.24609375, 0.20507813, 0.1171875 , 0.04394531, 0.00976563,
       0.00097656])

Now I want to get a similar output for a multinomial case.

Saying I'm rolling a dice three times, how can I get such a probability array representing the probability of the total points(3 to 18) using an existing package?

  • 1
    You can do that using math and a couple lines of code. – Julien Oct 31 '21 at 23:28
  • 1
    Let me rephrase Julian: scipy and numpy do not provide this function (think of multinomial outcomes as apples and organges, not points), but it is trivially solvable with math. – Marat Nov 01 '21 at 00:20

2 Answers2

2

To illustrate my comment, here is the function:

def discrete_uniform_sum_pmf(a, b, n):
    du_pmf = {i: 1/(b-a+1) for i in range(a, b+1)}
    du_sum_pmf = {0: 1}

    for i in range(n):
        new_sum_pmf = defaultdict(float)
        for prev_sum, dice in product(du_sum_pmf, du_pmf):
            new_sum_pmf[prev_sum + dice] += du_sum_pmf[prev_sum] * du_pmf[dice]
        du_sum_pmf = new_sum_pmf

    return du_sum_pmf

It would be a lot simpler with numpy and has a lot of space for improvement (using CDF to avoid nested loops, exploiting symmetry, etc.) - but should be sufficient to illustrate the idea.

The output of discrete_uniform_sum_pmf(1, 6, 3) is the same shown by Arne

Marat
  • 15,215
  • 2
  • 39
  • 48
  • how would you use the CDF to remove the inner loop? – onepan Aug 01 '23 at 06:02
  • @onepan It has been a few years, so I do not remember what the CDF idea was - most likely, I was wrong about it. There are, however, some obvious improvements I overlooked back then - I'll add them shortly – Marat Aug 03 '23 at 19:20
1

The trick is to consider the appropriate Laplace space, where each event has the same probability. In the case of three dice, this means considering the tuples (1, 1, 1), (1, 1, 2), ... (6, 6, 6) as elementary events. So you can count up how many of these tuples result in each of the sums from 3 to 18, and then divide those numbers by the total number of tuples to get the probabilities:

from itertools import product
from collections import defaultdict

counts = defaultdict(int)
for dice in product(range(1, 7), repeat=3):
    counts[sum(dice)] += 1 
    
probabilities = {n: counts[n] / 6 ** 3 for n in range(3, 19)}
print(probabilities)
{3: 0.004629629629629629,
 4: 0.013888888888888888,
 5: 0.027777777777777776,
 6: 0.046296296296296294,
 7: 0.06944444444444445,
 8: 0.09722222222222222,
 9: 0.11574074074074074,
 10: 0.125,
 11: 0.125,
 12: 0.11574074074074074,
 13: 0.09722222222222222,
 14: 0.06944444444444445,
 15: 0.046296296296296294,
 16: 0.027777777777777776,
 17: 0.013888888888888888,
 18: 0.004629629629629629}
Arne
  • 9,990
  • 2
  • 18
  • 28
  • This approach (count all outcomes) works for 3 dice, but will grow exponentially in complexity with more rolls. This can be computed in linear time using recurrent formula, getting PMF of n rolls from PMF of n-1 rolls instead – Marat Nov 01 '21 at 03:03