4

What is the best solution for finding probability of rolling sum with n dice? I'm solving it by finding

  1. mean.
  2. standard deviation.
  3. z_score for the numbers below x
  4. z_score for the numbers above x
  5. converting both to probabilities
  6. subtracting one from the other

This is what I've done so far.

# sides - number of sides on one die
def get_mean(sides)
  (1..sides).inject(:+) / sides.to_f
end

def get_variance(sides)
  mean_of_squares = ((1..sides).inject {|sum, side| sum + side ** 2}) / sides.to_f
  square_mean = get_mean(sides) ** 2

  mean_of_squares - square_mean
end

def get_sigma(variance)
  variance ** 0.5
end

# x - the number of points in question
def get_z_score(x, mean, sigma)
  (x - mean) / sigma.to_f
end

# Converts z_score to probability
def z_to_probability(z)
  return 0 if z < -6.5
  return 1 if z > 6.5

  fact_k = 1
  sum = 0
  term = 1
  k = 0

  loop_stop = Math.exp(-23)
  while term.abs > loop_stop do
    term = 0.3989422804 * ((-1)**k) * (z**k) / (2*k+1) / (2**k) * (z**(k+1)) / fact_k
    sum += term
    k += 1
    fact_k *= k
  end

  sum += 0.5
  1 - sum
end

# Calculate probability of getting 'х' total points by rolling 'n' dice with 'sides' number of sides.
def probability_of_sum(x, n, sides=6)

  mean = n * get_mean(sides)
  variance = get_variance(sides)
  sigma = get_sigma(n * variance)

  # Rolling below the sum
  z1 = get_z_score(x, mean, sigma)
  prob_1 = z_to_probability(z1)

  # Rolling above the sum
  z2 = get_z_score(x+1, mean, sigma)
  prob_2 = z_to_probability(z2)

  prob_1 - prob_2
end

# Run probability for 100 dice
puts probability_of_sum(400, 100)

What concerns me is, when I pick x = 200, the probability is 0. Is this correct solution?

Pav31
  • 540
  • 7
  • 19
  • If I understand correctly.. no. If you were to roll all 2's 100 times the total would be 200. So there is some probability of it happening. – Mangesh Tamhankar Aug 07 '15 at 17:36
  • That's what I was thinking too. However, according to [68-95-99.7 (empirical) rule](https://en.wikipedia.org/wiki/68%E2%80%9395%E2%80%9399.7_rule), or the 3-sigma rule "About **68%** of values drawn from a normal distribution are within **1** standard deviation _σ_ away from the mean; about **95%** of the values lie within **2** standard deviations; and about **99.7%** are within **3** standard deviations." In this case with **100** 6-sided dice the `mean = 350` and `σ = 17`. This means, that **99.7%** of the values will fall within a range of **299** to **401**. (350 +/- 17 * 3) – Pav31 Aug 07 '15 at 18:52
  • Also a note, there's something fundamentally wrong with the algorithm, running something very simple like x=12 and n=2 which should be 1/36 does not work. I think @NeilSlater may be on to something regarding the discreteness. – Mangesh Tamhankar Aug 07 '15 at 19:15
  • In simple cases like rolling 2 dice `Monte Carlo` simulation is way better solution. This case is different. – Pav31 Aug 07 '15 at 19:51
  • 2
    Monte Carlo is not a good solution if you care about accuracy of the results. For a simple distribution such as sum of identical dice it is possible to calculate with perfect precision and faster than a Monte-Carlo with any reasonable error. – Neil Slater Aug 07 '15 at 19:59

4 Answers4

6

Adding the result of two independent probability distributions is the same as convolving the two distributions. If the distributions are discrete, then it is a discrete convolution.

So if a single die is represented as:

probs_1d6 = Array.new(6) { Rational(1,6) }

Then 2d6 can be calculated like this:

probs_2d6 = []
probs_1d6.each_with_index do |prob_a,i_a|  
  probs_1d6.each_with_index do |prob_b,i_b| 
    probs_2d6[i_a + i_b] = ( probs_2d6[i_a + i_b] || 0 ) + prob_a * prob_b
  end
end

probs_2d6
# =>  [(1/36), (1/18), (1/12), (1/9), (5/36), (1/6), 
#      (5/36), (1/9), (1/12), (1/18), (1/36)]

Although this is n-squared for sides of dice, and a completely logical combination can reduce that, doing so is generally less flexible for more complex setups. The nice thing about the approach is you can keep adding more dice and do other more exotic combinations. For instance to get 4d6 you can convolve two results for 2d6. Using rationals allows you to side-step issues with floating point accuracy.

I skipped over one detail, you do need to store the initial offset (+1 for a normal six-sided die) and add it together as well, in order to know what the probabilities match to.

I have made a more sophisticated version of this logic in the gem games_dice, in floating point rather than Rational, that can cope with a few other dice combinations.

Here's a basic re-write of your method using the above approach in a naive way (simply combining effects of dice one at a time):

def probability_of_sum(x, n, sides=6)
  return 0 if x < n
  single_die_probs = Array.new(sides) { Rational(1,sides) }

  combined_probs = [1] # Represents chance of getting 0 when rolling 0 dice :-)

  # This is not the most efficient way to do this, but easier to understand
  n.times do
    start_probs = combined_probs
    combined_probs = []
    start_probs.each_with_index do |prob_a,i_a|  
        single_die_probs.each_with_index do |prob_b,i_b| 
          combined_probs[i_a + i_b] = ( combined_probs[i_a + i_b] || 0 ) + prob_a * prob_b
        end
    end
  end

  combined_probs[ x - n ] || 0
end

puts probability_of_sum(400, 100).to_f
# => 0.0003172139126369326

Note the method actually calculates the full probability distribution 100-to-600, so you only need to call it once and store the array (plus offset +100) once, and you can do other useful things such as probability of getting larger than a certain number. All with perfect precision due to using Rational numbers in Ruby.

Because in your situation, you only have one type of dice, we can avoid use of Rational until the end, working in just Integers (essentially counts of combined values), and divide by the total number of combinations (sides to power of number of rolls). This is much faster and returns values for 100s of dice in under a second:

def probability_of_sum(x, n, sides=6)
  return 0 if x < n
  combined_probs = [1] # Represents chance of getting 0 when rolling 0 dice :-)

  n.times do
    start_probs = combined_probs
    combined_probs = []
    start_probs.each_with_index do |prob_a,i_a|  
        sides.times do |i_b| 
          combined_probs[i_a + i_b] = ( combined_probs[i_a + i_b] || 0 ) + prob_a
        end
    end
  end

  Rational( combined_probs[ x - n ] || 0, sides ** n )
end
Neil Slater
  • 26,512
  • 6
  • 76
  • 94
  • 1
    Just a minor clarification. The probability density or mass function for the sum of independent variables is not similar to the convolution of the density or mass functions, it is exactly the convolution. – Robert Dodier Aug 08 '15 at 00:04
6

There is an exact solution involving an alternating sum of binomial coefficients. I have written it out in a few places (on Quora and MSE), and you can find it elsewhere although there are some flawed versions. Be careful that if you implement that, you may need to cancel binomial coefficients that are much larger than the final result, and if you use floating point arithmetic you might lose too much precision.

Neil Slater's suggestion to use dynamic programming to compute the convolution is a good one. It is slower than the summation of binomial coefficients, but reasonably robust. You can speed it up in a few ways, such as using exponentiation by squaring, and by using the Fast Fourier Transform, but many people will find those to be overkill.

To fix your method, you should use the (simple) continuity correction to the normal approximation, and restrict to the context where you have enough dice and you are evaluating far enough from the maximum and minimum that you expect a normal approximation will be good, either in an absolute or relative sense. The continuity correction is that you replace the count of n with the interval from n-1/2 to n+1/2.

The exact count of the number of ways to roll a total of 200 is 7745954278770349845682110174816333221135826585518841002760, so the probability is that divided by 6^100, which is about 1.18563 x 10^-20.

The normal approximation with the simple continuity correction is Phi((200.5-350)/sqrt(3500/12))-Phi((199.5-350)/sqrt(3500/12)) = 4.2 x 10^-19. This is accurate in absolute terms, since it is very close to 0, but it is off by a factor of 35 so it's not great in relative terms. The normal approximation gives a better relative approximation closer to the center.

Community
  • 1
  • 1
Douglas Zare
  • 3,296
  • 1
  • 14
  • 21
1

Here is my final version.

  1. Changed sum offsets in get_z_score to x-0.5 and x+0.5 respectively for more precise result.
  2. Added return 0 if x < n || x > n * sides to cover cases, where sum is less then the number of dice and higher then number of dice multiplied by number of sides.
  3. Added Benchmark tests with result

Main Functions

# sides - number of sides on one die
def get_mean(sides)
  (1..sides).inject(:+) / sides.to_f
end

def get_variance(sides)
  mean_of_squares = ((1..sides).inject {|sum, side| sum + side ** 2}) / sides.to_f
  square_mean = get_mean(sides) ** 2

  mean_of_squares - square_mean
end

def get_sigma(variance)
  variance ** 0.5
end

# x - the number of points in question
def get_z_score(x, mean, sigma)
  (x - mean) / sigma.to_f
end

# Converts z_score to probability
def z_to_probability(z)
  return 0 if z < -6.5
  return 1 if z > 6.5

  fact_k = 1
  sum = 0
  term = 1
  k = 0

  loop_stop = Math.exp(-23)
  while term.abs > loop_stop do
    term = 0.3989422804 * ((-1)**k) * (z**k) / (2*k+1) / (2**k) * (z**(k+1)) / fact_k
    sum += term
    k += 1
    fact_k *= k
  end

  sum += 0.5
  1 - sum
end

# Calculate probability of getting 'х' total points by rolling 'n' dice with 'sides' number of sides.
def probability_of_sum(x, n, sides=6)
  return 0 if x < n || x > n * sides

  mean = n * get_mean(sides)
  variance = get_variance(sides)
  sigma = get_sigma(n * variance)

  # Rolling below the sum
  z1 = get_z_score(x-0.5, mean, sigma)
  prob_1 = z_to_probability(z1)

  # Rolling above the sum
  z2 = get_z_score(x+0.5, mean, sigma)
  prob_2 = z_to_probability(z2)

  prob_1 - prob_2
end

Benchmark

require 'benchmark'

Benchmark.bm do |x|
  x.report { @prob = probability_of_sum(350, 100).to_f }
  puts "\tWith x = 350 and n = 100:"
  puts "\tProbability: #{@prob}"
end
puts

Benchmark.bm do |x|
  x.report { @prob = probability_of_sum(400, 100).to_f }
  puts "\tWith x = 400 and n = 100:"
  puts "\tProbability: #{@prob}"
end
puts

Benchmark.bm do |x|
  x.report { @prob = probability_of_sum(1000, 300).to_f }
  puts "\tWith x = 1000 and n = 300:"
  puts "\tProbability: #{@prob}"
end

Result

       user     system      total        real
   0.000000   0.000000   0.000000 (  0.000049)
    With x = 350 and n = 100:
    Probability: 0.023356331366255034

       user     system      total        real
   0.000000   0.000000   0.000000 (  0.000049)
    With x = 400 and n = 100:
    Probability: 0.00032186531055478085

       user     system      total        real
   0.000000   0.000000   0.000000 (  0.000032)
    With x = 1000 and n = 300:
    Probability: 0.003232390001131513
Pav31
  • 540
  • 7
  • 19
  • Hi, thanks for the solution ! Can you explain what does mean 0.3989422804 value and why loop_stop = Math.exp(-23) ? – AlexMrKlim Mar 11 '16 at 12:24
  • It was taken from [here](http://stackoverflow.com/questions/31875909/z-score-to-probability-and-vice-verse-in-ruby/31876457#31876457) which was taken from [here](http://stackoverflow.com/questions/16194730/seeking-a-statistical-javascript-function-to-return-p-value-from-a-z-score/16197404#16197404). Also you can try searching for _z-score to probability formula_ to get better understanding. – Pav31 Mar 22 '16 at 11:06
0

I also solved this using Monte Carlo method and the results are relatively close.

# x - sum of points to find probability for
# n - number of dice
# trials - number of trials
def monte_carlo(x, n, trials=10000)
  pos = 0

  trials.times do
    sum = n.times.inject(0) { |sum| sum + rand(1..6) }
    pos += 1 if sum == x
  end

  pos / trials.to_f
end

puts monte_carlo(300, 100, 30000)
Pav31
  • 540
  • 7
  • 19