
I am learning Python and I found something that is not intuitive to me. I was trying to plot a Gaussian curve based on the output of a lottery simulation. In the program I can set the draw range, the number of draws in one game, and the number of games. I sum the results of the draws in each game, record how many times each sum occurred, and plot the graph from that data.

When I set one draw per game, every value has the same probability. This is shown in red on the attached graph, and it is what I expected.

When I set three or more draws, the middle values are the most probable. For example, with 3 draws in the range 0 to 100, the sum will be in the range 0 to 300 and the most probable value will be 150. When I plot it, I get a Gaussian curve. It is blue on the graph.

The non-intuitive case is when I set two draws. I expected the curve to look the same as in the previous case, but the output looks triangular. It is the green curve.

--> Graph image <--

The questions are:

  1. What is the fundamental difference between two draws and more than two draws, and why are the output curves different?

  2. Why don't I get a Gaussian curve when I set two draws?

Python code:


import random
import matplotlib.pyplot as plt
import collections

class GaussGame():
    def __init__(self, draw_range = {min: 0, max: 100}, number_of_draws = 5, number_of_games = 100000) -> None:
        self.draw_range = draw_range
        self.number_of_draws = number_of_draws
        self.number_of_games = number_of_games

    def start(self):
        #Create the win dictionary: each possible sum is a key, and the number of games that produced that sum is the value.
        win_dict = collections.OrderedDict()
        for x in range(self.draw_range[min]*self.number_of_draws, self.draw_range[max]*self.number_of_draws+1):
            win_dict[x]=0

        #Loop for all games
        for x in range(self.number_of_games):
            #Loop for one game
            d_sum = 0 #Sum of the drawn values
            for x in range(self.number_of_draws):
                d_sum += random.randrange(self.draw_range[min], self.draw_range[max]+1)
            win_dict[d_sum] += 1
        return win_dict

def main():
    #When I run game several times, with different number_of_draws parameter and draw it on one graph, then I can get interesting picture :-D
    g1 = GaussGame({min: 0, max: 100},1,10000000)
    g2 = GaussGame({min: 0, max: 100},2,10000000)
    g3 = GaussGame({min: 0, max: 100},3,10000000)
    g4 = GaussGame({min: 0, max: 100},4,10000000)
    g5 = GaussGame({min: 0, max: 100},5,10000000)

    d1 = g1.start()
    d2 = g2.start()
    d3 = g3.start()
    d4 = g4.start()
    d5 = g5.start()

    plt.plot(d1.keys(), d1.values(), 'r.')
    plt.plot(d2.keys(), d2.values(), 'g.')
    plt.plot(d3.keys(), d3.values(), 'b.')
    plt.plot(d4.keys(), d4.values(), 'b.')
    plt.plot(d5.keys(), d5.values(), 'b.')
    plt.show()

if __name__ == "__main__":
    main()
  • This is a mathematical result known as [convolution](https://en.wikipedia.org/wiki/Convolution_of_probability_distributions). No programming is needed to derive it. – pjs Feb 26 '23 at 21:04

1 Answer

That looks about right. What you see is, I believe, the Irwin-Hall distribution, or a variation of it. For two samples it is exactly a triangle.

When you sum a small number of samples, the result is not Gaussian, but it converges to a Gaussian as the number of samples grows; see the central limit theorem (CLT).
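
To make this concrete, here is a minimal sketch that computes the exact distribution of the sum by repeated convolution instead of simulating it. The function names `sum_distribution` and `max_gap_to_gaussian` are made up for this illustration, and the 0 to 100 range is taken from the question. With two draws there are 101 - |s - 100| ways to reach a sum s, which is the triangle seen in the green curve; each extra draw smooths the shape further.

```python
import math


def sum_distribution(low, high, draws):
    """Exact probabilities for the sum of `draws` independent uniform
    integer draws from low..high (inclusive).

    Index i of the returned list holds P(sum == draws * low + i).
    """
    k = high - low + 1
    single = [1.0 / k] * k
    dist = [1.0]  # zero draws: the sum is certain
    for _ in range(draws):
        # Convolve the running distribution with one more uniform draw.
        new = [0.0] * (len(dist) + k - 1)
        for i, p in enumerate(dist):
            for j, q in enumerate(single):
                new[i + j] += p * q
        dist = new
    return dist


def max_gap_to_gaussian(low, high, draws):
    """Largest absolute difference between the exact probabilities and a
    normal density with the same mean and variance."""
    k = high - low + 1
    mean = draws * (low + high) / 2
    # The variance of one discrete uniform draw over k values is (k^2 - 1) / 12.
    var = draws * (k * k - 1) / 12
    dist = sum_distribution(low, high, draws)
    gap = 0.0
    for i, p in enumerate(dist):
        s = draws * low + i
        gauss = math.exp(-((s - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)
        gap = max(gap, abs(p - gauss))
    return gap


if __name__ == "__main__":
    # Assumed setup from the question: values 0..100, a few draw counts.
    for n in (1, 2, 3, 5):
        print(n, max_gap_to_gaussian(0, 100, n))
```

Running it prints the largest gap to the matching Gaussian for each number of draws; under these assumptions the gap for 5 draws is much smaller than for 1 or 2, which is the CLT at work.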

Severin Pappadeux