10

Python has a number of ways to generate different distributions of random numbers, see the documentation for the random module. Unfortunately they aren't terribly understandable without the appropriate math background, especially considering the required parameters.

I'd like to know if any of those methods are capable of producing random numbers with a distribution that obeys Benford's Law, and what parameter values are appropriate. Namely for a population of integers, those integers should start with a '1' about 30% of the time, '2' about 18% of the time, etc.


Using John Dvorak's answer, I put together the following code, and it appears to work perfectly.

def benfords_range_gen(stop, n):
    """ A generator that returns n random integers
    between 1 and stop-1 and whose distribution
    meets Benford's Law i.e. is logarithmic.
    """
    multiplier = math.log(stop)
    for i in range(n):
        yield int(math.exp(multiplier * random.random()))

>>> from collections import Counter
>>> Counter(str(i)[0] for i in benfords_range_gen(10000, 1000000))
Counter({'1': 300696, '2': 176142, '3': 124577, '4': 96756, '5': 79260, '6': 67413, '7': 58052, '8': 51308, '9': 45796})

A question has also arisen about whether this works consistently between different versions of Python. That's not a trivial question to answer, because of the nature of random numbers - you expect some variation from run to run, and sometimes between different versions of the random library. The only way to avoid that is to seed the random number generator consistently between every run. I've added that to my test and I get the exact same results in Python 2.7.1, 3.8.6, and 3.9.1.

>>> random.seed(7919)
>>> Counter(str(i)[0] for i in benfords_range_gen(10000, 1000000))
Counter({'1': 301032, '2': 176404, '3': 125350, '4': 96503, '5': 78450, '6': 67198, '7': 58000, '8': 51342, '9': 45721})
Cody Gray - on strike
  • 239,200
  • 50
  • 490
  • 574
Mark Ransom
  • 299,747
  • 42
  • 398
  • 622
  • Isn't this the same as picking a random number uniformly between log(min) and log(max) and outputting 10 raised to that number? – David Schwartz Jan 28 '13 at 06:21
  • 3
    Given what Benford's law is, try `floor(10^random())` – John Dvorak Jan 28 '13 at 06:24
  • @JanDvorak: Yep. For 1,000,000 trials, I get: `{1: 0.301143, 2: 0.175899, 3: 0.125316, 4: 0.097045, 5: 0.079359, 6: 0.066662, 7: 0.057795, 8: 0.050963, 9: 0.045818}`. In Python, it'd be `int(10**random.random())` – Blender Jan 28 '13 at 06:27
  • @Blender so, it's correct it seems – John Dvorak Jan 28 '13 at 06:27
  • Trying to do some accounting fraud, perchance? ;) – wim Jan 28 '13 at 06:30
  • 1
    @JanDvorak, if you're serious then propose it as an answer. I like the fact that it doesn't depend on any of the fancy distribution models, just a simple transformation of the usual equal probabilities. – Mark Ransom Jan 28 '13 at 06:33
  • I just re-listened to the radiolab "numbers" episode last night, and planned to read up on Benford's law today...strange coincidence. – monkut Jan 28 '13 at 06:34
  • @wim, I wish! I was just using some random numbers recently, and noticed that the distribution wasn't as "natural" as I'd like. I figured Benford's Law would be the cure. – Mark Ransom Jan 28 '13 at 06:34
  • Somehow, this solution hasn't held up over the years. Running Python 3.9,1, I'm seeing that the first-digit percentages are never as close to Benford's Law as seen in the single run above. For example "1" should be about 30.1%, but it's coming up as 31.2% or 31.3% . The parameters are exactly as in the solution. – grjash Jan 13 '21 at 21:12
  • @grjash the results I see today with 3.8.6 are within 0.2% of what I show above. There's bound to be a little variation if the random numbers have the proper randomness properties, but your results do seem a little out of line. Unfortunately I can't test 3.9 at the moment. – Mark Ransom Jan 13 '21 at 21:25

2 Answers2

22

Benford's law describes the distribution of the first digits of a set of numbers if the numbers are chosen from a wide range on the logarithmic scale. If you prepare a log-uniform distribution over one decade, it will respect the law as well. 10^[0,1) will produce that distribution.

This will produce the desired distribution: math.floor(10**random.random())

John Dvorak
  • 26,799
  • 13
  • 69
  • 83
0

Just playing around.

A much more inefficient, but perhaps more visible implementation for those, like myself, who are not so math inclined...

An easy way to create any desired distribution is to fill a list with the desired percentages of an item, and then use random.choice(<list>), since this returns a uniform selection of items in the list.

import random
probs = [30.1, 17.6, 12.5, 9.7, 7.9, 6.7, 5.8, 5.1, 4.6]
nums = [1, 2, 3, 4, 5, 6, 7, 8, 9]
population = sum([[n] * int(p * 10) for n, p in zip(nums, probs)], [])

max_value = 100
min_value = 1
result_pop = []
target_pop_size = 1000
while len(result_pop) < target_pop_size:
    s = str(random.choice(population))
    while True:
        r = random.randint(min_value, max_value)
        if str(r).startswith(s):
            break
    result_pop.append(r)
monkut
  • 42,176
  • 24
  • 124
  • 155