4

Can anyone show me the best way to generate a (NumPy) array containing values from 0 to 100 that is weighted by, for example, a normal distribution function with mean 50 and variance 5? So that there are more 50s and fewer (nearly no) zeros and hundreds. I think the problem should not be too hard to solve, but I'm stuck somehow...

I thought about something with np.linspace, but it seems there is no weight option.

So just to be clear: I don't want a simple random normal sample from 0 to 100, but something like an array from 0 to 100 with a higher density of values in the middle.

Thanks

wa4557
    have you tried `numpy.random.normal(50, 5, size=10)`? – jfs Feb 24 '13 at 12:28
  • yeah, I have; but this is not exactly what I'm looking for, since I don't like the random part in it. I'd prefer something that is (almost) normally distributed, since I'm dealing with not very big sample sizes – wa4557 Feb 24 '13 at 12:38

3 Answers

4

You can use scipy's stats distributions:

import numpy as np
from scipy import stats

# your distribution:
distribution = stats.norm(loc=50, scale=5)

# percentile point, the range for the inverse cumulative distribution function:
bounds_for_range = distribution.cdf([0, 100])

# Linspace for the inverse cdf:
pp = np.linspace(*bounds_for_range, num=1000)

x = distribution.ppf(pp)

# And just to check that it makes sense you can try:
from matplotlib import pyplot as plt
plt.hist(x)
plt.show()

Of course, I admit the start and end points are not quite exact this way, due to numerical inaccuracies when going back and forth between the cdf and its inverse.
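If the exact endpoints matter, one option (my addition, not part of the original answer) is to clip the round-trip result back into the requested range:

```python
import numpy as np
from scipy import stats

distribution = stats.norm(loc=50, scale=5)

# map [0, 100] through the cdf, sample evenly there, and invert with ppf
pp = np.linspace(*distribution.cdf([0, 100]), num=1000)

# clip away any tiny numerical overshoot at the boundaries
x = np.clip(distribution.ppf(pp), 0, 100)
```

This leaves the interior values untouched and only nudges the two boundary points, which are the ones affected by the cdf/ppf round trip.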

Guillaume Jacquenot
seberg
1

It is important to understand that your problem is not exactly solvable, since in general a finite discrete sample cannot exactly reproduce your distribution.

You can easily see this by asking a trivial version of your question: a set of 3 values in [0, 1] with an equal distribution. Here both [0, 0, 1] and [0, 1, 1] would be reasonable results.

However, you can solve the problem approximately. If you ask for an array with count elements out of [0, 1, ..., N], where the given probabilities p = [p0, p1, ..., pN] are normalized (p0 + ... + pN == 1), then the count c_k of element k in your resulting array is theoretically

c[k] = p[k]*count

but these counts are now floats. You have to decide on a way to "round" them while keeping their total sum equal to count. This is the freedom of choice arising from the under-determination of your question.
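One concrete way to do this "rounding while keeping the total sum" is the largest-remainder method. The sketch below is my own illustration (the helper name `sample_counts` is made up, not from the answer):

```python
import numpy as np

def sample_counts(p, count):
    """Turn normalized probabilities p into integer counts summing to
    `count`, using the largest-remainder method."""
    p = np.asarray(p, dtype=float)
    raw = p * count                      # theoretical float counts c[k]
    base = np.floor(raw).astype(int)     # round everything down first
    leftover = count - base.sum()        # units still to distribute
    # hand the leftover units to the entries with the largest fractional parts
    order = np.argsort(raw - base)[::-1]
    base[order[:leftover]] += 1
    return base

# build the weighted array: repeat each value k according to its count
p = [0.1, 0.2, 0.4, 0.2, 0.1]
counts = sample_counts(p, 20)            # e.g. [2, 4, 8, 4, 2]
arr = np.repeat(np.arange(len(p)), counts)
```

For the original question, p could be taken from a normal pdf evaluated at 0..100 and renormalized; `np.repeat` then yields a deterministic array with a higher density of values near the mean.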

flonk
-1
>>> import random
>>> sorted([int(random.gauss(50, 5)) for i in range(100)])
[33, 40, 40, 40, 40, 40, 42, 42, 42, 42, 43, 43, 43, 43, 44, 44, 44, 44, 44, 45, 45, 45, 46, 46, 46, 46, 46, 46, 46, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 48, 48, 48, 48, 48, 48, 48, 49, 49, 50, 50, 50, 50, 50, 51, 51, 51, 51, 51, 51, 51, 51, 51, 51, 51, 51, 52, 52, 52, 52, 52, 52, 52, 52, 52, 52, 53, 53, 53, 54, 54, 54, 54, 54, 54, 54, 54, 54, 55, 55, 56, 56, 57, 57, 57, 57, 57, 57, 57, 58, 61]
nadapez