4

Can anyone show me the best way to generate a (NumPy) array containing values from 0 to 100 that is weighted by, for example, a normal distribution function with mean 50 and variance 5? So that there are more 50s and fewer (nearly no) zeros and hundreds. I think the problem should not be too hard to solve, but I'm stuck somehow...

I thought about something with np.linspace, but it seems there is no weight option.

So just to be clear: I don't want a simple random normal sample from 0 to 100, but something like an array from 0 to 100 with a higher density of values in the middle.

Thanks

wa4557
    have you tried `numpy.random.normal(50, 5, size=10)`? – jfs Feb 24 '13 at 12:28
  • yeah, I have; but this is not exactly what I'm looking for, since I don't like the random part in it. I'd prefer something that is (almost) normally distributed, since I'm dealing with not very big sample sizes – wa4557 Feb 24 '13 at 12:38

3 Answers

4

You can use scipy's stats distributions:

import numpy as np
from scipy import stats

# your distribution:
distribution = stats.norm(loc=50, scale=5)

# percentile point, the range for the inverse cumulative distribution function:
bounds_for_range = distribution.cdf([0, 100])

# Linspace for the inverse cdf:
pp = np.linspace(*bounds_for_range, num=1000)

x = distribution.ppf(pp)

# And just to check that it makes sense you can try:
from matplotlib import pyplot as plt
plt.hist(x)
plt.show()

Of course, I admit the start and end points are not quite exact this way, due to numerical inaccuracies when going back and forth between the cdf and its inverse.
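If the exact endpoints matter, one option (my addition, not part of the original answer) is to clip the round-trip result back into the requested range:

```python
import numpy as np
from scipy import stats

distribution = stats.norm(loc=50, scale=5)

# map [0, 100] through the cdf, sample evenly there, and invert with ppf
pp = np.linspace(*distribution.cdf([0, 100]), num=1000)

# clip away any tiny numerical overshoot at the boundaries
x = np.clip(distribution.ppf(pp), 0, 100)
```

This leaves the interior values untouched and only nudges the two boundary points, which are the ones affected by the cdf/ppf round trip.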

Guillaume Jacquenot
seberg
1

It is important to understand that your problem is not exactly solvable, since in general a finite discrete sample cannot exactly reproduce your distribution.

You can easily see this by asking a trivial version of your question: a set of 3 values in [0, 1] with an equal distribution. Here both [0, 0, 1] and [0, 1, 1] would be reasonable results.

However, you can solve the problem approximately. If you ask for an array with count elements out of [0, 1, ..., N], where the given probabilities p = [p0, p1, ..., pN] are normalized (p0 + ... + pN == 1), then the count c_k of element k in your resulting array is theoretically

c[k] = p[k]*count

but these counts are now floats. You have to decide on a way to "round" them while keeping their total sum equal to count. This is the freedom of choice arising from the under-determination of your question.
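One concrete way to do this "rounding while keeping the total sum" is the largest-remainder method. The sketch below is my own illustration (the helper name `sample_counts` is made up, not from the answer):

```python
import numpy as np

def sample_counts(p, count):
    """Turn normalized probabilities p into integer counts summing to
    `count`, using the largest-remainder method."""
    p = np.asarray(p, dtype=float)
    raw = p * count                      # theoretical float counts c[k]
    base = np.floor(raw).astype(int)     # round everything down first
    leftover = count - base.sum()        # units still to distribute
    # hand the leftover units to the entries with the largest fractional parts
    order = np.argsort(raw - base)[::-1]
    base[order[:leftover]] += 1
    return base

# build the weighted array: repeat each value k according to its count
p = [0.1, 0.2, 0.4, 0.2, 0.1]
counts = sample_counts(p, 20)            # e.g. [2, 4, 8, 4, 2]
arr = np.repeat(np.arange(len(p)), counts)
```

For the original question, p could be taken from a normal pdf evaluated at 0..100 and renormalized; `np.repeat` then yields a deterministic array with a higher density of values near the mean.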

flonk
-1
>>> import random
>>> sorted([int(random.gauss(50, 5)) for i in range(100)])
[33, 40, 40, 40, 40, 40, 42, 42, 42, 42, 43, 43, 43, 43, 44, 44, 44, 44, 44, 45, 45, 45, 46, 46, 46, 46, 46, 46, 46, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 48, 48, 48, 48, 48, 48, 48, 49, 49, 50, 50, 50, 50, 50, 51, 51, 51, 51, 51, 51, 51, 51, 51, 51, 51, 51, 52, 52, 52, 52, 52, 52, 52, 52, 52, 52, 53, 53, 53, 54, 54, 54, 54, 54, 54, 54, 54, 54, 55, 55, 56, 56, 57, 57, 57, 57, 57, 57, 57, 58, 61]
nadapez