You gave the probability density function (P for "probability"):
P(1) = 0.7
P(2) = 0.2
P(3) = 0.1
You need to construct the (cumulative) distribution function F, which is a step function here:
F(1) = 0.7
F(2) = 0.9
F(3) = 1.0
We can now generate random numbers between zero and one, plot them on the Y axis, draw a line to the right to see where they intersect the distribution function, then read the associated X coordinate as the random variate. So if the random number is less than 0.7, the random variate is 1; if it is between 0.7 and 0.9, the random variate is 2; and if it exceeds 0.9, the random variate is 3. (Note that the probability that rand will equal 0.7 (say) exactly is virtually zero, so we don't have to worry about distinguishing between < 0.7 and <= 0.7.)
To implement that, first calculate the hash df:
y = { 1 => 0.7, 2 => 0.2, 3 => 0.1 }
last = 0.0
df = y.each_with_object({}) { |(v,p),h| last += p; h[last.round(10)] = v }
#=> {0.7=>1, 0.9=>2, 1.0=>3}
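The same construction can be wrapped in a small helper that also checks that the probabilities sum to one, which is an easy slip to make when typing them by hand. This is only a sketch: the method name distribution_function and the tolerance are my own choices, not anything Ruby provides.

```ruby
# Hypothetical helper: builds the cumulative hash { cumulative_prob => value }
# from a probability hash { value => prob }.
def distribution_function(probs, tolerance = 1e-9)
  total = probs.values.sum
  unless (total - 1.0).abs <= tolerance
    raise ArgumentError, "probabilities sum to #{total}, not 1"
  end

  running = 0.0
  probs.each_with_object({}) do |(value, prob), h|
    running += prob
    # Round to hide floating-point noise such as 0.8999999999999999.
    h[running.round(10)] = value
  end
end

distribution_function({ 1 => 0.7, 2 => 0.2, 3 => 0.1 })
#=> {0.7=>1, 0.9=>2, 1.0=>3}
```

Passing probabilities that sum to something other than one (0.7, 0.3 and 0.1, for example) raises an ArgumentError instead of silently producing a distribution function that overshoots 1.0.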
And now we can create a random variate as follows:
def rv(df)
  rn = rand
  # Return the value for the first cumulative probability that exceeds rn.
  df.find { |p,_| rn < p }.last
end
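With only three outcomes the linear find is perfectly adequate. If the distribution had many outcomes, one possible variation would be a binary search over the cumulative probabilities using Array#bsearch. The sketch below assumes df has the shape built above (keys increasing, last key 1.0); the name rv_bsearch and the optional rn argument are mine, added so the lookup can be tried with fixed inputs.

```ruby
# Hypothetical variant of rv: pairs is df converted to an array of
# [cumulative_prob, value] pairs, sorted by cumulative_prob.
def rv_bsearch(pairs, rn = rand)
  # In find-minimum mode, bsearch returns the first pair for which the
  # block is true, i.e. the first cumulative probability exceeding rn.
  pairs.bsearch { |p, _| rn < p }.last
end

df = { 0.7 => 1, 0.9 => 2, 1.0 => 3 }
pairs = df.to_a            # convert once, outside any sampling loop

rv_bsearch(pairs, 0.35)    #=> 1
rv_bsearch(pairs, 0.7)     #=> 2
rv_bsearch(pairs, 0.95)    #=> 3
```

The second call shows the boundary case: a random number of exactly 0.7 is not less than 0.7, so it falls through to the next step.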
Let's try it:
def count(df,n)
  n.times.each_with_object(Hash.new(0)) { |_,count|
    count[rv(df)] += 1 }
end
n = 10_000
count(df,n)
#=> {1=>6993, 2=>1960, 3=>1047}
count(df,n)
#=> {1=>6986, 2=>2042, 3=>972}
count(df,n)
#=> {1=>6970, 2=>2039, 3=>991}
Note that the order of the key-value pairs in the hash returned by count is determined by the outcomes of the first few random variates, so the keys will not necessarily be in the order shown here.
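To compare a run against the target probabilities directly, the counts can be sorted by key and converted to relative frequencies. This is a rough sketch rather than a proper statistical test: the fixed seed, the names expected, freqs and max_error, and the choice of n are all assumptions made for illustration.

```ruby
df = { 0.7 => 1, 0.9 => 2, 1.0 => 3 }
expected = { 1 => 0.7, 2 => 0.2, 3 => 0.1 }

rng = Random.new(42)   # fixed seed (an assumption) so the run is repeatable
n = 10_000

counts = Hash.new(0)
n.times do
  rn = rng.rand        # uniform float in [0, 1)
  counts[df.find { |p, _| rn < p }.last] += 1
end

# Sort by key so the order no longer depends on the first few draws,
# then turn counts into relative frequencies.
freqs = counts.sort.to_h.transform_values { |c| c.fdiv(n) }

# Largest absolute gap between an observed frequency and its target.
max_error = expected.map { |v, p| (freqs.fetch(v, 0.0) - p).abs }.max
```

For n = 10_000 the standard deviation of the observed frequency of 1 is about 0.0046, so max_error should normally come out at around a hundredth or less.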