How to weight a random number based on an array

Question

I've been thinking about how to implement something that, frankly, is beyond my mathematical skills. Feel free to point me in the right direction rather than post complete code solutions; I'd be grateful for any help.

So, imagine I've analysed some text and generated a table of the frequencies of the different two-character combinations. I've stored these in a 26x26 array, e.g.

  A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
A 1 15 (frequency of AA, then frequency of AB etc.)
B 12 0 (freq of BA, BB etc..)
... etc.

I want to choose these two-character combinations at random, but I'd like to 'weight' my choice based on the frequency, i.e. the AB from above should be 15 times more likely than AA. The selection should also never return a pair with a frequency of 0, such as BB in this example (in real text BB obviously does occur, in words like "bubble" :-) ). For the 0 case I realise I could loop until I get a non-zero frequency, but that's not elegant, and I have a feeling there is a way to skew the distribution directly.

To choose the first character of my pair, i.e. the row, I was thinking I could just use the system random function (Random.Next), then use the 'weighted' random algorithm to pick the second character. (Ultimately I'm generating a sequence of four pairs.)

Any ideas?

Here's the **FULL CODE** to save you typing .. http://stackoverflow.com/a/33991225/294884 ... copy and paste — Fattie, Nov 30 '15 at 03:58
This was two years ago! Paul's answer below worked perfectly and I used that. Thanks though. — Th3Minstr3l, Dec 01 '15 at 08:42
for sure, it's just handy for people googling now or in the future! cheers! — Fattie, Dec 01 '15 at 13:06

score 5 · Accepted Answer · answered Jan 30 '13 at 12:18

Given your example sample, I would first create a cumulative series of all of the numbers (1, 15, 12, 0 => 1, 16, 28, 28).

Then I would produce a random number between 0 and 27 (let's say 19).

Then I would calculate that 19 was >=16 but <28, giving me bucket 3 (BA).
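
The steps above can be sketched in a few lines. This is a minimal Python illustration rather than the C# the question is about, and the name `pick_weighted` and the optional `rng` parameter are assumptions made for the example. It builds the cumulative series, draws an integer below the total, and uses a binary search to find the bucket. Indices are 0-based here, so the answer's "bucket 3 (BA)" is index 2.

```python
import bisect
import itertools
import random

def pick_weighted(weights, rng=random):
    """Return an index chosen with probability proportional to its weight."""
    # Cumulative series: 1, 15, 12, 0 -> 1, 16, 28, 28
    cumulative = list(itertools.accumulate(weights))
    total = cumulative[-1]
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    r = rng.randrange(total)  # uniform integer in 0 .. total-1
    # First bucket whose cumulative value is strictly greater than r.
    # A zero-weight bucket repeats the previous value, so it is never hit.
    return bisect.bisect_right(cumulative, r)

pairs = ["AA", "AB", "BA", "BB"]
weights = [1, 15, 12, 0]
print(pairs[pick_weighted(weights)])
```

If the weights do not change between draws, the cumulative list could be built once and reused, leaving only the binary search per draw.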

answered Jan 30 '13 at 12:18

paul

This works perfectly for what I need, thank you. So now I'm selecting my first letter (row) using a CryptoRNG, turning the columns of that row into a cumulative series, generating a second random number, and picking my second character out of the bucket I end up in. Thank you! :-) – Th3Minstr3l Feb 11 '13 at 11:17

Eric Lippert · Answer 2 · 2013-10-18T20:43:49.350

5

There are some good suggestions in the other answers for your specific problem. To solve the general problem of "I have a source of random numbers conforming to a uniform probability distribution, but I would like it to conform to a given nonuniform probability distribution", you can work out the quantile function, which is the function that performs that transformation. I give a gentle introduction that explains why the quantile function is the function you want here:

Generating Random Non-Uniform Data In C#

edited Oct 18 '13 at 20:43

answered Jan 30 '13 at 15:14

Eric Lippert

@OP: Just to link this to paul's answer, you could normalize paul's answer (i.e., generate a random number between 0 and 1 instead of 0 and the [Total Weight], then divide the bucket ranges by [Total Weight]). Paul is describing how to generate a quantile function in your specific case, though he omits normalizing the ranges to `0..1`. – Brian Jan 30 '13 at 18:32
Thanks to Eric Lippert for that article, very useful in implementing my solution. See my reply to Paul's post. – Th3Minstr3l Feb 11 '13 at 11:20
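
To make the link between the quantile function and the normalized bucket ranges concrete, here is a small Python sketch (an illustration, not code from the linked article; `make_quantile` is a hypothetical name). It divides the cumulative weights by the total weight so the bucket edges lie in `0..1`, then maps a uniform number in `[0, 1)` to a bucket.

```python
import bisect
import itertools
import random

def make_quantile(weights):
    """Build a function mapping u in [0, 1) to an index for these weights."""
    running = list(itertools.accumulate(weights))
    total = running[-1]
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    # Normalized bucket edges; the last edge is exactly 1.0.
    cdf = [c / total for c in running]

    def quantile(u):
        # First index whose cumulative probability is strictly greater than u.
        return bisect.bisect_right(cdf, u)

    return quantile

quantile = make_quantile([1, 15, 12, 0])
index = quantile(random.random())  # random.random() is uniform on [0, 1)
```

For a discrete table like this one the normalization changes nothing about which bucket is chosen; it mainly lets the same function accept any uniform source that produces values in `[0, 1)`.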

score 1 · Answer 3 · answered Jan 30 '13 at 12:23

How about summing all the frequencies and using that total as the range, from AA to ZZ, to generate your pair?

Let's say rnd returns a number below the total frequency of all pairs: if it returns 0 you get AA, if it returns 1-15 then it's AB, etc.

answered Jan 30 '13 at 12:23

Rob Foran

Matthew Whited · Answer 4 · 2013-01-30T12:42:58.827

Use your frequency matrix to generate a complete set of values (each pair repeated as many times as its frequency). Order the set by Random.Next(). Store the randomized set in an array. Then you can just select an element out of that array based on Random.Next(randomarray.Length).

If there is a mathematical way to calculate the frequency you could do that as well. But creating a precompiled and cached set will reduce the calculation time if this is called repeatedly.

As a note, depending on the max frequency this could require a good amount of storage. You would also want to create the instance of random before you loop to build the set. This is so you don't reseed the random generator.
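
A rough Python sketch of the precomputed-set idea follows; the names `build_pool` and `pool` are made up for the example. It assumes "complete set of values" means each pair repeated as many times as its frequency, and it skips the shuffle step because a uniformly random index already makes the order of the array irrelevant.

```python
import random

def build_pool(frequencies):
    """Expand {pair: count} into a list holding each pair `count` times."""
    pool = []
    for pair, count in frequencies.items():
        pool.extend([pair] * count)  # a count of 0 adds nothing
    return pool

rng = random.Random()  # created once, outside any loop
pool = build_pool({"AA": 1, "AB": 15, "BA": 12, "BB": 0})
choice = pool[rng.randrange(len(pool))]  # uniform index, weighted result
```

The pool holds as many entries as the sum of all frequencies (28 here), which is the storage cost the note above refers to.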

...

Another way (similar to what you suggested at the end of your question) would be to do this in two passes, with the first selecting the row and the second using your weighted frequency to select the column. That would just be the row's frequencies summed and split into ranges. The first suggestion should give a more even distribution based on weight.
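
One possible reading of the two-pass idea, sketched in Python with a hypothetical `pick_pair` helper: the row is picked with weight equal to its total frequency, then the column is picked using that row's frequencies. Under that assumption each pair comes out with probability frequency / grand total. Picking the row uniformly instead, as the question proposes, would give a different distribution.

```python
import random

def pick_pair(matrix, letters, rng=random):
    """Pick a row by its total frequency, then a column by the row's weights."""
    row_totals = [sum(row) for row in matrix]
    # First pass: rows with a larger total are chosen more often.
    row = rng.choices(range(len(matrix)), weights=row_totals, k=1)[0]
    # Second pass: weighted choice among the columns of the chosen row.
    col = rng.choices(range(len(matrix[row])), weights=matrix[row], k=1)[0]
    return letters[row] + letters[col]

# 2x2 corner of the table from the question: AA=1, AB=15, BA=12, BB=0
matrix = [[1, 15],
          [12, 0]]
print(pick_pair(matrix, "AB"))
```

`random.choices` raises `ValueError` when every weight is zero, so an entirely empty table would need to be handled before calling this.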

score 0 · Answer 5 · answered Jan 30 '13 at 12:36

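Take the sum of the probabilities. Take a random number between zero (inclusive) and that sum (exclusive). Add up the probabilities until the running total is greater than your random number. Then use the item you're on.

Eg pseudocode:

b = getProbabilities()      // weights, e.g. [1, 15, 12, 0]
s = sum(b)                  // must be greater than 0
r = randomInt() % s         // 0 <= r < s, assuming randomInt() is non-negative
i = 0
acc = b[0]
// advance until the running total exceeds r;
// items with weight 0 can never be the stopping point
while (acc <= r) {
    i++
    acc += b[i]
}

return i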

score 0 · Answer 6 · answered May 16 '14 at 14:15

If efficiency is not a problem, you could create a key->value hash instead of an array. An upside of this would be that (if you format it well in the text) it would be very easy to update the values should the need arise. Something like

{
    AA => 5, AB => 2, AC => 4,
    BA => 6, BB => 5, BC => 9,
    CA => 2, CB => 7, CC => 8
}

With this, you could easily retrieve the value for the sequence you want, and quickly find the entry to update. If the table is automatically generated and extremely large, it could help to get/be familiar with vim's use of regular expressions.
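
As a rough illustration of the key->value idea in Python (the function name `count_pairs` and the sample word are assumptions, not part of the answer), a `collections.Counter` can hold the counts keyed by pair, and `random.choices` can draw directly from it. Pairs that never occur are simply absent, so they cannot be selected.

```python
import random
from collections import Counter

def count_pairs(text):
    """Count adjacent letter pairs in `text`, ignoring case and non-letters."""
    # Non-letters are dropped first, so pairs can span word boundaries.
    letters = [c for c in text.upper() if "A" <= c <= "Z"]
    return Counter(a + b for a, b in zip(letters, letters[1:]))

counts = count_pairs("bubble")  # BU, UB, BB, BL, LE, one of each
counts["BB"] += 1               # updating one entry is a single lookup
pair = random.choices(list(counts), weights=list(counts.values()), k=1)[0]
```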
