25

I want to generate a random number with a given probability but I'm not sure how to:

I need a number between 1 and 3

num = ceil(rand*3);

but I need different values to have different probabilities of generating eg.

0.5 chance of 1
0.1 chance of 2
0.4 chance of 3

I'm sure this is straightforward but I can't think of how to do it.

Eamonn McEvoy
  • 8,876
  • 14
  • 53
  • 83
  • 1
    Does this answer your question? [Draw random numbers from pre-specified probability mass function in Matlab](https://stackoverflow.com/questions/58607156/draw-random-numbers-from-pre-specified-probability-mass-function-in-matlab) – SecretAgentMan Oct 29 '19 at 16:11

7 Answers7

46

The simple solution is to generate a number with a uniform distribution (using rand), and manipulate it a bit:

r = rand;
prob = [0.5, 0.1, 0.4];
x = sum(r >= cumsum([0, prob]));

or in a one-liner:

x = sum(rand >= cumsum([0, 0.5, 0.1, 0.4]));

Explanation

Here r is a uniformly distributed random number between 0 and 1. To generate an integer number between 1 and 3, the trick is to divide the [0, 1] range into 3 segments, where the length of each segment is proportional to its corresponding probability. In your case, you would have:

  • Segment [0, 0.5), corresponding to number 1.
  • Segment [0.5, 0.6), corresponding to number 2.
  • Segment [0.6, 1], corresponding to number 3.

The probability of r falling within any of the segments is proportional to the probabilities you want for each number. sum(r >= cumsum([0, prob])) is just a fancy way of mapping an integer number to one of the segments.

Extension

If you're interested in creating a vector/matrix of random numbers, you can use a loop or arrayfun:

r = rand(3); % # Any size you want
x = arrayfun(@(z)sum(z >= cumsum([0, prob])), r);

Of course, there's also a vectorized solution, I'm just too lazy to write it.

Eitan T
  • 32,660
  • 14
  • 72
  • 109
9

The answers so far are correct, but slow for large inputs: O(m*n) where n is the number of values and m is the number of random samples. Here is a O(m*log(n)) version that takes advantage of monotonicity of the cumsum result and the binary search used in histc:

% assume n = numel(prob) is large and sum(prob) == 1
r = rand(m,1);
[~,x] = histc(r,cumsum([0,prob]));
Alec Jacobson
  • 6,032
  • 5
  • 51
  • 88
  • Small note, depending on how `histc` is implemented this could be O(n+m*log(n)). Though I'd hope since the first output is not used, this isn't the case. – Alec Jacobson Dec 06 '13 at 18:23
  • 1
    There's an even better O(n + m) solution using the [Alias Method](http://en.wikipedia.org/wiki/Alias_method). I've implemented it in the [sample_discrete.m](https://github.com/alecjacobson/gptoolbox/blob/master/matrix/sample_discrete.m) function. – Alec Jacobson May 08 '14 at 01:43
  • The link is broken, fyi. – intrepid_em Feb 07 '19 at 02:21
5
>> c = cumsum([0.5, 0.1, 0.4]);
>> r = rand(1e5, 1);
>> x = arrayfun(@(x) find(x <= c, 1, 'first'), r);
>> h = hist(x, 1:3)

h =

       49953       10047       40000

x distributed as desired.

Serg
  • 13,470
  • 8
  • 36
  • 47
  • @EitanT, I don't think that `sum()` is faster than `find(..., 'first')`. Also, there is no need in adding zero. Please test it. In general case I would only add: `assert(c(end) == 1);` – Serg Dec 18 '12 at 20:45
  • Now that I think about it, my comment is out of place. My apologies. – Eitan T Dec 18 '12 at 21:10
5

using randsample function from Statistics and Machine Learning Toolbox, you can generate random numbers with specified probability mass function (pmf):

pmf = [0.5, 0.1, 0.4];
population = 1:3;
sample_size = 1;

random_number = randsample(population,sample_size,true,pmf);

I think this is the easiest method.

mamaj
  • 910
  • 8
  • 8
4

A slightly more general solution would be:

r=rand;
prob=[.5,.1,.4];
prob=cumsum(prob);
value=[1,2,3];    %values corresponding to the probabilities
ind=find(r<=prob,1,'first');
x=value(ind)
user1860611
  • 529
  • 1
  • 4
  • 16
0

When the probabilities are nice numbers like this it is possible to do a very simple and performant selection. We repeat population elements such that a uniform selection yields the desired probability distribution. In this case we create a population of 10, with 5 times 1 (0.5 probability of being selected), etc.

p = [1,1,1,1,1,2,3,3,3,3];
x = p(randi(numel(p));

randi takes a second input argument that determines the size of the output (the default is 1), so it’s simple to generate many values from this distribution.

Cris Luengo
  • 55,762
  • 10
  • 62
  • 120
0

A vector solution using rand, cumsum, and min.

r = rand(10,1);
p = [0.5 0.1 0.4];
[~, ind] = min(r >= cumsum(p), [], 2)
  • Randomly sample r from 0..1 using rand. In this case I put my data into a column vector.
  • Put the probabilities for each output index into p.
  • r >= cumsum(p) compares every combination of r and the cumulative probabilities of p. In this case the result is a 2D matrix where each row begins with a series of 1s and ends with a series of 0s. The first 0 indicates the element of p that was randomly selected.
  • min is performed for all rows and returns the column index of the first 0. The third input to min defines the dimension over which to calculate the minimum.

If you want to extend this to n dimensions of r: change the shape of p so that it extends into one more dimension than what r has, and give that dimension as min's third input.

r = rand(3, 5, 7);
p = []; 
p(1,1,1,:) = [0.5 0.1 0.4];
[~, ind] = min(r >= cumsum(p), [], 4)
aimor
  • 1
  • 1