How to choose one element based on its probability?

Question

Let's say I have a vector containing N elements, each being its probability. For example, v = [0.01 0.01 0.09 0.82 0.07]

So I want a function f(v) that returns 4 at 82% of the time, 3 at 9% of the time etc.

The input vector v is always normalized so that sum(v) = 1, this can be a simplification.

How can I implement this probabilistic function in MATLAB? Or maybe there is a built-in function for this?

score 4 · Answer 1 · answered Oct 20 '14 at 16:46

If you have the statistic toolbox, you can use randsample:

f = randsample(1:numel(v), 1, true, v)

What this does is take one random number from the vector 1:numel(v), with probability distribution as given in v. If you want several values, you can change the second parameter to the number of random numbers you want.

score 4 · Accepted Answer · edited May 23 '17 at 12:22

If you do NOT have the statistics toolbox (otherwise see Stewie's answer) derive the empirical cdf and then use inverse sampling:

% Numbers of draws
N = 1e3; 
% Sample a uniform in (0, 1)
x = rand(N,1);

% Empirical cdf
ecdf = cumsum([0, v]);

% Inverse sampling/binning 
[counts, bin] = histc(x,ecdf);

where bin maps directly to v. So, if you have a generic set of values y with probabilitites v, you should take:

out = y(bin);

Note, as the number of draws N increases, we get a better approximation of the probabilities:

counts(end) = []; % Discard last bucket x==1
counts./sum(counts)

The function can be:

function [x, counts] = mysample(prob, val, N)
if nargin < 2 || isempty(val), val = 1:numel(prob); end
if nargin < 3 || isempty(N),   N   = 1;             end 

[counts, bin] = histc(rand(N,1), cumsum([0, prob(:)']));
x = val(bin);

end

How to choose one element based on its probability?

2 Answers2