Select randomly from array with given probabilities without replacement

Question

Assume I have an array of items [1, 2, ..., n] and an array of probabilities [p1, p2, ..., pn], where n is large and may reach the thousands. The probabilities sum to 1.

I need to select 3 unique items at random each time, where an item with a higher probability has a higher chance of being selected.
I need to repeat the selection more than 20k times.

I've implemented a working method by creating a new array that contains each item repeated in proportion to its probability. For example, if the probabilities for item1, item2, and item3 are [2/7, 4/7, 1/7] respectively, then the new array will contain [1,1,2,2,2,2,3].

It works fine but it's not efficient. Also, with this method the same item can be selected more than once, in which case I have to reselect another item, which takes time.

Are there any efficient methods or built-in functions in MATLAB for this purpose?

I think this answers your question: https://stackoverflow.com/a/13914141/1011724 — Dan, Feb 26 '18 at 11:54
Also, for the *pick without replacement* issue, if `n` is large and you're only picking 3 values, you probably just want to check for repetitions and repick if found. — Steve, Feb 26 '18 at 11:58
Possible duplicate of [Generate random number with given probability matlab](https://stackoverflow.com/questions/13914066/generate-random-number-with-given-probability-matlab) — Steve, Feb 26 '18 at 12:03
You _specify probabilities_ for the items, but you also want to sample _without replacement_. With those two requirements, it seems difficult to avoid re-picking. For example, [`randsample`](https://es.mathworks.com/help/stats/randsample.html) can handle either requirement, but not both at the same time. Note also that in your current method, if sampled values are not unique you should start over and repick _all_ items, otherwise probabilities are not guaranteed — Luis Mendo, Feb 26 '18 at 12:11
This is not a dupe of the linked Q&A, because of the no-replacement requirement here — Luis Mendo, Feb 26 '18 at 12:17

Wolfie · Accepted Answer · 2018-02-26T12:58:30.827

Your initial arrays:

 x = [1, 2, 3];   % 1:n, where n = 3
 p = [2, 4, 1]/7; % probabilities of choosing each element

You can choose an element with given probability using this:

 r = rand; % get random number in range (0,1)
 xi = x(find(cumsum(p) >= r, 1)); % Get x where cumulative probability >= random number
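To see how the lookup works, here is a worked case with a fixed value standing in for rand (the value 0.5 is an arbitrary choice for illustration):

```matlab
x = [1, 2, 3];
p = [2, 4, 1]/7;         % probabilities from above
c = cumsum(p);           % cumulative probabilities, roughly [0.2857, 0.8571, 1]
r = 0.5;                 % fixed stand-in for rand (illustrative)
idx = find(c >= r, 1);   % first entry of c that reaches r, here idx = 2
xi = x(idx);             % so the chosen item is 2
```

Any r between 2/7 and 6/7 lands on item 2, an interval of width 4/7, which is exactly that item's probability.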

You want to choose without replacement, so remove the chosen element on each iteration of the loop:

k = 2;           % number of choices
r = rand(k,1);   % random numbers
xi = zeros(k,1); % output choices
for ii = 1:k     % choices loop
    % Choose x with probability of each element contained in p
    idx = find(cumsum(p) >= r(ii), 1);
    xi(ii) = x(idx);
    % Remove item from lists
    x(idx) = []; p(idx) = [];
    % Rescale probabilities
    p = p/sum(p);
end

With this method, duplicate values in x are treated as separate items, each with its own specified p value.

Note: if you want to do this selection N times, work on temporary copies of x and p, so that the elements removed in one selection are still available for the next. Better still, vectorise the selection loop by making x and p into N×n matrices and removing an element from each row at each step.
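One way the vectorised idea might look is sketched below. It is a variation rather than a literal translation of the loop above: instead of deleting entries, it sets the weight of each chosen item to zero, and instead of renormalising it scales the random number by the remaining row total. The values of x, p, k, and N are illustrative assumptions, and k must not exceed the number of items with nonzero probability.

```matlab
x = 1:5;                          % items (illustrative values)
p = [0.1, 0.2, 0.3, 0.25, 0.15];  % probabilities (illustrative values)
k = 3;                            % unique items per selection
N = 20000;                        % number of selections

P = repmat(p, N, 1);              % N-by-n working copy; p itself is untouched
picks = zeros(N, k);              % one row of k unique items per selection
rows = (1:N)';
for ii = 1:k
    C = cumsum(P, 2);                     % row-wise cumulative weights
    r = rand(N, 1) .* C(:, end);          % scaling r stands in for renormalising P
    idx = sum(bsxfun(@lt, C, r), 2) + 1;  % first column where C >= r, per row
    picks(:, ii) = reshape(x(idx), [], 1);
    P(sub2ind(size(P), rows, idx)) = 0;   % zero weight, so it cannot be drawn again
end
```

Each row of picks then holds one selection of k distinct items. bsxfun is used so the comparison also runs on releases without implicit expansion.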

score -1 · Answer 2 · answered Feb 26 '18 at 20:10

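The function datasample (Statistics and Machine Learning Toolbox) does what you ask for if you pass your probability array through the optional 'Weights' argument. Since datasample samples with replacement by default, also set 'Replace' to false to get unique items.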

Note: datasample uses the old function histc to generate the sample, whereas the newer histcounts is recommended. This only matters if you need better efficiency. Quote from the documentation:

histc is not recommended. Use HISTCOUNTS instead.
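A minimal sketch of such a call, assuming the Statistics and Machine Learning Toolbox is installed. The 'Replace', false pair is what makes the items unique, since datasample samples with replacement by default. The values of x, p, and k are illustrative.

```matlab
% Requires the Statistics and Machine Learning Toolbox
x = 1:5;                          % items (illustrative values)
p = [0.1, 0.2, 0.3, 0.25, 0.15];  % weights (illustrative values)
k = 3;                            % number of unique items to draw

% 'Replace', false requests unique items; 'Weights' biases each draw
xi = datasample(x, k, 'Replace', false, 'Weights', p);
```

For many repeated selections, this call would sit inside a loop, so it is worth timing against a hand-written approach before committing to it.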
