I have a matrix M with n rows and an n-dimensional column vector P containing an inclusion probability for each row of M. The probabilities may differ from row to row, and they do not add up to one. I would like to efficiently sample the rows of M by including each row M_i in the sample, independently, with probability P_i. I do not need the sampled matrix to have a specific size k; I just need each row to be selected at random according to its inclusion probability.

I have done quite a bit of searching and I am aware of randsample and datasample, but neither of these does quite what I am looking for. Is there a built-in function for this type of sampling? If not, what would be the most efficient way to accomplish this in MATLAB?

Luis Mendo
jorgyz

1 Answer

Compare the output of rand against each probability to generate a logical index that marks which rows of M are picked.

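The basic idea is that rand generates random numbers uniformly distributed on the interval (0,1), so the probability that one such number is less than a given x (with x between 0 and 1) is precisely x. Because each entry returned by rand is drawn independently, each row is included independently with its own probability.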

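M = magic(4); %// example matrix with n = 4 rows (any matrix with n rows works)
probs = [.6; .3; .2; .4]; %// n-by-1 vector of inclusion probabilities (where n is size(M,1))
ind = rand(size(probs))<probs; %// logical index: true where a row is selected
result = M(ind,:) %// keep only the selected rows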
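
As a rough sanity check on the idea above, the following sketch wraps the comparison in a small helper and estimates the inclusion frequency of each row over many repeated draws. The names sampleRows, nTrials, hits and freq, and the use of magic(4) as the matrix, are assumptions made for illustration; they do not come from the question. bsxfun is used so that the comparison also works on releases without implicit expansion.

```matlab
% Illustrative sketch only: M, sampleRows, nTrials, hits and freq are
% hypothetical names chosen for this example.
M = magic(4);                      % example matrix with n = 4 rows
probs = [.6; .3; .2; .4];          % one inclusion probability per row of M

% Helper: keep row i of M with probability p(i), independently of other rows.
sampleRows = @(M, p) M(rand(size(p)) < p, :);

sample = sampleRows(M, probs);     % may contain anywhere from 0 to n rows

% Empirical check: repeat the draw many times and count how often each row
% is included. Each column of hits is one independent draw.
nTrials = 1e5;
hits = bsxfun(@lt, rand(numel(probs), nTrials), probs);
freq = mean(hits, 2);              % observed inclusion frequency per row

disp([probs freq]);                % the two columns should be close
```

The observed frequencies vary from run to run because no random seed is fixed, but with this many trials they should typically agree with probs to about two decimal places. The number of rows in sample is itself random, which matches the requirement that the sample need not have a fixed size k.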
Luis Mendo