
Consider two 1-dim arrays: one with the items to select from, and one containing the probability of drawing each corresponding item.

items = ["a", 2, 5, "h", "hello", 3]
weights = [0.1, 0.1, 0.2, 0.2, 0.1, 0.3]

In Julia, how can one randomly select an element of items, using weights as the probability of drawing each item?

Question asked by Remi.b, edited by Prix
  • @Prix Thanks for your update. Isn't it important to indicate the language of interest in the title for such a question? Maybe in parentheses at the end of the question? – Remi.b Dec 19 '14 at 05:02
  • Ok, thanks. Indeed, it would be great to be able to order the tags. – Remi.b Dec 19 '14 at 05:09
  • Hope you like it this way, I see no reason not to have it there either so I guess it comes down to personal preferences ;) – Prix Dec 19 '14 at 05:15

2 Answers


Use the StatsBase.jl package:

import Pkg; Pkg.add("StatsBase")  # only needed once, to install the package
using StatsBase
items = ["a", 2, 5, "h", "hello", 3]
weights = [0.1, 0.1, 0.2, 0.2, 0.1, 0.3]
sample(items, Weights(weights))

To draw several samples at once:

# With replacement
my_samps = sample(items, Weights(weights), 10)
# Without replacement
my_samps = sample(items, Weights(weights), 2, replace=false)
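With replace=false each item can appear at most once in the result. If you want similar behaviour without StatsBase, one known technique is the Efraimidis–Spirakis method: give item i the random key u^(1/w[i]), with u uniform on [0, 1), and keep the k items with the largest keys, which makes each successive pick proportional to the weights of the items not yet chosen. The sketch below assumes base Julia only; `sample_norep` is a hypothetical name, not a StatsBase function, and this is not necessarily the algorithm `sample` uses internally.

```julia
using Random

# Hypothetical helper (not a StatsBase function): weighted sampling without
# replacement using Efraimidis–Spirakis keys. Item i gets the key u^(1/w[i])
# with u uniform on [0, 1); the k items with the largest keys are returned.
function sample_norep(rng::AbstractRNG, items::AbstractVector,
                      w::AbstractVector{<:Real}, k::Integer)
    length(items) == length(w) ||
        throw(ArgumentError("items and weights must have the same length"))
    all(>=(0), w) || throw(ArgumentError("weights must be non-negative"))
    0 <= k <= count(>(0), w) ||
        throw(ArgumentError("k must be between 0 and the number of positive weights"))
    # Zero-weight items get key -Inf so they are never among the top k.
    ks = [wi > 0 ? rand(rng)^(1 / wi) : -Inf for wi in w]
    idx = sortperm(ks; rev = true)[1:k]   # indices of the k largest keys
    return items[idx]
end

# Convenience method using Julia's default random number generator.
sample_norep(items, w, k) = sample_norep(Random.default_rng(), items, w, k)
```

For example, sample_norep(items, weights, 2) returns two distinct items. Sorting all keys costs O(n log n); partialsortperm could select only the top k if that matters.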

(In older versions of StatsBase.jl, from the Julia 0.x era, Weights was called WeightVec.)

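You can learn more about Weights, and why it exists, in the StatsBase documentation; in short, it is a weight-vector type that also stores the sum of its weights. The sampling algorithms in StatsBase are efficient and choose among different approaches depending on the size of the input and the number of draws requested.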
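To give a flavour of what a size-dependent strategy can look like: one technique that pays off when many draws come from the same weights is the alias method, which spends O(n) time building a table once and then produces each draw in O(1). Below is a sketch of Vose's variant in base Julia; `AliasTable` and `draw` are hypothetical names for illustration, not StatsBase's API or its actual implementation.

```julia
using Random

# Hypothetical sketch of Vose's alias method (not StatsBase's implementation).
struct AliasTable
    prob::Vector{Float64}  # chance of keeping column i when it is picked
    alias::Vector{Int}     # index returned instead when column i is not kept
end

function AliasTable(w::AbstractVector{<:Real})
    n = length(w)
    n > 0 || throw(ArgumentError("weights must be non-empty"))
    all(>=(0), w) || throw(ArgumentError("weights must be non-negative"))
    total = sum(w)
    total > 0 || throw(ArgumentError("weights must have a positive sum"))
    scaled = [n * wi / total for wi in w]   # average value is 1
    prob = zeros(n)
    alias = collect(1:n)
    small, large = Int[], Int[]
    for i in 1:n
        push!(scaled[i] < 1 ? small : large, i)
    end
    # Pair an under-full column with an over-full one until one list runs out.
    while !isempty(small) && !isempty(large)
        s, l = pop!(small), pop!(large)
        prob[s] = scaled[s]
        alias[s] = l
        scaled[l] = (scaled[l] + scaled[s]) - 1
        push!(scaled[l] < 1 ? small : large, l)
    end
    # Whatever remains is full up to rounding error.
    for i in Iterators.flatten((small, large))
        prob[i] = 1.0
    end
    return AliasTable(prob, alias)
end

# One draw: pick a column uniformly, then keep it or take its alias.
function draw(rng::AbstractRNG, t::AliasTable)
    i = rand(rng, 1:length(t.prob))
    return rand(rng) < t.prob[i] ? i : t.alias[i]
end
```

draw returns an index, so items[draw(rng, t)] gives an item; build the table once with t = AliasTable(weights) and reuse it for every draw.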

Answer by IainDunning, edited by N. Virgo

Here's a much simpler approach which only uses Julia's base library:

sample(items, weights) = items[findfirst(cumsum(weights) .> rand())]

Example:

julia> sample(["a", 2, 5, "h", "hello", 3], [0.1, 0.1, 0.2, 0.2, 0.1, 0.3])
"h"

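Why does the one-liner work? It is inverse-transform sampling: cumsum(weights) splits [0, 1) into consecutive intervals whose widths are the weights, and findfirst(... .> u) returns the index of the interval containing u. Fixing u instead of calling rand() makes the steps visible; the value 0.45 is an arbitrary choice for illustration.

```julia
items   = ["a", 2, 5, "h", "hello", 3]
weights = [0.1, 0.1, 0.2, 0.2, 0.1, 0.3]

c = cumsum(weights)    # running totals, approximately [0.1, 0.2, 0.4, 0.6, 0.7, 1.0]
u = 0.45               # a fixed stand-in for rand(), so the result is reproducible
i = findfirst(c .> u)  # first running total that exceeds u
items[i]               # c[3] ≈ 0.4 <= 0.45 < c[4] ≈ 0.6, so i == 4 and this is "h"
```

Because each interval's width equals its weight and rand() is uniform on [0, 1), item i is returned with probability weights[i].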
This is less efficient than StatsBase.jl, because every call recomputes the cumulative sum (O(n) work and a new array per draw), but for small vectors it's fine.

Also, if weights is not normalized (i.e. does not sum to 1), scale the random number by the total:

sample(items, weights) = items[findfirst(cumsum(weights) .> rand() * sum(weights))]
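Both one-liners rebuild the cumulative sum on every call, so k draws cost O(nk). When you need many draws from the same weights, one option (a sketch, not a library API) is to compute the cumulative sum once and binary-search it for each draw, for O(n + k log n) in total. `sample_many` is a hypothetical helper; scaling by last(c) rather than sum(weights) also keeps the random threshold below the final cumulative sum, since sum and cumsum may round differently in the last bits.

```julia
using Random

# Hypothetical helper: k draws with replacement. The cumulative sums are built
# once (O(n)), then each draw is a binary search (O(log n)).
function sample_many(rng::AbstractRNG, items::AbstractVector,
                     w::AbstractVector{<:Real}, k::Integer)
    length(items) == length(w) ||
        throw(ArgumentError("items and weights must have the same length"))
    !isempty(w) || throw(ArgumentError("weights must be non-empty"))
    all(>=(0), w) || throw(ArgumentError("weights must be non-negative"))
    c = cumsum(w)
    total = last(c)   # no need for the weights to sum to 1
    total > 0 || throw(ArgumentError("weights must have a positive sum"))
    # searchsortedlast(c, r) + 1 is the first index i with c[i] > r,
    # the same index findfirst(c .> r) returns, found by binary search.
    return [items[searchsortedlast(c, rand(rng) * total) + 1] for _ in 1:k]
end
```

Like the one-liners, this never returns an item whose weight is zero, because the strict comparison c[i] > r skips intervals of zero width.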
Answer by Miles