
I need to sample numbers from a distribution, but the probabilities of all numbers must be something I can control.

For example, say I have 5 nodes (a, b, c, d, e) on a graph, and each node has an "attachment probability" that determines how likely a new node added to the graph will "stick" to it.

For example, the attachment probabilities of my 5 nodes might be:

{
    a: 0.1
    b: 0.1
    c: 0.2
    d: 0.1
    e: 0.5
}

When I add a new node, it should attach to node "e" most of the time (since it has the highest probability) but of course this should not be all the time, since these are probabilities.

I could manually create an array of, say, 1000 letters whose counts follow the above probabilities: 100 a's, 100 b's, 200 c's, 100 d's and 500 e's. Drawing uniformly at random from this array would then be the same as drawing from a distribution with those probabilities.

Is there any other (less manual) way of doing this in JavaScript? Does the Math or random API have a way to specify the probabilities that underlie the sampling?

Cybernetic

3 Answers


My solution

const STEP = 1
const CONF = {
  a: 1,
  b: 1,
  c: 2,
  d: 1,
  e: 5,
}

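// Draws one key at random, weighted by the integer counts in CONF.
// Note: the lookup table is rebuilt on every call.
function getDistribution() {
  const distributionMap = {}
  let start = 0
  for (let key in CONF) {
    // give each key CONF[key] / STEP consecutive slots
    for (let i = 0; i < CONF[key]; i += STEP) {
      distributionMap[start++] = key
    }
  }
  // pick one slot uniformly at random
  return distributionMap[Math.floor(Math.random() * start)]
}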

// Test
const testDistribution = {}
for (let i = 0; i < 1000; i++) {
  const key = getDistribution()
  testDistribution[key] = testDistribution[key] ? testDistribution[key] + 1 : 1
}

console.log(testDistribution)

// {a: 96, c: 183, e: 511, d: 110, b: 100}
// {e: 511, c: 194, a: 107, d: 90, b: 98}
// {e: 500, a: 106, c: 210, d: 90, b: 94}
Frank He

The standard approach to sampling with replacement from a set of weights is to build a cumulative sum of the weights, pick a random value below the total, and return the first choice whose cumulative weight exceeds that value.

For example:

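const weighted_choice = function(table) {
  const choices = [], cumweights = [];
  let sum = 0;
  for (const k in table) {
    choices.push(k);
    // work with the cumulative sum of weights
    cumweights.push(sum += table[k]);
  }
  return function() {
    // val is uniform in [0, sum)
    const val = Math.random() * sum;
    // a binary search would be better for "large" tables
    for (let i = 0; i < cumweights.length; i++) {
      // strict comparison so zero-weight entries are never picked
      if (val < cumweights[i]) {
        return choices[i];
      }
    }
    // guard against floating-point rounding pushing val up to sum
    return choices[choices.length - 1];
  };
};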

I'm returning a closure so that the cumulative sums don't need to be recalculated on every call. Compared to Frank's code, this doesn't assume you're passing integer counts, so the weights can efficiently span a much larger range.

You could test the above function like this:
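
The comment in the code above notes that a binary search would suit large tables. A minimal sketch of that variant might look like the following. The name `weightedChoiceBinary` and the optional `rng` parameter are assumptions added here for illustration (the latter only to make the function easy to test); they are not part of the original answer.

```javascript
// Hypothetical variant of weighted_choice that uses a binary search.
// `rng` is an assumed, optional source of uniform numbers in [0, 1).
function weightedChoiceBinary(table, rng = Math.random) {
  const choices = Object.keys(table);
  const cumweights = [];
  let sum = 0;
  for (const k of choices) {
    sum += table[k];
    cumweights.push(sum);
  }
  return function () {
    const val = rng() * sum;
    // Find the first index whose cumulative weight is strictly greater
    // than val; if none is, the search settles on the last index.
    let lo = 0;
    let hi = cumweights.length - 1;
    while (lo < hi) {
      const mid = (lo + hi) >>> 1;
      if (val < cumweights[mid]) {
        hi = mid;
      } else {
        lo = mid + 1;
      }
    }
    return choices[lo];
  };
}
```

Each draw then costs O(log n) comparisons instead of O(n), which should only matter once the table has many entries.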

const gen = weighted_choice({
  a: 0.1,
  b: 0.1,
  c: 0.2,
  d: 0.1,
  e: 0.5,
});

const counts = {};
for (let i = 0; i < 10000; i++) {
  const val = gen();
  counts[val] = (counts[val] || 0) + 1;
}

console.log(counts);

which prints out something like:

{ a: 1014, b: 952, c: 1971, d: 990, e: 5073 }
Sam Mason

I think my original idea makes the most sense, although the answers provided are likely doing the same thing.

First, I create a function that makes "bins" based off the passed-in probability weights:

function create_bins_from_probability_weights(options) {
    const res = {};
    Object.keys(options.table_of_probs).forEach(function(key) {
        const prob = options.table_of_probs[key];
        // round so each bin holds a whole number of members
        const bin_size = Math.round(prob * options.population_size);
        res[key] = bin_size;
    });
    return res;
}

I then create a function for making a representative population, whose members reflect the values in the above bins:

function create_population_from_bins(options) {
    const res = [];
    Object.keys(options.bins).forEach(function(key) {
        for(var i = 0; i < options.bins[key]; i++) {
            res.push(key);
        }
    })
    return (res)
}

Finally, I create a function for taking a random sample from the above representative population:

function random_sample_from_array(options) {
    const res = options.array[Math.floor(Math.random() * options.array.length)];
    return (res)
}

Altogether, we can use these functions as follows:

Using a table of probabilities:

const table = {
    a : 0.1,
    b : 0.1,
    c : 0.2,
    d : 0.1,
    e : 0.5
}

...create bins:

var bins = create_bins_from_probability_weights({
    table_of_probs: table,
    population_size: 1000
})

...create a representative population from the bins:

const array_of_values = create_population_from_bins({
    bins : bins
})

...take a single sample from the above population:

var final = random_sample_from_array({
    array : array_of_values
})

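The final variable holds a single random sample. It is not guaranteed to be the most probable key; rather, each key is drawn with probability proportional to its weight, since the sample comes from a population whose members reflect the probabilities used to create the distribution.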
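
As a quick sanity check of the idea that the population mirrors the probabilities, one could count how often each key appears in the population itself. The sketch below is self-contained; `buildPopulation` and `tally` are hypothetical helpers that condense the binning and population steps, and the use of `Math.round` is an assumption to keep bin sizes whole.

```javascript
// Hypothetical helper: combines the "bins" and "population" steps.
function buildPopulation(tableOfProbs, populationSize) {
  const population = [];
  for (const key of Object.keys(tableOfProbs)) {
    // Assumption: round so each bin holds a whole number of members.
    const binSize = Math.round(tableOfProbs[key] * populationSize);
    for (let i = 0; i < binSize; i++) {
      population.push(key);
    }
  }
  return population;
}

// Hypothetical helper: counts occurrences of each item in an array.
function tally(items) {
  const counts = {};
  for (const item of items) {
    counts[item] = (counts[item] || 0) + 1;
  }
  return counts;
}

const population = buildPopulation({ a: 0.1, b: 0.1, c: 0.2, d: 0.1, e: 0.5 }, 1000);
const populationCounts = tally(population);
console.log(populationCounts);
// { a: 100, b: 100, c: 200, d: 100, e: 500 }
```

Note that when a probability times the population size is not a whole number, rounding means the bins only approximate the requested weights.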

Cybernetic