
Say I have an array like so:

const alphabet = ['a', 'b', 'c', 'd'];

This represents 4 political candidates in a ranked-choice vote, where candidate a is the first choice, b the second choice, etc.

I want to shuffle this into a bunch of random orders, but in this case I want a to appear first with probability 60%, b second with probability 20%, c third with probability 10%, and all other orderings with the remaining 10%. Is there some lodash or Ramda functionality that can accomplish this?

This is for testing a ranked-choice voting algorithm. Shuffling the array uniformly at random yields candidates that all have pretty much identical vote counts, which doesn't mirror most real elections (although I will test that case too).

I have this pretty horrible routine which will generate one random array:

const getValues = function () {
  const results = [];
  const remaining = new Set(alphabet);
  const probabilities = [0.6, 0.2, 0.1, 0.1];

  for (let i = 0; i < alphabet.length; i++) {
    const r = Math.random();
    const letter = alphabet[i];

    if (r < probabilities[i] && remaining.has(letter)) {
      results.push(letter);
      remaining.delete(letter);
    } else {
      const rand = Math.floor(Math.random() * remaining.size);
      const x = Array.from(remaining)[rand];
      remaining.delete(x);
      results.push(x);
    }
  }

  return results;
};

This "works", but because of conditional probability it doesn't quite order things according to the specified probabilities. Does anyone know of a good way to have the orderings appear with the probabilities I described above?

Here is some sample output that I am looking for:

[ [ 'd', 'b', 'a', 'c' ],
  [ 'a', 'b', 'c', 'd' ],
  [ 'a', 'd', 'b', 'c' ],
  [ 'd', 'b', 'a', 'c' ],
  [ 'b', 'c', 'a', 'd' ],
  [ 'a', 'b', 'c', 'd' ],
  [ 'd', 'b', 'c', 'a' ],
  [ 'c', 'd', 'a', 'b' ],
  [ 'd', 'b', 'a', 'c' ],
  [ 'a', 'b', 'c', 'd' ] ]

If you generated enough data with the routine above, it wouldn't fit the desired order/distribution.
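To compare any generator against the target distribution, I tally how often each candidate lands in each position over many runs. A minimal, self-contained sketch (here with a plain uniform Fisher-Yates shuffle standing in for `getValues` above):

```javascript
// Count how often each candidate lands in each position across many
// shuffles, so the empirical frequencies can be compared with the
// target ones (e.g. a should land in position 0 about 60% of the time).
const alphabet = ['a', 'b', 'c', 'd'];

// Uniform Fisher-Yates shuffle, as a stand-in for any generator.
const shuffleOnce = () => {
  const a = alphabet.slice();
  for (let i = a.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [a[i], a[j]] = [a[j], a[i]];
  }
  return a;
};

// Run `generate` `trials` times; counts[c][pos] is how often
// candidate c ended up at index pos.
const tally = (generate, trials) => {
  const counts = {};
  for (const c of alphabet) counts[c] = [0, 0, 0, 0];
  for (let t = 0; t < trials; t++) {
    generate().forEach((c, pos) => counts[c][pos]++);
  }
  return counts;
};

console.log(tally(shuffleOnce, 10000));
```

With a uniform shuffle, every candidate's counts come out roughly equal across positions; the goal is a generator whose tally matches the weights instead.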

Alexander Mills
    btw, you have `0.5` for the second candidate, where you want `0.2` ... – Nina Scholz Apr 04 '19 at 06:16
  • @NinaScholz yes you're right I corrected that, although I think in actuality, that's the data that needs to be fixed – Alexander Mills Apr 04 '19 at 06:16
    The different probabilities will naturally overlap, at least a bit, unless you specifically exclude that possibility - for example, wanting `a to appear first with probably 60%` will sometimes occur at the same time as `b second with probability 20%` etc. Do you want to make sure that there is *no* overlap (if so, the code will be easy!)? Or, if overlap is permitted, the probabilities should add up to more than 100%. Which do you want (and if the latter, can you clarify the %s you're expecting)? – CertainPerformance Apr 04 '19 at 06:20
    You could simulate voting, I guess. If candidate `a` has 60% chance, then give him 60 "votes" or just 60 entries. Somebody with 15% chance will get 15 entries. Shuffle array -> pick one -> remove the duplicates -> repeat. That's *sort of* like voting. – VLAZ Apr 04 '19 at 06:22
  • @VLAZ good idea, but an algo like that will give me dupes though, try writing it and see what you get – Alexander Mills Apr 04 '19 at 06:24
  • @AlexanderMills there is a "remove duplicates" step. You pick, say `b`, then remove all `b` votes before picking again. – VLAZ Apr 04 '19 at 06:25
  • @CertainPerformance what do you mean by overlap? each candidate can only appear once in the array if that's what you mean – Alexander Mills Apr 04 '19 at 06:42
  • For example, `['a', 'b', 'd, 'c']` fulfills the first two conditions, because `a` occurs first and `b` occurs second. If that sort of thing is permitted, then the %s should add up to more than 100 in total. – CertainPerformance Apr 04 '19 at 06:43
  • @CertainPerformance yes you're exactly right, that's pretty much the problem I have and not sure how to solve. I want some declarative data structure and generic algorithm that can match the results to some desired distribution (histogram). – Alexander Mills Apr 04 '19 at 07:23
    The issue there is not about how to *solve* the problem, but what sort of solution you're *looking for*, which is unclear at the moment. (Code can be written for either possibility, which isn't hard once the sort of output is desired) – CertainPerformance Apr 04 '19 at 07:25

4 Answers


This might help you; the example is adapted for your situation from https://github.com/substack/node-deck

Example

const normalize = function (weights) {
  if (typeof weights !== 'object' || Array.isArray(weights)) {
    throw new TypeError('Not an object');
  }

  const keys = Object.keys(weights);
  if (keys.length === 0) return undefined;

  const total = keys.reduce(function (sum, key) {
    const x = weights[key];
    if (typeof x !== 'number') {
      throw new TypeError('Number expected, got ' + typeof x);
    }
    if (x < 0) {
      throw new Error('Negative weight encountered at key ' + key);
    }
    return sum + x;
  }, 0);

  return total === 1
    ? weights
    : keys.reduce(function (acc, key) {
        acc[key] = weights[key] / total;
        return acc;
      }, {});
};

const pick = function (xs) {
  if (Array.isArray(xs)) {
    return xs[Math.floor(Math.random() * xs.length)];
  }
  if (typeof xs === 'object') {
    // weighted sample: walk the cumulative distribution until the
    // random value falls below the running threshold
    const weights = normalize(xs);
    if (!weights) return undefined;

    const n = Math.random();
    const keys = Object.keys(weights);
    let threshold = 0;

    for (let i = 0; i < keys.length; i++) {
      threshold += weights[keys[i]];
      if (n < threshold) return keys[i];
    }
    throw new Error('Exceeded threshold. Something is very wrong.');
  }
  throw new TypeError('Must be an Array or an object');
};

const shuffle = function (xs) {
  if (Array.isArray(xs)) {
    // Fisher-Yates shuffle; note * (i + 1), not * i, so that every
    // permutation is equally likely
    const res = xs.slice();
    for (let i = res.length - 1; i > 0; i--) {
      const n = Math.floor(Math.random() * (i + 1));
      const t = res[i];
      res[i] = res[n];
      res[n] = t;
    }
    return res;
  }
  if (typeof xs === 'object') {
    // weighted shuffle: repeatedly pick a key by weight, then remove it
    const weights = Object.assign({}, xs);
    const ret = [];

    while (Object.keys(weights).length > 0) {
      const key = pick(weights);
      delete weights[key];
      ret.push(key);
    }

    return ret;
  }
  throw new TypeError('Must be an Array or an object');
};


const results = [];
for (let i = 0; i < 100; i++) {
  const weighted = shuffle({
    a: 60,
    b: 20,
    c: 10,
    d: 10, // or .1, 100, 1000 — the weights are normalized
  });
  results.push(weighted);
}
console.log(results);
ABC

You could take a random item of the array according to the given probabilities, normalize the probabilities of the remaining items, and take another one until all items are taken.

As a result you get close to the wanted distribution, as you can see in the counts of items and their final indices.

const
    getIndex = (prob) => prob.findIndex((r => p => r < p || (r -= p, false))(Math.random())),
    normalized = array => {
        var sum = array.reduce((a, b) => a + b, 0);
        return array.map(v => v / sum);
    };

var items = ['a', 'b', 'c', 'd'],
    probabilities = [0.6, 0.2, 0.1, 0.1],
    counts = { a: { 0: 0, 1: 0, 2: 0, 3: 0 }, b: { 0: 0, 1: 0, 2: 0, 3: 0 }, c: { 0: 0, 1: 0, 2: 0, 3: 0 }, d: { 0: 0, 1: 0, 2: 0, 3: 0 } },
    l = 100,
    index,
    result = [], 
    subP,
    subI,
    temp;

while (l--) {
    temp = [];
    subP = probabilities.slice();
    subI = items.slice();
    while (subP.length) {
        index = getIndex(normalized(subP));
        temp.push(subI[index]);
        subI.splice(index, 1);
        subP.splice(index, 1);
    }
    result.push(temp);
}

console.log(result.map(a => a.join()));

result.forEach(a => a.forEach((v, i) => counts[v][i]++));

console.log(counts);
Nina Scholz

You could sort them using a shuffle function like this:

const candidates = [
  { name: "a", weight: 6 },
  { name: "b", weight: 2 },
  { name: "c", weight: 1 },
  { name: "d", weight: 1 }
];

const randomShuffleFn = () => Math.random() - .5;

const shuffleFn = (candidateA, candidateB) =>
  Math.random() * (candidateB.weight + candidateA.weight) - candidateA.weight;

console.log([...candidates].sort(randomShuffleFn).sort(shuffleFn));

OK, it's not exactly the same, but I think that by tweaking the weights you can get the required distribution (as it is, a wins first place more than 60% of the time).
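One way to check that claim is to repeat the comparator-based shuffle many times and count how often each candidate lands in first place (a self-contained sketch; exact frequencies depend on the engine's sort implementation):

```javascript
// Estimate first-place frequencies under the weight-biased comparator
// by repeating the double sort many times and tallying index 0.
const candidates = [
  { name: "a", weight: 6 },
  { name: "b", weight: 2 },
  { name: "c", weight: 1 },
  { name: "d", weight: 1 }
];

const randomShuffleFn = () => Math.random() - .5;
const shuffleFn = (candidateA, candidateB) =>
  Math.random() * (candidateB.weight + candidateA.weight) - candidateA.weight;

const trials = 10000;
const firstPlace = { a: 0, b: 0, c: 0, d: 0 };
for (let i = 0; i < trials; i++) {
  const order = [...candidates].sort(randomShuffleFn).sort(shuffleFn);
  firstPlace[order[0].name]++;
}
console.log(firstPlace); // rough first-place counts out of 10000 trials
```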

mbojko

I think the problem is poorly stated.

As written, a shall be in place 1 with 60% probability, b in place 2 with 20%, and c and d in places 3 or 4 with 10% each. No distribution fulfills all of these probability criteria at once, so no algorithm can produce one: if a is in place 1 in 60% of the cases, then in those same cases either c or d must be in place 3 or 4, which is well above the required 10% probability.

So, the first task here is to make sense out of what is written in the question (because of course it can make sense, after interpretation).

I guess the 60% for a and 20% for b should not be read as probabilities but as a kind of popularity. But it cannot simply be each candidate's vote share, because then a would finish in place 1 in 100% of the cases.

So, let's assume a voting process with some randomness involved which lets A finish on place 1 with 60% probability, B on place 1 (!) with 20% probability, etc. Then we can implement this using a weighted random choice for place 1.

How to continue with places 2..n? We keep the weights intact and remove the candidate that has already been chosen. If one of the other candidates made it to place 1, a will then end up in place 2 with high probability, which I think makes sense.
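The process above can be sketched as weighted sampling without replacement: pick each place by weight, drop the winner from the pool, and repeat (a minimal sketch; the weights and helper names are illustrative):

```javascript
// Weighted sampling without replacement: choose each place by weight,
// then remove the winner so the same weights steer the remaining places.
const weights = { a: 60, b: 20, c: 10, d: 10 };

// Pick one key from `pool`, with probability proportional to its weight.
const pickWeighted = (pool) => {
  const entries = Object.entries(pool);
  const total = entries.reduce((s, [, w]) => s + w, 0);
  let r = Math.random() * total;
  for (const [name, w] of entries) {
    if ((r -= w) < 0) return name;
  }
  return entries[entries.length - 1][0]; // floating-point edge case
};

// Build a full ordering by drawing without replacement.
const weightedOrder = (ws) => {
  const pool = { ...ws };
  const order = [];
  while (Object.keys(pool).length > 0) {
    const winner = pickWeighted(pool);
    order.push(winner);
    delete pool[winner];
  }
  return order;
};

console.log(weightedOrder(weights)); // one random ordering, a first most often
```

This gives a exactly a 60% chance of place 1; the probabilities of the later places then follow from the renormalized remaining weights rather than being dictated independently.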

Alfe
  • sure, the idea is to be able to dictate the probability somehow, that's the challenge – Alexander Mills Apr 05 '19 at 04:01
  • I was trying to convey that you can dictate the probability of the first place, but not at the same time the probability of all other places. Instead, you can repeat the process after the first place was found. – Alfe Apr 05 '19 at 08:40
  • Yeah with that thought, you could generate all the first elements, and then given all those you could calculate all the second elements, after that, all the 3rd elements, etc. There's some conditional probability going on. I am just looking for the right algorithm. – Alexander Mills Apr 07 '19 at 00:33