
If I have the array:

ar = [1,3,5,3,6,1,4,6,7,6,6,6,6,6]

I could reduce this to the number of occurrences of each value:

counts = {1=>2, 3=>2, 5=>1, 6=>7, 4=>1, 7=>1}

Now I would like to choose a number at random, with the least frequent numbers in ar weighted more heavily.

I understand how I could easily make a weighted random choice that favors the most commonly used number, but not the inverse.

Trip
  • Your array has `14` elements, two of which are `1`. So what should the probability be of picking `1`? How do you want to apply the weighting? – Tom Lord May 09 '19 at 15:21
  • The trivial answer would be to simply invert the weights and use an existing mechanism. There are more-complex (and direct) mechanisms, but they're all likely similar in the long run. – Dave Newton May 09 '19 at 15:21
  • What about reversing the order of the weights (making them negative, dividing 1 by the weights, etc. - any transformation that reverses the order should work) and then solving the task you already understand the solution for? – Konstantin Strukov May 09 '19 at 15:22
  • This all depends what weighting is intended. Should it be **twice as likely** to pick a `5` than a `1`? – Tom Lord May 09 '19 at 15:24
  • Pick any monotonically decreasing function, compute its values at the given frequencies, treat them as relative probabilities, normalize, sample. – Severin Pappadeux May 09 '19 at 15:48
  • Nothing can be said until you clarify the question. It needs to be precise and unambiguous. – Cary Swoveland May 09 '19 at 15:59

2 Answers


It seems like this would work for you:

arr = [1,3,5,3,6,1,4,6,7,6,6,6,6,6]

arr.group_by(&:itself).transform_values{|v| arr.size / v.size}.flat_map do |k,v| 
 [k] * v
end.sample

We group the elements and count them, then build a new Array in which each element is repeated arr.size / count times (integer division), so the less frequent elements occur more often. For example:

arr.group_by(&:itself).transform_values{|v| arr.size / v.size}.flat_map do |k,v| 
 [k] * v
end.group_by(&:itself).transform_values(&:size)
#=> {1=>7, 3=>7, 5=>14, 6=>2, 4=>14, 7=>14}

Since 5 occurred once originally, it now occurs 14 times (same with 4 and 7). So 5, 4, and 7 have equal likelihood of being chosen; each is twice as likely as 1 and 3, which occurred twice, and 7 times as likely as 6.

Something like this might also be more efficient, because it scales by the least common multiple of the counts instead of the array size:

grouping = arr.group_by(&:itself).transform_values(&:size)
# The least common multiple of the distinct counts keeps every weight an integer
scale = grouping.values.uniq.reduce(&:lcm)

grouping.flat_map do |k, v|
  [k] * (scale / v)
end.sample
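
Both snippets above build an expanded array whose length grows with the weights. As a rough sketch of an alternative, the same inverse weighting can be sampled by walking the cumulative weights, which needs no expanded array. The helper name `weighted_pick` and its `rng:` parameter are assumptions made for illustration; they are not part of the answer.

```ruby
# Hypothetical helper: picks a key from `weights` (value => positive weight)
# with probability proportional to its weight, without expanding an array.
def weighted_pick(weights, rng: Random.new)
  total = weights.values.sum
  target = rng.rand * total          # uniform float in [0, total)
  weights.each do |value, weight|
    target -= weight
    return value if target < 0
  end
  weights.keys.last                  # guard against floating-point round-off
end

arr = [1, 3, 5, 3, 6, 1, 4, 6, 7, 6, 6, 6, 6, 6]
counts = arr.group_by(&:itself).transform_values(&:size)
inverse = counts.transform_values { |c| 1.0 / c }  # rarer value => larger weight

weighted_pick(inverse)
```

The weights here are floats, so no common multiple is needed; the relative likelihoods match the ratios produced by the expanded-array version.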
engineersmnky
  • Yah well done. That was an excellent Ruby response. I have never heard of `&:itself` before – Trip May 09 '19 at 16:59
  • @Trip Just a small note: the proposed solution is concise and elegant but if input array contains unique values only it will dramatically increase (square) the array size before trying to sample a random value from it. Try creating an input array like `arr = (0..999).map { rand(100000) }` and then apply the code above to it - you will get an array of approximately 1M records to sample from. For an array with just 50K unique items taking weighted random value using this method requires almost 10 seconds on my laptop... – Konstantin Strukov May 09 '19 at 18:59
  • @KonstantinStrukov I agree we could definitely go with a more performant solution for sure – engineersmnky May 09 '19 at 20:00
  • @KonstantinStrukov We can determine the lcm of the group sizes to make this more efficient. Now your code returns instantly. – engineersmnky May 09 '19 at 22:18
  • You could abstract it further by calling a monotonically decreasing function. You use the inverse function, but actually any such function could do the job. Say, 1/exp(frequency) will make the selection probabilities very sharp, while 1/log(1+frequency) would be a lenient one... Compute this function for all frequencies, treat the values as relative probabilities, normalize, sample. – Severin Pappadeux May 10 '19 at 01:46
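
The suggestion in the last comment might look roughly like the sketch below: apply a decreasing function to each frequency, normalize the results so they sum to 1, then sample. The names `DECAY`, `probabilities`, and `draw` are hypothetical and chosen only for this illustration.

```ruby
arr = [1, 3, 5, 3, 6, 1, 4, 6, 7, 6, 6, 6, 6, 6]
counts = arr.group_by(&:itself).transform_values(&:size)

# Candidate monotonically decreasing functions of the frequency (illustrative names).
DECAY = {
  inverse: ->(f) { 1.0 / f },
  sharp:   ->(f) { Math.exp(-f) },          # 1/exp(frequency)
  lenient: ->(f) { 1.0 / Math.log(1 + f) }  # 1/log(1+frequency)
}

# Convert counts into probabilities that sum to 1 under the chosen function.
def probabilities(counts, fn)
  raw = counts.transform_values { |f| fn.call(f) }
  total = raw.values.sum
  raw.transform_values { |w| w / total }
end

# Draw one value by walking the cumulative probabilities.
def draw(probs, rng: Random.new)
  target = rng.rand                  # uniform float in [0, 1)
  probs.each do |value, prob|
    target -= prob
    return value if target < 0
  end
  probs.keys.last                    # guard against floating-point round-off
end

DECAY.each do |name, fn|
  probs = probabilities(counts, fn)
  puts "#{name}: #{probs.transform_values { |prob| prob.round(3) }}"
end

draw(probabilities(counts, DECAY[:sharp]))
```

Swapping the function changes only how strongly rare values are favored; the normalize-and-sample steps stay the same.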

If you already have an algorithm for making a random weighted choice, one option for swapping the weights is as follows:

grouping = ar.group_by { |n| n }.transform_values(&:size)
#=> {1=>2, 3=>2, 5=>1, 6=>7, 4=>1, 7=>1}
weights = grouping.values.uniq.sort
#=> [1, 2, 7]
reverse_mapping = weights.zip(weights.reverse).to_h
#=> {1=>7, 2=>2, 7=>1}
grouping.transform_values{ |v| reverse_mapping[v] }
#=> {1=>2, 3=>2, 5=>7, 6=>1, 4=>7, 7=>7}

That's the idea.


This can be refactored to be more Rubyish (`then` requires Ruby 2.6 or later):
res = ar.group_by { |n| n }.transform_values(&:size).then do |h|
  rev_map = h.values.uniq.sort.then { |w| w.zip(w.reverse).to_h }
  h.transform_values{ |v| rev_map[v] }
end

#=> {1=>2, 3=>2, 5=>7, 6=>1, 4=>7, 7=>7}
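
For completeness, here is one way the swapped weights could be fed into a weighted random choice. This is only a sketch: it uses the `max_by` with `rand ** (1.0 / weight)` idiom (weighted sampling by random keys), which is a technique brought in for illustration and not something this answer prescribes. The `pick` helper and its `rng:` parameter are hypothetical.

```ruby
# Swapped weights as computed above (value => weight).
weights = { 1 => 2, 3 => 2, 5 => 7, 6 => 1, 4 => 7, 7 => 7 }

# Hypothetical helper: each value gets the key rand ** (1.0 / weight);
# the value with the largest key wins, which selects each value with
# probability proportional to its weight.
def pick(weights, rng: Random.new)
  weights.max_by { |_value, weight| rng.rand**(1.0 / weight) }.first
end

pick(weights)
```

Note that swapping the weights by rank gives different ratios from taking reciprocals: with these weights, 5 is seven times as likely as 6 and three and a half times as likely as 1.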
iGian