If I have an array:

a = [1,2,3]

How do I randomly select subsets of the array, such that the elements of each subset are unique? That is, for a, the possible subsets would be:

[]
[1]
[2]
[3]
[1,2]
[1,3]
[2,3]
[1,2,3]

I can't generate all of the possible subsets, as the real size of a is very big, so there are many, many subsets. At the moment, I am using a 'random walk' idea: for each element of a, I 'flip a coin' and include it if the coin comes up heads. But I am not sure if this actually samples the space uniformly. It feels like it biases towards the middle, but this might just be my mind doing pattern-matching, as there will be more middle-sized possibilities.

Am I using the right approach, or how should I be randomly sampling?

(I am aware that this is more of a language agnostic and 'mathsy' question, but I felt it wasn't really Mathoverflow material - I just need a practical answer.)

Phrogz
meagerf

5 Answers

Just go ahead with your original "coin flipping" idea. It uniformly samples the space of possibilities.

It feels to you like it's biased towards the "middle", but that's because the number of possibilities is largest in the "middle". Think about it: there is only 1 possibility with no elements, and only 1 with all elements. There are N possibilities with 1 element, and N possibilities with (N-1) elements. As the number of elements chosen gets closer to (N/2), the number of possibilities grows very quickly.
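To make the counting concrete, here is a small sketch (the helper name `subset_counts_by_size` is made up for illustration) that enumerates every possible sequence of coin flips for a small array and tallies the sizes of the resulting subsets. Every flip sequence is equally likely, so every subset is too; there are simply more subsets of middling size.

```ruby
# Count how many subsets of an n-element array have each size 0..n.
# Hypothetical helper, only practical for small n (it loops 2**n times).
def subset_counts_by_size(n)
  counts = Array.new(n + 1, 0)
  # Each integer in 0...2**n encodes one sequence of n coin flips:
  # bit i set means "heads for element i".
  (0...2**n).each do |flips|
    size = (0...n).count { |i| flips[i] == 1 }
    counts[size] += 1
  end
  counts
end

p subset_counts_by_size(3)   # => [1, 3, 3, 1]
p subset_counts_by_size(10)  # => [1, 10, 45, 120, 210, 252, 210, 120, 45, 10, 1]
```

With 10 elements, 252 of the 1024 equally likely subsets have exactly five elements, while only one is empty, which is why the output looks "biased" towards the middle.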

Alex D

You could generate random numbers, convert them to binary and choose the elements from your original array where the bits were 1. Here is an implementation of this as a monkey-patch for the Array class:

class Array
  def random_subset(n=1)
    raise ArgumentError, "negative argument" if n < 0
    (1..n).map do
      r = rand(2**self.size)
      self.select.with_index { |el, i| r[i] == 1 }
    end
  end
end

Usage:

a = (1..10).to_a
a.random_subset(3)
#=> e.g. [[3, 6, 9], [4, 5, 7, 8, 10], [1, 2, 3, 4, 6, 9]]

Generally this doesn't perform too badly: it's O(n*m), where n is the number of subsets you want and m is the length of the array.
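If you would rather not reopen `Array`, the same bitmask idea can be written as a plain method. This is only a sketch: the name `random_subsets` and the `rng:` keyword are my own additions, not part of the answer above. Passing in a seeded `Random` makes the output reproducible, which is handy for tests.

```ruby
# Hypothetical standalone variant of the bitmask approach (no monkey-patching).
# rng: is an assumed extra parameter so that callers can pass a seeded Random.
def random_subsets(array, count = 1, rng: Random.new)
  raise ArgumentError, "negative argument" if count < 0
  Array.new(count) do
    mask = rng.rand(2**array.size)  # one integer in 0...2**size = one subset
    array.select.with_index { |_, i| mask[i] == 1 }
  end
end

a = (1..10).to_a
first  = random_subsets(a, 3, rng: Random.new(42))
second = random_subsets(a, 3, rng: Random.new(42))
p first == second  # => true (same seed, same subsets)
```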

Michael Kohl

a.select { |element| rand(2) == 0 }

For each element, a coin is flipped. If it comes up heads (rand(2) == 0), the element is selected.

Martin Velez
  • `sample(rand * a.size)` produces subsets between 0 and a.size - 1 in length. If you want to exclude the empty set and include the full set, use `sample(rand(a.size) + 1)`. – Wayne Conrad Jan 19 '12 at 19:29
  • I used `rand(a.size + 1)` and it seems to produce both the empty subset `[]` and the subset `a` itself. So it can produce all possible subsets of `a`. – Martin Velez Jan 19 '12 at 19:58
  • Note that `Array#sample` is available in Ruby 1.9+ – zetetic Jan 19 '12 at 21:32
  • This is NOT a uniform distribution on the power set of `a`, since it FIRST picks a length and THEN a sample of that length. In the power set of `a` there are clearly many more sets of length `2` than of length `a.length`! – Alberto Santini Jan 20 '12 at 08:04
  • @AlbertoSantini I agree with you. I changed my answer. – Martin Velez Jan 21 '12 at 06:19
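The point made in these comments can be checked with exact arithmetic. The sketch below assumes the earlier version of the answer was something like `a.sample(rand(a.size + 1))` (it is not shown here, so that is a guess based on the comments); the helper names are made up for illustration. It compares the probability of getting one particular subset under "pick a length first" against independent coin flips.

```ruby
# Number of k-element subsets of an n-element array, C(n, k).
def binomial(n, k)
  (1..k).reduce(1) { |acc, i| acc * (n - k + i) / i }
end

# Exact probability of one particular k-element subset when the length is
# first drawn uniformly from 0..n (the assumed earlier approach).
def length_first_probability(n, k)
  Rational(1, (n + 1) * binomial(n, k))
end

# Exact probability of the same subset under independent fair coin flips.
def coin_flip_probability(n)
  Rational(1, 2**n)
end

n = 3
(0..n).each do |k|
  puts "size #{k}: length-first #{length_first_probability(n, k)}, " \
       "coin flips #{coin_flip_probability(n)}"
end
```

For `a = [1,2,3]`, picking the length first returns `[]` with probability 1/4 and any given one-element subset with probability 1/12, whereas coin flipping gives every subset probability 1/8.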

I think the coin flipping is fine.

ar = ('a'..'j').to_a
p ar.select{ rand(2) == 0 }

An array with 10 elements has 2**10 possible subsets (including [ ] and the full array), which is nothing more than 10 independent choices of 1 or 0. It does output more arrays of four, five and six elements, because there are a lot more of those in the power set.

steenslag

A way to select a random element from the power set is the following:

my_array = ('a'..'z').to_a
power_set_size = 2 ** my_array.length
random_subset = rand(power_set_size)
subset = []
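# Zero-pad to my_array.length bits so every element gets its own bit
bits = random_subset.to_s(2).rjust(my_array.length, "0")
bits.chars.each_with_index do |bit, corresponding_element|
  subset << my_array[corresponding_element] if bit == "1"
end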

This uses string functions rather than working with real "bits" and bitwise operations, just for my convenience. You can turn it into a faster (I guess) algorithm by using real bits.

What it does is encode each element of the power set of the array as an integer between 0 and 2 ** array.length - 1, and then pick one of those integers at random (uniformly at random, indeed). Then it decodes the integer back into a particular subset of the array using a bitmask (1 = the element is in the subset, 0 = it is not).

In this way you have a uniform distribution over the power set of your array.
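As a quick sanity check of the encoding, the sketch below (with a made-up helper `subset_for`, using `Integer#[]` to read individual bits instead of strings) decodes every integer for a three-element array and confirms that each one maps to a different subset. Since the mapping is one-to-one, choosing the integer uniformly chooses the subset uniformly.

```ruby
# Decode an integer mask into a subset: bit i set means array[i] is included.
# Hypothetical helper for illustration.
def subset_for(array, mask)
  array.each_index.select { |i| mask[i] == 1 }.map { |i| array[i] }
end

a = [1, 2, 3]
all = (0...2**a.size).map { |mask| subset_for(a, mask) }
p all
# => [[], [1], [2], [1, 2], [3], [1, 3], [2, 3], [1, 2, 3]]
p all.uniq.size == 2**a.size  # => true
```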

Alberto Santini
  • I just noticed Michael Kohl posted a similar solution, which is probably better. It uses real bit operations and also gives you the chance to request more than one subset. – Alberto Santini Jan 20 '12 at 12:42