I'm trying to get a fairly even distribution of one set of items into another and am looking for an algorithm that can help.

For example, Group A has 42 items and Group B has 16 items. I want to mix both groups together so that B is fairly evenly distributed within A. So, the merged group looks something like: {AA B AAA B AA B AA B AAA.....} It would be easy, of course, if the size of A were a multiple of the size of B, but that is rarely the case for my needs.

gpraceman
  • When you say "set", are you actually referring to a "list"? Because sets are unordered. Please clarify. – rslemos May 05 '15 at 21:46
  • I was using "set" as a general term. They are in fact lists of data. I'm essentially trying to take two sorted card decks of unequal size, and as evenly as possible, shuffle the smaller into the larger. – gpraceman May 05 '15 at 22:32

3 Answers

You could start by computing how many items from the bigger set should fall between consecutive items of the smaller set:

float number_between = (float) bigger_set.size() / smaller_set.size();  // cast avoids integer division

Then iterate over the bigger set, subtracting 1 from an accumulator (initialized to number_between) on each pass. Whenever the accumulator drops below 0, insert an item from the smaller set and add number_between back to the accumulator:

float accumulator = number_between;
foreach(item : bigger_set) {
  result.add(item);
  accumulator = accumulator - 1;
  if (accumulator < 0) {
      result.add(next from smaller_set);
      accumulator = accumulator + number_between;
  } 
} 
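
The loop above might look like this in Python. This is only a sketch: the function name `merge_with_accumulator`, the `_DONE` sentinel, the guard against an exhausted smaller list, and the choice to append any unplaced smaller items at the end are assumptions, not part of the pseudocode.

```python
_DONE = object()  # hypothetical sentinel marking an exhausted smaller list


def merge_with_accumulator(bigger, smaller):
    """Merge `smaller` into `bigger` using a floating-point accumulator."""
    if not smaller:
        return list(bigger)

    number_between = len(bigger) / len(smaller)  # true division in Python 3
    accumulator = number_between
    remaining = iter(smaller)
    result = []

    for item in bigger:
        result.append(item)
        accumulator -= 1
        if accumulator < 0:
            nxt = next(remaining, _DONE)
            if nxt is not _DONE:  # guard: never draw from an empty list
                result.append(nxt)
            accumulator += number_between

    # Assumption of this sketch: anything the loop did not place goes last.
    result.extend(remaining)
    return result


if __name__ == "__main__":
    print(" ".join(merge_with_accumulator(["A"] * 42, ["B"] * 16)))
```

Both input lists keep their original order in the result.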

EDIT

To make sure that the bigger list both starts and ends the result list, change the first line to:

float number_between = (float) (bigger_set.size() + 1) / smaller_set.size();

EDIT 2

Beware that floating-point arithmetic may introduce rounding errors.

For example, if you're using IEEE single precision (24-bit mantissa, about 7 decimal digits) and the bigger list is larger than the smaller list by a factor of roughly 10^7 or more, the line accumulator = accumulator - 1 may leave the accumulator unchanged, because 1 is below the precision available at that magnitude (and you'll get a result made up entirely of the bigger set and none of the smaller set).

Also, rounding may lead to an attempt to draw further items from the smaller list when it is exhausted.
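
One way to sidestep these floating-point concerns is to compute each insertion point with integer arithmetic instead of an accumulator. The sketch below is a different technique from the loop above, not a port of it. The name `interleave_evenly` and the `(m + 1)` gap rule are assumptions, chosen so that the bigger list starts and ends the result whenever it is longer than the smaller one.

```python
def interleave_evenly(bigger, smaller):
    """Place every item of `smaller` into `bigger` using integer math only."""
    n, m = len(bigger), len(smaller)
    result = []
    taken = 0  # how many items of `bigger` have been emitted so far

    for j, item in enumerate(smaller, start=1):
        # Number of bigger items that should precede the j-th smaller item:
        # the m smaller items split the bigger list into m + 1 runs.
        cut = (j * n) // (m + 1)
        result.extend(bigger[taken:cut])
        result.append(item)
        taken = cut

    result.extend(bigger[taken:])  # final run of bigger items
    return result


if __name__ == "__main__":
    print(" ".join(interleave_evenly(["A"] * 42, ["B"] * 16)))
```

For the 42 and 16 item example, every run of items from the bigger list has length 2 or 3, and the smaller list is always used up exactly.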

rslemos

1) You could concatenate the two groups and do simple sampling from the combined group, for instance by shuffling the elements and iterating through the shuffled combined set.

2) If you'd rather do it sequentially, you could sample from each group with probabilities size(A) / (size(A) + size(B)) and size(B) / (size(A) + size(B)), where size(A) and size(B) are the current numbers of elements in groups A and B, respectively, that haven't yet been sampled. In other words, if U is a draw from a Uniform(0,1) random number generator:

if U <= size(A) / (size(A) + size(B))
   randomly draw next observation from A
else
   randomly draw next observation from B

In both approaches the A's and the B's end up uniformly distributed across the range, which is the statistical description of a "fairly even distribution".

You didn't specify a language, so here are concrete implementations of both approaches in Ruby. I've cut the set sizes in half to keep the output length reasonable, and obviously these will both produce different results each time they're run due to the use of randomness.

First approach:

a = ['A'] * 21
b = ['B'] * 8
c = (a + b).shuffle
puts c.join(',')

which, for example, produced the following output:

A,A,A,A,A,B,A,A,A,A,A,B,B,B,A,B,A,A,A,A,A,A,A,A,A,B,B,A,B

Second approach:

a = ['A'] * 21
b = ['B'] * 8
c = []
while a.length > 0 || b.length > 0
  # rand is in [0, 1), so a strict < never picks from an empty array
  c << (rand < (a.length / (a.length + b.length).to_f) ? a.shift : b.shift)
end
puts c.join(',')

which, for example, produced the following output:

A,A,B,A,A,A,B,B,A,A,A,A,A,A,A,B,B,B,A,A,A,B,A,A,A,B,A,A,A
pjs
  • I wasn't really looking for a concrete implementation, more just pseudo code. Though, I do appreciate your concrete examples. I will be implementing in VB.Net initially and then porting to Python. – gpraceman May 05 '15 at 23:50
  • These should translate pretty straightforwardly to Python. I wanted to show that the recommendations weren't theoretical, they both can be realized in `O(size(A)+size(B))`. I'm not sure that can be said of your solution, since I don't know how the `mergedList.Insert` is implemented. – pjs May 05 '15 at 23:58
  • That mergedList.Insert() is to place the specified item into the list at the given index. I am not familiar with Ruby, so I'd have to figure out what is going on inside of your while loop. – gpraceman May 06 '15 at 05:06
  • Ruby's `shift` is equivalent to python's `list.pop(0)`. The `<<` is equivalent to python's `list.append(value)`. The stuff in between is the ternary `?:` operator found in many languages, including C, C++, Java, and Ruby. `x ? y : z` means if `x` is `true` evaluate `y`, otherwise evaluate `z`. In other words, the body of the loop evaluates the `if/else` described in pseudocode in approach 2. The end result is that `c` gets built up as a merged list of elements from `a` and `b`, where the choice of which type to merge is proportional to the number of that type in the remaining elements. – pjs May 06 '15 at 17:22
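
Following the translation notes in the comment above, approach 2 might be written in Python roughly as follows. This is a sketch, not a port endorsed by the answer: the function name `random_sequential_merge` and the optional `rng` parameter are assumptions. It uses `collections.deque` because `list.pop(0)` takes linear time in CPython, and a strict `<` so that an empty group is never chosen.

```python
import random
from collections import deque


def random_sequential_merge(a, b, rng=random):
    """Merge two sequences, choosing the source of each next item at random
    with probability proportional to how many items that source has left."""
    a, b = deque(a), deque(b)
    merged = []

    while a or b:
        # rng.random() is in [0.0, 1.0): the test is always False when `a`
        # is empty (ratio 0.0) and always True when `b` is empty (ratio 1.0).
        if rng.random() < len(a) / (len(a) + len(b)):
            merged.append(a.popleft())
        else:
            merged.append(b.popleft())

    return merged


if __name__ == "__main__":
    print(",".join(random_sequential_merge(["A"] * 21, ["B"] * 8)))
```

As with the Ruby version, the output differs from run to run; passing a seeded `random.Random` instance as `rng` makes a run repeatable.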

Well, I've been playing around with this and have come up with a solution that will work for my purposes. I essentially mix the larger items into the smaller items and loop back through until I run out of larger items.

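' smallerID and largerID are placeholder IDs for items of each list
For Each item In smallerList
  mergedList.Add(smallerID)
Next

itemsRemaining = biggerList.Count
loopCount = 0

While itemsRemaining > 0
  index = 0

  For i = 1 To smallerList.Count
    ' Continue While re-checks the outer condition, which ends the loop once nothing remains
    If index >= mergedList.Count Or itemsRemaining = 0 Then Continue While

    mergedList.Insert(index, largerID)
    index += 2 + loopCount   ' skip past the items already in this group
    itemsRemaining -= 1
  Next

  loopCount += 1
End While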

Then I can replace the IDs with the actual items from the two lists.

So, for my original example (Group 1 with 42 items and Group 2 with 16), I end up with:

111 2 111 2 111 2 111 2 111 2 111 2 111 2 111 2 111 2 111 2 11 2 11 2 11 2 11 2 11 2 11 2

It's a bit front loaded, but for my purposes, this will work out just fine.

gpraceman