2

Related to this question, I am wondering the algorithms (and actual code in java/c/c++/python/etc., if you have!) to generate all combinations of r elements for a list with m elements in total. Some of these m elements may be repeated.

Thanks!

Community
  • 1
  • 1
Qiang Li
  • 10,593
  • 21
  • 77
  • 148
  • @Jean-Bernard: combination vs. permutation! – Qiang Li Nov 16 '11 at 02:10
  • "The CombinationGenerator Java class systematically generates all combinations of n elements, taken r at a time. The algorithm is described by Kenneth H. Rosen, Discrete Mathematics and Its Applications, 2nd edition (NY: McGraw-Hill, 1991), pp. 284-286." See http://www.merriampark.com/comb.htm. It has a link to source code. – Jim Mischel Nov 16 '11 at 02:34
  • @JimMischel: that is not what I want. For String[] elements = { "a", "a", "b", "b" }; and r=3, it generated "aab aab abb abb". But I want to get unique combinations "aab abb", though I could have removed them after generation; but that is not efficient. – Qiang Li Nov 16 '11 at 02:48
  • So remove the duplicate elements from your `elements` array before you start. That's going to be a lot easier and more efficient than trying to prevent the code from generating duplicates. – Jim Mischel Nov 16 '11 at 17:13
  • 2
    @JimMischel: but the number of repeated elements is key information for generating such combinations. – Qiang Li Nov 16 '11 at 18:06
  • Related question: http://stackoverflow.com/q/4319222/618728 – Mr.Wizard Sep 09 '13 at 09:59

3 Answers3

2

recurse for each element type

int recurseMe(list<list<item>> items, int r, list<item> container)
{
  if (r == container.length)
  {
    //print out your collection;
    return 1;
  }
  else if (container.length > score)
  {
    return 0;
  }
  list<item> firstType = items[0];
  int score = 0;
  for(int i = 0; i < firstType.length; i++)
  {
    score += recurseMe(items without items[0], r, container + i items from firstType);
  }
  return score;
}

This takes as input a list containing lists of items, assuming each inner list represents a unique type of item. You may have to build a sorting function to feed as input to this.

//start with a list<item> original;
list<list<item>> grouped = new list<list<item>>();
list<item> sorted = original.sort();//use whichever method for this
list<item> temp = null;
item current = null;
for(int x = 0; x < original.length; x++)
  if (sorted[x] == current)
  {
    temp.add(current);
  }
  else
  {
    if (temp != null && temp.isNotEmpty)
      grouped.add(temp);
    temp = new list<item>();
    temp.add(sorted[x]);
  }
}
if (temp != null && temp.isNotEmpty)
  grouped.add(temp);
//grouped is the result

This sorts the list, then creates sublists containing elements that are the same, inserting them into the list of lists grouped

Jean-Bernard Pellerin
  • 12,556
  • 10
  • 57
  • 79
  • why do you use "a list containing lists of items"? i have to look at this more closely, unless you give some runnable code or explain the ideas in more details. Thanks! – Qiang Li Nov 16 '11 at 02:25
  • You said there could be some repeats, this groups them into inner lists. Makes the processing a lot easier. I'll edit the code with a way to generate such a list of lists – Jean-Bernard Pellerin Nov 16 '11 at 02:27
  • +1. Thanks a lot. Now I start to see your solution! But I am having problems implementing it in Java. Not sure whether I understand your algorithm correctly. Do you mind making it working code? You almost have it! – Qiang Li Nov 16 '11 at 06:46
  • @Mr.Wizard: oh, are you doing this in Mma? I haven't done it, but it would be quite interesting to see this. BTW, does Mma have a built-in function to do this? I would guess so. – Qiang Li Nov 17 '11 at 01:20
1

I'm going to make this an answer rather than a bunch of comments.

My original comment was:

The CombinationGenerator Java class systematically generates all combinations of n elements, taken r at a time. The algorithm is described by Kenneth H. Rosen, Discrete Mathematics and Its Applications, 2nd edition (NY: McGraw-Hill, 1991), pp. 284-286." See merriampark.com/comb.htm. It has a link to source code.

As you pointed out in your comment, you want unique combinations. So, given the array ["a", "a", "b", "b"], you want it to generate aab, abb. The code I linked generates aab, aab, baa, baa.

With that array, removing duplicates is very easy. Depending on how you implement it, you either let it generate the duplicates and then filter them after the fact (i.e. selecting unique elements from an array), or you modify the code to include a hash table so that when it generates a combination, it checks the hash table before putting the item into the output array.

Looking something up in a hash table is an O(1) operation, so either of those is going to be efficient. Doing it after the fact will be a little bit more expensive, because you'll have to copy items. Still, you're talking O(n), where n is the number of combinations generated.

There is one complication: order is irrelevant. That is, given the array ["a", "b", "a", "b"], the code will generate aba, abb, aab, bab. In this case, aba and aab are duplicate combinations, as are abb and bab, and using a hash table isn't going to remove those duplicates for you. You could, though, create a bit mask for each combination, and use the hash table idea with the bit masks. This would be slightly more complicated, but not terribly so.

If you sort the initial array first, so that duplicate items are adjacent, then the problem goes away and you can use the hash table idea.

There's undoubtedly a way to modify the code to prevent it from generating duplicates. I can see a possible approach, but it would be messy and expensive. It would probably make the algorithm slower than if you just used the hash table idea. The approach I would take:

Sort the input array
Use the linked code to generate the combinations
Use a hash table or some other code to select unique items.

Although ... a thought that occurred to me.

Is it true that if you sort the input array, then any generated duplicates will be adjacent? That is, given the input array ["a", "a", "b", "b"], then the generated output will be aab, aab, abb, abb, in that order. This will be implementation dependent, of course. But if it's true in your implementation, then modifying the algorithm to remove duplicates is a simple matter of checking to see if the current combination is equal to the previous one.

Jim Mischel
  • 131,090
  • 20
  • 188
  • 351
1

Here is a recursion that I believe is closely related to Jean-Bernard Pellerin's algorithm, in Mathematica.

This takes input as the number of each type of element. The output is in similar form. For example:

{a,a,b,b,c,d,d,d,d} -> {2,2,1,4}

Function:

f[k_, {}, c__] := If[+c == k, {{c}}, {}]

f[k_, {x_, r___}, c___] := Join @@ (f[k, {r}, c, #] & /@ 0~Range~Min[x, k - +c])

Use:

f[4, {2, 2, 1, 4}]
{{0, 0, 0, 4}, {0, 0, 1, 3}, {0, 1, 0, 3}, {0, 1, 1, 2}, {0, 2, 0, 2},
 {0, 2, 1, 1}, {1, 0, 0, 3}, {1, 0, 1, 2}, {1, 1, 0, 2}, {1, 1, 1, 1},
 {1, 2, 0, 1}, {1, 2, 1, 0}, {2, 0, 0, 2}, {2, 0, 1, 1}, {2, 1, 0, 1},
 {2, 1, 1, 0}, {2, 2, 0, 0}}

An explanation of this code was requested. It is a recursive function that takes a variable number of arguments. The first argument is k, length of subset. The second is a list of counts of each type to select from. The third argument and beyond is used internally by the function to hold the subset (combination) as it is constructed.

This definition therefore is used when there are no more items in the selection set:

f[k_, {}, c__] := If[+c == k, {{c}}, {}]

If the total of the values of the combination (its length) is equal to k, then return that combination, otherwise return an empty set. (+c is shorthand for Plus[c])

Otherwise:

f[k_, {x_, r___}, c___] := Join @@ (f[k, {r}, c, #] & /@ 0~Range~Min[x, k - +c])

Reading left to right:

  • Join is used to flatten out a level of nested lists, so that the result is not an increasingly deep tensor.

  • f[k, {r}, c, #] & calls the function, dropping the first position of the selection set (x), and adding a new element to the combination (#).

  • /@ 0 ~Range~ Min[x, k - +c]) for each integer between zero and the lesser of the first element of the selection set, and k less total of combination, which is the maximum that can be selected without exceeding combination size k.

Mr.Wizard
  • 24,179
  • 5
  • 44
  • 125
  • Wizard, +1 Nice. is it a translation from Jean-Bernard Pellerin's algorithm? Can you please add some explanation? Thanks! – Qiang Li Nov 17 '11 at 21:15
  • @QiangLi Thanks. Not a translation directly, but the hints of "recursion" and "a unique type of item" were enough to get my mind working. I could not figure out how to do this with the elements directly (at lest not without really ugly code), so I approached it numerically. I shall try to add an explanation, but I am short on time right now. – Mr.Wizard Nov 17 '11 at 21:44