5

I am trying to come up with a way to generate all possible unique strings from an alphabet of 20 characters where the order within the string doesn't matter, and the length of the string can vary. So, for instance, for a string of length 3, the possible strings would be AAA, AAB, AAC, etc., but would not include BAA or CAA. I figured out a way using itertools.product(), but it is very computationally expensive. The easiest way to do this is simply using nested for loops. For instance, to generate all strings of length four:

alphabet = ["A","C","D","E","F","G","H","I","K","L",
            "M","N","P","Q","R","S","T","V","W","Y"]
combos = []
for a in range(len(alphabet)):
    for b in range(a,len(alphabet)):
        for c in range(b,len(alphabet)):
            for d in range(c,len(alphabet)):
                combos.append(alphabet[a] + alphabet[b] + alphabet[c] + alphabet[d])

Now, this can easily be done for a string of any length by changing the number of for loops. Given that the for loop sequence itself is quite predictable, is there a way to simplify this code, instead of having if length == 3 run three for loops and if length == 4 run four loops? The only way I can think of to do it right now is a bunch of if-elif statements:

if length == 3:
    for a in range(len(alphabet)):
        for b in range(a,len(alphabet)):
            for c in range(b,len(alphabet)):
                combos.append(alphabet[a] + alphabet[b] + alphabet[c])
elif length == 4:
    for a in range(len(alphabet)):
        for b in range(a,len(alphabet)):
            for c in range(b,len(alphabet)):
                for d in range(c,len(alphabet)):
                    combos.append(alphabet[a] + alphabet[b] + alphabet[c] + alphabet[d])

Is there any easier way than just covering a bunch of possible values of length?
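Since each loop starts where the enclosing loop currently stands, the nesting depth can be turned into a recursion depth. The following is only a sketch of that idea; `nested_combos` and its `start` and `prefix` parameters are names invented here for illustration, not part of the original code.

```python
def nested_combos(alphabet, length, start=0, prefix=""):
    """Yield strings of `length` letters, mirroring `length` nested for loops."""
    if length == 0:
        # Innermost level reached: the prefix is a finished string.
        yield prefix
        return
    # Same role as `for b in range(a, len(alphabet))`: never go back before `start`.
    for i in range(start, len(alphabet)):
        yield from nested_combos(alphabet, length - 1, i, prefix + alphabet[i])


alphabet = ["A", "C", "D", "E", "F", "G", "H", "I", "K", "L",
            "M", "N", "P", "Q", "R", "S", "T", "V", "W", "Y"]
combos = list(nested_combos(alphabet, 4))  # same strings as the four-loop version
```

The recursion depth equals `length`, so this should be fine for short strings; it is a generator, so nothing is stored unless the caller collects the results.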

LordHadron
  • 3
    Can you say more about your attempted/failed solution using `itertools.product`? Your way should be far more computationally expensive. – Two-Bit Alchemist Aug 17 '15 at 21:19
  • 1
    @Two-BitAlchemist: no, the OP's code is better, because it generates only the ones he needs. Using product, in the 4-letter case, you'd be throwing away 151145/160000 of the results. – DSM Aug 17 '15 at 21:35
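The fraction quoted in the comment above can be checked with a short calculation, assuming the standard count of multisets of size k drawn from n items, C(n + k - 1, k). The variable names are chosen here for illustration, and `math.comb` needs Python 3.8 or later.

```python
from math import comb

n, k = 20, 4                  # alphabet size and string length

total = n ** k                # ordered strings, as itertools.product would yield
unique = comb(n + k - 1, k)   # strings where letter order does not matter
discarded = total - unique    # ordered strings that are duplicates up to order

print(f"{discarded}/{total} discarded, {unique} kept")
```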

2 Answers

3

IIUC, you can simply use itertools.combinations_with_replacement.

>>> from itertools import combinations_with_replacement
>>> list(map(''.join, combinations_with_replacement(["a","b","c"],2)))
['aa', 'ab', 'ac', 'bb', 'bc', 'cc']
>>> list(map(''.join, combinations_with_replacement(["a","b","c"],3)))
['aaa', 'aab', 'aac', 'abb', 'abc', 'acc', 'bbb', 'bbc', 'bcc', 'ccc']
>>> list(map(''.join, combinations_with_replacement(alphabet,4))) == orig(alphabet)
True

(where orig is simply your original code wrapped into a function).
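To cover the "length can vary" part of the question, the call can be wrapped in a function that takes the length as a parameter. This is a minimal sketch; `unordered_strings` and `unordered_strings_up_to` are hypothetical helper names, not anything from `itertools`.

```python
from itertools import chain, combinations_with_replacement


def unordered_strings(alphabet, length):
    """Lazily yield every string of `length` letters, ignoring letter order."""
    return map("".join, combinations_with_replacement(alphabet, length))


def unordered_strings_up_to(alphabet, max_length):
    """Chain the results for lengths 1, 2, ..., max_length."""
    return chain.from_iterable(
        unordered_strings(alphabet, n) for n in range(1, max_length + 1)
    )


alphabet = "ACDEFGHIKLMNPQRSTVWY"
combos = list(unordered_strings(alphabet, 4))
```

Both helpers return iterators, so the full list is only built when you ask for it with `list(...)`.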

DSM
  • was just going to answer the same, product would give completely different output – Padraic Cunningham Aug 17 '15 at 21:26
  • @PadraicCunningham: then I'm not sure I understand your comment question to the OP -- the OP's code is more efficient than product because he generates only the ones he needs, rather than looking at all of them and filtering away the ones he doesn't want. – DSM Aug 17 '15 at 21:27
  • I did not look at the OP's code initially, I presumed they wanted the product considering they were using product – Padraic Cunningham Aug 17 '15 at 21:29
  • 1
    I'm afraid I did the same thing. I made assumptions based on the question text and not the code. – Two-Bit Alchemist Aug 17 '15 at 21:36
  • This is exactly what I was looking for, thank you. I timed the different implementations to see how they compared with strings of length 6: itertools.product(): 72.5235 s, manual nest: 0.1549 s, itertools.combinations_with_replacement(): 0.1536 s – LordHadron Aug 18 '15 at 02:28
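A comparison along these lines could be reproduced with `timeit`. The thread does not show the `itertools.product` version that was timed, so `via_product` below is a guess at it (sort each product's letters, then de-duplicate); the function names and the shorter length of 3 are likewise assumptions made to keep the sketch quick to run.

```python
from itertools import combinations_with_replacement, product
from timeit import timeit

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def via_product(alphabet, length):
    # Assumed approach: build every ordered string, sort its letters, de-duplicate.
    return {"".join(sorted(p)) for p in product(alphabet, repeat=length)}


def via_cwr(alphabet, length):
    # Generate only the order-independent strings directly.
    return ["".join(c) for c in combinations_with_replacement(alphabet, length)]


def compare(length, repeats=3):
    """Print the average wall-clock time of each approach."""
    for fn in (via_product, via_cwr):
        seconds = timeit(lambda: fn(ALPHABET, length), number=repeats) / repeats
        print(f"{fn.__name__}: {seconds:.4f} s")


if __name__ == "__main__":
    compare(3)
```

Absolute timings depend on the machine and Python version, so only the ratio between the two approaches is meaningful.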
1
  1. the code for itertools.product does exactly what you want and is much more efficient than nested loops

  2. I suspect that what you really want is itertools.combinations_with_replacement
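To see how the two functions differ, here is a small comparison on a two-letter alphabet (an example chosen for brevity, not taken from the question). As noted in the comments on the other answer, `product` also yields reorderings such as "ba", which the question wants to exclude.

```python
from itertools import combinations_with_replacement, product

letters = "ab"

# Every ordered pair, so both "ab" and "ba" appear.
ordered = ["".join(p) for p in product(letters, repeat=2)]

# One representative per unordered pair.
unordered = ["".join(c) for c in combinations_with_replacement(letters, 2)]

print(ordered)    # ['aa', 'ab', 'ba', 'bb']
print(unordered)  # ['aa', 'ab', 'bb']
```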

yurib