
I'm trying to create an iterator/generator of all variable length strings given an alphabet and a maximum string length, sorted in lexicographic order.

Currently, I have a naive method that calls itertools.product() once per string length and then sorts the combined results. This works fine for small max_len_string, but for my target usage (around max_len_string=32) it uses far too much temporary storage to be practical.

Is there a way to make this algorithm use only a small, constant amount of space per iteration, instead of materializing the entire sequence in order to sort it?

from itertools import product
def variable_strings_complete(max_len_string, alphabet=range(2)):
    yield from sorted(string
                      for i in range(1, max_len_string+1)
                      for string in product(alphabet, repeat=i))

list(variable_strings_complete(3))

[(0,),
 (0, 0),
 (0, 0, 0),
 (0, 0, 1),
 (0, 1),
 (0, 1, 0),
 (0, 1, 1),
 (1,),
 (1, 0),
 (1, 0, 0),
 (1, 0, 1),
 (1, 1),
 (1, 1, 0),
 (1, 1, 1)]
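To put a number on the storage problem: an alphabet of k symbols gives k + k^2 + ... + k^n strings of length 1 to n, and sorted() has to hold all of them at once. A quick sketch (the helper name count_strings is only for illustration):

```python
def count_strings(max_len_string, alphabet_size=2):
    # Number of non-empty strings of length 1..max_len_string
    return sum(alphabet_size ** i for i in range(1, max_len_string + 1))

print(count_strings(3))   # 14, the size of the listing above
print(count_strings(32))  # 8589934590
```

So for the binary alphabet at max_len_string=32 the sort would need roughly 8.6 billion tuples in memory.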
Pang

2 Answers


Working with itertools early in the morning is a recipe for disaster, but something like

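from itertools import product, takewhile
def new(max_len_string, alphabet=range(2)):
    alphabet = list(alphabet)
    zero = alphabet[0]  # smallest symbol; assumes the alphabet is already sorted
    # Walk the full-length strings in order; product() is lazy.
    for p in product(alphabet, repeat=max_len_string):
        # Count the trailing run of the smallest symbol in p.
        right_zeros = sum(1 for _ in takewhile(lambda x: x==zero, reversed(p)))
        # p without that run (p[:-0] is empty, but then nothing below is yielded).
        base = p[:-right_zeros]
        # Shorter strings that sort just before p; filter(None, ...) drops the empty tuple.
        yield from filter(None, (base+(zero,)*i for i in range(right_zeros)))
        yield p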

should work:

>>> list(new(3)) == list(variable_strings_complete(3))
True
>>> list(new(20)) == list(variable_strings_complete(20))
True
>>> list(new(10, alphabet=range(4))) == list(variable_strings_complete(10, range(4)))
True

This assumes the alphabet is passed in canonical (sorted) order; if that's not the case, list(alphabet) can be replaced with sorted(alphabet).
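Because product() is lazy, the generator only ever holds the current tuple, so it can be sampled at the target size without building the whole sequence. A small sketch, restating new so the snippet runs on its own and using itertools.islice to take just the first few items:

```python
from itertools import islice, product, takewhile

def new(max_len_string, alphabet=range(2)):
    alphabet = list(alphabet)
    zero = alphabet[0]
    for p in product(alphabet, repeat=max_len_string):
        right_zeros = sum(1 for _ in takewhile(lambda x: x == zero, reversed(p)))
        base = p[:-right_zeros]
        yield from filter(None, (base + (zero,) * i for i in range(right_zeros)))
        yield p

# Take only the first three strings for max_len_string=32.
first = list(islice(new(32), 3))
print(first)  # [(0,), (0, 0), (0, 0, 0)]
```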

DSM

This seems to work (EDIT -- fixed it to be a generator):

from itertools import chain

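def variable_strings_complete(max_len, alphabet=range(2)):
    # Symbols are converted to str, so the results are strings, not tuples.
    alphabet = sorted(map(str, alphabet))

    def complete_partial(partial, alph_idx):
        # alph_idx is the recursion depth (number of symbols already in partial).
        to_returns = (partial + a for a in alphabet)

        if alph_idx == (max_len - 1):
            # Deepest level: these extensions have no children.
            yield from to_returns
        else:
            for r in to_returns:
                # Emit the prefix first, then everything that starts with it.
                n = complete_partial(r, alph_idx + 1)
                yield from chain([r], n)

    yield from complete_partial("", 0)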

print(list(variable_strings_complete(3)))

Returns:

['0', '00', '000', '001', '01', '010', '011', '1', '10', '100', '101', '11', '110', '111']

And it works for other alphabets:

print(list(variable_strings_complete(3, "ab")))

yields

['a', 'aa', 'aaa', 'aab', 'ab', 'aba', 'abb', 'b', 'ba', 'baa', 'bab', 'bb', 'bba', 'bbb']
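Note that this version returns strings rather than the tuples in the question, since the symbols go through str. If tuples are needed, the same prefix-first walk can be sketched with an explicit stack instead of recursion. This is only an illustration under the assumption that the symbols are sortable; the name variable_tuples is made up for the example:

```python
def variable_tuples(max_len, alphabet=range(2)):
    # Hypothetical helper: pre-order walk of the prefix tree, yielding tuples.
    alphabet = sorted(alphabet)
    if max_len < 1:
        return
    # Push in reverse so the smallest symbol is popped (and yielded) first.
    stack = [(a,) for a in reversed(alphabet)]
    while stack:
        prefix = stack.pop()
        yield prefix  # a prefix sorts before all of its extensions
        if len(prefix) < max_len:
            stack.extend(prefix + (a,) for a in reversed(alphabet))

print(list(variable_tuples(2)))
# [(0,), (0, 0), (0, 1), (1,), (1, 0), (1, 1)]
```

The stack never holds more than about max_len * len(alphabet) pending prefixes, so the space used does not grow with the number of strings produced.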
Patrick Collins