Generate all unique k-subsequences

Question

I am trying to write a Python (at least initially) function to generate all subsequences of some length k (where k > 0). Since I only need unique subsequences, I am storing both the subsequences and partial subsequences in sets. The following, adapted from a colleague, is the best I could come up with. It seems...overly complex...and like I should be able to abuse itertools, or recursion, to do what I want to do. Can anyone do better?

from typing import Set, Tuple


def subsequences(string: str, k: int) -> Set[Tuple[str, ...]]:
    if len(string) < k:
        return set()
    start = tuple(string[:k])
    result = {start}
    prev_state = [start]
    curr_state = set()
    for s in string[k:]:
        for p in prev_state:
            for i in range(k):
                new = p[:i] + p[i + 1 :] + (s,)
                curr_state.add(new)
        result.update(curr_state)
        prev_state = list(curr_state)
        curr_state.clear()
    return result

(For context, I am interested in induction of k-strictly piecewise languages, an efficiently learnable subclass of the regular languages, and the grammar can be characterized by all licit k-subsequences.

Ultimately I am also thinking about doing this in C++, where std::make_tuple isn't quite as powerful as Python tuple.)

Can you use https://docs.python.org/3/library/itertools.html#itertools.combinations? — SyntaxVoid, Oct 22 '19 at 18:16
@DanielMesejo: All subsequences of some iterable (here typed to be a string). The 2-subsequences of `"foo"` are `{("f", "o"), ("o", "o")}` and the 3-subsequences of `"foobar"` are `{('o', 'b', 'r'), ('o', 'o', 'a'), ('f', 'b', 'a'), ('o', 'a', 'r'), ('f', 'o', 'o'), ('f', 'o', 'r'), ('b', 'a', 'r'), ('o', 'b', 'a'), ('f', 'a', 'r'), ('o', 'o', 'r'), ('f', 'b', 'r'), ('o', 'o', 'b'), ('f', 'o', 'b'), ('f', 'o', 'a')}` — structural ambiguity, Oct 22 '19 at 18:37
@SyntaxVoid I suppose I could just do `frozenset(itertools.combinations(string, k))` but I wonder if it makes sense to avoid the intermediate computations. — structural ambiguity, Oct 22 '19 at 19:06

pylang · Accepted Answer · 2019-10-22T19:24:31.373

You want a set of r combinations from n items (w/o replacement, <= (n choose r).

Given

import itertools as it

import more_itertools as mit

Code

Option 1 - itertools.combinations

set(it.combinations("foo", 2))
# {('f', 'o'), ('o', 'o')}

set(it.combinations("foobar", 3))
# {('b', 'a', 'r'),
#  ('f', 'a', 'r'),
#  ('f', 'b', 'a'),
#  ('f', 'b', 'r'),
#  ('f', 'o', 'a'),
#  ('f', 'o', 'b'),
#  ('f', 'o', 'o'),
#  ('f', 'o', 'r'),
#  ('o', 'a', 'r'),
#  ('o', 'b', 'a'),
#  ('o', 'b', 'r'),
#  ('o', 'o', 'a'),
#  ('o', 'o', 'b'),
#  ('o', 'o', 'r')}

Option 2 - more_itertools.distinct_combinations

list(mit.distinct_combinations("foo", 2))
# [('f', 'o'), ('o', 'o')]

list(mit.distinct_combinations("foobar", 3))
# [('f', 'o', 'o'),
#  ('f', 'o', 'b'),
#  ('f', 'o', 'a'),
#  ('f', 'o', 'r'),
#  ('f', 'b', 'a'),
#  ('f', 'b', 'r'),
#  ('f', 'a', 'r'),
#  ('o', 'o', 'b'),
#  ('o', 'o', 'a'),
#  ('o', 'o', 'r'),
#  ('o', 'b', 'a'),
#  ('o', 'b', 'r'),
#  ('o', 'a', 'r'),
#  ('b', 'a', 'r')]

Both options yield the same (unordered) output. However:

Option 1 takes the set of all combinations (including duplicates)
Option 2 does not compute duplicate intermediates

Install more_itertools via > pip install more_itertools.

See also a rough implementation of itertools.combinations in written Python.

Generate all unique k-subsequences

1 Answers1