0

I wrote this piece of code, which prints all substrings of a given string, but I want it to print all possible subsequences instead.

from itertools import combinations_with_replacement
s = 'MISSISSIPPI'
lst = []
for i,j in combinations_with_replacement(range(len(s)), 2):
    print(s[i:(j+1)])

wjandrea
challenger
  • It's not clear exactly what you're asking for. Could you give a concrete example for a shorter word? – Dragon Dec 20 '19 at 02:22
  • Why use `combinations_with_replacement` instead of `combinations` if you want subsequences? – kaya3 Dec 20 '19 at 02:50

2 Answers

6

Use `itertools.combinations` to get subsequences. It yields every selection of `r` elements with their original order preserved, which is exactly what a subsequence is.

from itertools import combinations

def all_subsequences(s):
    out = set()
    for r in range(1, len(s) + 1):
        for c in combinations(s, r):
            out.add(''.join(c))
    return sorted(out)

Example:

>>> all_subsequences('HELLO')
['E', 'EL', 'ELL', 'ELLO', 'ELO', 'EO', 'H', 'HE', 'HEL', 'HELL', 'HELLO', 'HELO',
 'HEO', 'HL', 'HLL', 'HLLO', 'HLO', 'HO', 'L', 'LL', 'LLO', 'LO', 'O']
>>> all_subsequences('WORLD')
['D', 'L', 'LD', 'O', 'OD', 'OL', 'OLD', 'OR', 'ORD', 'ORL', 'ORLD', 'R', 'RD',
 'RL', 'RLD', 'W', 'WD', 'WL', 'WLD', 'WO', 'WOD', 'WOL', 'WOLD', 'WOR', 'WORD',
 'WORL', 'WORLD', 'WR', 'WRD', 'WRL', 'WRLD']
kaya3
  • I am a bit confused on why the string "HELLO" returns 23 subsequences while "WORLD" returns 31. The math behind subsequences is `2 ** n - 1`, why then would the string "HELLO" with length 5 return 23 subsequences? – Alex Maina Nov 04 '22 at 08:24
  • @AlexMaina Perhaps it would be instructive for you to write out which 31 subsequences you think "HELLO" should have, and then check whether you really have 31 different ones. – kaya3 Nov 04 '22 at 11:39
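The gap between 31 and 23 can be checked directly. The following is a minimal sketch, not part of the answer above; `count_subsequences` is a hypothetical helper name introduced only for illustration. It counts the non-empty subsequences twice: once per choice of positions (which always gives `2 ** n - 1`) and once after joining them into strings and discarding duplicates.

```python
from itertools import combinations

def count_subsequences(s):
    """Return (by_position, distinct) counts of the non-empty subsequences of s."""
    by_position = 0      # one per choice of positions: 2 ** len(s) - 1 in total
    distinct = set()     # the same selections, collapsed to unique strings
    for r in range(1, len(s) + 1):
        for c in combinations(s, r):
            by_position += 1
            distinct.add(''.join(c))
    return by_position, len(distinct)

print(count_subsequences('HELLO'))  # (31, 23)
print(count_subsequences('WORLD'))  # (31, 31)
```

"HELLO" contains two L's, so different position choices can spell the same string (for example, 'HL' can use either L). "WORLD" has no repeated letters, so all 31 selections are distinct.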
0

One simple way to do this is to check whether the list you are building already contains the item the loop is currently on. If you have already seen it, skip it; if not, append it to your list of seen combinations.

from itertools import combinations_with_replacement

s = 'MISSISSIPPI'
lst = []

for i,j in combinations_with_replacement(range(len(s)), 2):
    if s[i:(j+1)] not in lst:
        lst.append(s[i:(j+1)]) # save new combination into list
        print(lst[-1]) # print new combination

To be sure that all cases are covered, it really helps to draw out the combinations that the loop will go over. Take a generic string whose letters are represented by their positions in the Python sequence, for example 0 to 3.

Here are the index pairs generated by `combinations_with_replacement`:

00, 01, 02, 03,
11, 12, 13,
22, 23,
33
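What those index pairs actually select can be checked with a short sketch. It assumes the three-letter string 'cab' as an example (chosen here only for illustration) and compares the slices built from the pairs with the output of `itertools.combinations` on the same string.

```python
from itertools import combinations, combinations_with_replacement

s = 'cab'

# Index pairs (i, j) with i <= j, each used as the slice s[i:j + 1].
# A slice is always contiguous, so these are substrings.
substrings = [s[i:j + 1]
              for i, j in combinations_with_replacement(range(len(s)), 2)]

# Every non-empty selection of characters in their original order.
# Gaps are allowed, so these are subsequences.
subsequences = [''.join(c)
                for r in range(1, len(s) + 1)
                for c in combinations(s, r)]

print(substrings)    # ['c', 'ca', 'cab', 'a', 'ab', 'b']
print(subsequences)  # ['c', 'a', 'b', 'ca', 'cb', 'ab', 'cab']
print(set(subsequences) - set(substrings))  # {'cb'}
```

The pair-based loop never produces 'cb', because no single slice can skip the 'a' in the middle.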

  • 22 is not a subsequence of 0123. Also, if you're only using it for membership tests, use a set, not a list. Set membership tests are O(1) time. – kaya3 Dec 20 '19 at 02:51
  • Isn't this all substrings? [A subsequence is a sequence that can be derived from another sequence by deleting some elements without changing the order of the remaining elements](https://en.wikipedia.org/wiki/Subsequence) – DarrylG Dec 20 '19 at 02:52
  • @kaya3--agree. But, the code above produces substrings not subsequences (at least when I tried it). For instance, for string "cab" the result is c, ca, cab, a, ab, b – DarrylG Dec 20 '19 at 02:56
  • @DarrylG Right, I misunderstood your comment - my apologies. Yes, you're right, this answer produces substrings like the OP's code does; the only difference is the avoidance of duplicates. – kaya3 Dec 20 '19 at 03:03
  • @kaya3--no problem. I liked (upvoted) your answer. – DarrylG Dec 20 '19 at 03:05