Wrong solution for distinct Sub-sequences of a sequence

Question

I am trying to solve this problem: given a string S and a string T, count the number of distinct subsequences of T in S.

A subsequence of a string is a new string which is formed from the original string by deleting some (can be none) of the characters without disturbing the relative positions of the remaining characters. (ie, "ACE" is a subsequence of "ABCDE" while "AEC" is not).

Here is an example: S = "rabbbit", T = "rabbit"

Answer should be 3.

class permute:
    def __init__(self):
        self.count = 0

    def count_permute(self, s, t, i, j):
        print(i, j)
        if t in s:
            self.count += 1
            return
        if i >= len(s) or j >= len(t):
            return
        self.count_permute(s, t, i+1, j)
        if s[i] == t[j] and j == (len(t)-1):
            self.count += 1
            self.count_permute(s, t, i+1, j+1)
        else:
            self.count_permute(s, t, i+1, j+1)

    def print_count(self):
        print(self.count)

p = permute()
p.count_permute("rabbbit", "rabbit", 0, 0)
p.print_count()

I know there is also a dynamic programming solution that builds a matrix. However, I want to know where I am going wrong with this recursive approach. Currently it prints 6, but the answer should be 3.

Very oddly worded problem. Does it mean the # of ways that T is a subsequence of S? Otherwise how is it 3? — Jason S, Feb 24 '16 at 23:21
@JasonS: rabbit can be made by ra[bb]bit or rab[bb]it or ra[b]b[b]it. [] is used to show the selected characters. Got it? — noman pouigt, Feb 24 '16 at 23:23
Yeah, figured it was that. Not what "subsequences of T" indicates though. — Jason S, Feb 24 '16 at 23:24
Is it about [Levenshtein distance](https://en.wikipedia.org/wiki/Levenshtein_distance)? — Alex Belyaev, Feb 24 '16 at 23:31
You haven't solved the problem... all you are counting is the number of times your recursion reaches the end of both strings. — Jason S, Feb 24 '16 at 23:39
@nomanpouigt I noticed that, but you compare `if t in s` where `t` and `s` are always the same; you do not use `i` and `j` to construct a new substring based on `t`, `i`, and `j`. So, as stated above, you simply count recursion calls. — teamnorge, Feb 24 '16 at 23:45
@JasonS yes, but only when you have found a sub-sequence in the given string. By the way, did you downvote the question? — noman pouigt, Feb 24 '16 at 23:46
I think the problem description (on the linked page) is very badly worded. It is actually asking for the number of subsequences of `S` that are equal to `T`, which is a very different thing than the number of subsequences of `T` that are "in" `S`. Very badly worded by the author of that challenge! (I know this is not the fault of the questioner.) — Blckknght, Feb 25 '16 at 00:24
I don't believe that the answer "3", given, is correct. Each distinct letter would be a subsequence (delete all letters but one), and there are more than 3 distinct single letters alone. — aghast, Feb 25 '16 at 00:32
@PaulHankin: I understood the problem. I have to re-factor the code as the current code will not work. — noman pouigt, Feb 25 '16 at 05:22

score 0 · Answer 1 · answered Feb 25 '16 at 00:29

The program fails because its logic does not seem particularly connected to an algorithm that would solve the problem. I have some trouble figuring out what you intended, given the one-letter variable names and the lack of comments.

I do understand the statement `if i >= len(s) or j >= len(t): return`: if we've run off the end of either string, quit looking. Similarly, the last half of the function can work, given proper support, but it's not enough to do its job in this context.

I don't get the first `if` action: you find `t` as a wholesale substring of `s`. Counting one subsequence is correct, but then you quit looking. How do you plan to get past 1 given ("abababab", "ab")? You find the first, count 1, and then quit altogether. Similarly, it fails when handed your one test case, counting far too many matches.

I also tested this with ("bababababab", "aa"); it claims 25 solutions, counting ghosts and fumbling the index advances. If I remove the 'b' characters on the two ends, it counts 20. This should be a clue as to your terminal counting.

For ("aacbaaa", "aba"), it counts 31 occurrences instead of the actual 6.
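To check the expected counts for cases like these independently of any recursion, a brute-force reference can help. The following is only a sketch: `brute_force_count` is a name I made up, and it simply tries every increasing choice of positions in `s`, so it is usable on short strings only.

```python
from itertools import combinations


def brute_force_count(s, t):
    """Count the ways to pick increasing positions in s that spell t."""
    return sum(
        1
        for picks in combinations(range(len(s)), len(t))
        # Keep this choice of positions only if it spells t exactly.
        if all(s[i] == ch for i, ch in zip(picks, t))
    )


# The test cases mentioned in this thread.
for s, t in [("rabbbit", "rabbit"), ("abababab", "ab"),
             ("bababababab", "aa"), ("aacbaaa", "aba")]:
    print(s, t, brute_force_count(s, t))
```

Comparing these numbers against what the recursive routine reports shows quickly which inputs are over-counted.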

In the current code, you need to keep looking after you find an initial match. You need both of your recursive calls here (using j and j+1). You also need to restrict those end-of-search over-counts.
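One possible shape for those fixes is sketched below. It is not a drop-in patch to your class: I assume a plain function (the name `count_matches` is mine) that returns the count instead of accumulating it in `self.count`, and that counts a match only when all of `t` has been consumed.

```python
def count_matches(s, t, i=0, j=0):
    """Count the subsequences of s[i:] that are equal to t[j:]."""
    # Termination: all of t is matched, so this path is exactly one occurrence.
    if j == len(t):
        return 1
    # Termination: s ran out before t did, so this path is a dead end.
    if i == len(s):
        return 0
    # Recursive call 1: skip s[i] and keep looking for t[j] further along.
    total = count_matches(s, t, i + 1, j)
    # Recursive call 2: only when the characters match, use s[i] for t[j].
    if s[i] == t[j]:
        total += count_matches(s, t, i + 1, j + 1)
    return total


print(count_matches("rabbbit", "rabbit"))
```

Without memoization this revisits the same `(i, j)` pairs many times, so it is fine for small inputs but would likely need caching (or the matrix approach) for long strings.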

Finally, I strongly recommend some basic, brute-force debugging: stuff in print statements to track the progress of the routine. Track entry and exit, parameter values, and return conditions. I inserted the following tracing statement to help find my way:

def count_permute(self, s, t, i, j):
    # Trace each call: the unprocessed tail of each string and both indices.
    print("ENTER",
          "\ts=", s[i:], "; i=", i,
          "\tt=", t[j:], "; j=", j)

This is only a beginning.
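To go a step further and track exits and return values as well, one option is a small tracing decorator. This is a sketch only: `trace` and the toy function `count_char` are names I invented for illustration, and the toy function has nothing to do with the subsequence problem itself.

```python
import functools


def trace(func):
    """Print ENTER/EXIT lines for each call, indented by recursion depth."""
    depth = 0

    @functools.wraps(func)
    def wrapper(*args):
        nonlocal depth
        pad = "  " * depth
        print(f"{pad}ENTER {func.__name__}{args}")
        depth += 1
        try:
            result = func(*args)
        finally:
            depth -= 1
        print(f"{pad}EXIT  {func.__name__}{args} -> {result}")
        return result

    return wrapper


@trace
def count_char(s, ch, i=0):
    # Toy recursive function, used only to demonstrate the trace output.
    if i >= len(s):
        return 0
    return (s[i] == ch) + count_char(s, ch, i + 1)


count_char("abca", "a")
```

Applied to a routine that accumulates into `self.count` and returns nothing, the EXIT lines would show `None`, which is itself a hint that returning the count from each call makes the recursion easier to follow.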

If you don't find a quick solution, I recommend that you go back to the pseudo-code from which you wrote the above. Desk-simulate the algorithm on a couple of simple cases, such as the ones I supplied. Make sure that you have your basic recursion steps cleanly handled:

1. Termination condition
2. Detection cases
3. Recursive calls

I think you have a decent handle on the last one, and you're almost there on #2. You're not that far from a working solution; keep plugging along!
