4

I know how to use dynamic programming to find either the longest common subsequence or the longest common substring of two strings. However, I am having a hard time coming up with a solution for finding the longest subsequence of string X that is also a substring of string Y.

Here is my brute force solution:

  1. find all the subsequences of string X and sort them by length, descending;
  2. iterate through the sorted subsequences; if the current subsequence is a substring of Y, return it.

It works, but the running time could be bad. Suppose all characters in X are unique; then there are 2^m subsequences, where m is the length of X. I think checking whether a string is a substring of Y takes O(n), where n is the length of Y. So the overall running time is O(n*2^m).
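For reference, the brute force described above could be sketched roughly like this in Python. The function name `brute_force` is only illustrative, and instead of sorting all subsequences it simply generates them from longest to shortest with `itertools.combinations`, which gives the same result:

```python
from itertools import combinations


def brute_force(x, y):
    """Return a longest subsequence of x that is also a substring of y.

    Exponential in len(x); only meant to illustrate the idea.
    """
    # Try lengths from longest to shortest so the first hit is a longest one.
    for length in range(len(x), 0, -1):
        # combinations() keeps the characters in their original order,
        # so every tuple it yields is a subsequence of x.
        for chars in combinations(x, length):
            candidate = "".join(chars)
            if candidate in y:  # substring test
                return candidate
    return ""  # no common character at all


print(brute_force("ABCD", "BACDBDCD"))  # ACD
```

It is only practical for short X, but it is handy for cross-checking a faster solution on small inputs.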

Is there a better way to do this, possibly via DP?

Edit:

Here is an example of what I want to solve:

Y: 'BACDBDCD'
X: 'ABCD'

The answer would be 'ACD', because 'ACD' is the longest subsequence of X which is also a substring of Y.

Skiptomylu
  • 1
    Why don't you want to use a DP algorithm when you know it has the best asymptotic complexity? – Carmine Ingaldi Oct 09 '14 at 18:09
  • 2
    Why the downvotes? He says he wants to use DP. – Juan Lopes Oct 09 '14 at 18:11
  • Have you looked at http://en.wikipedia.org/wiki/Longest_common_subsequence_problem ... which very clearly states that dynamic programming is the correct way to solve this in polynomial time ... your first statement says you already know how to do this ... so what's the problem? (I didn't downvote) [edit: perhaps I misread your opening comment ... you say you know how to find most common, not longest] – Joran Beasley Oct 09 '14 at 18:14
  • @sscnapoli1926 DP is the solution I am looking for, but I don't know how to do it. Maybe my wording was confusing... – Skiptomylu Oct 10 '14 at 01:24
  • @JoranBeasley I have read the article but I think my problem is different. Please see my edit. – Skiptomylu Oct 10 '14 at 01:32
  • Oh I see ... that is a little more complicated ... a subsequence and substring are pretty interchangeable to me ... sorry I misunderstood – Joran Beasley Oct 10 '14 at 15:29

3 Answers

1

Here are two ways to do it (both have polynomial time complexity). Below, n is the length of X and m is the length of Y (note that this is the reverse of the notation used in the question).
1. Generate all substrings of Y (there are O(m^2) of them). For each substring, check whether it is a subsequence of X (this can be done in linear time with a greedy scan). This algorithm has O(n * m^2) time complexity, which is already not that bad.
2. If that is not fast enough, it is possible to achieve O(n * m) time complexity using dynamic programming. Let f(i, j) be the length of the longest string that is a subsequence of the first i characters of X and a substring of Y ending exactly at position j (that is, a suffix of the first j characters of Y). The transitions are the following:

f(i + 1, j) = max(f(i + 1, j), f(i, j)) //skip this character in X
if X[i] == Y[j] //add this character to current answer
    f(i + 1, j + 1) = max(f(i + 1, j + 1), f(i, j) + 1)  

The initial value for f is 0 for all valid i and j. The answer is the largest value among f(n, j) for all valid j.
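Both approaches above could be sketched in Python roughly as follows. This is only one possible reading of the description: the function names are made up for illustration, the table `f` is stored as a list of lists, and the DP version also recovers the string itself (the recurrence as stated only gives its length).

```python
def is_subsequence(s, x):
    """Greedy linear scan: True if s is a subsequence of x."""
    it = iter(x)
    # `ch in it` advances the iterator until it finds ch (or runs out).
    return all(ch in it for ch in s)


def longest_by_substrings(x, y):
    """Approach 1: test every substring of y against x."""
    best = ""
    for start in range(len(y)):
        for end in range(start + 1, len(y) + 1):
            candidate = y[start:end]
            if len(candidate) > len(best) and is_subsequence(candidate, x):
                best = candidate
    return best


def longest_by_dp(x, y):
    """Approach 2: the f(i, j) table, filled row by row."""
    n, m = len(x), len(y)
    # f[i][j] = length of the longest string that is a subsequence of x[:i]
    # and a suffix of y[:j]. Everything starts at 0 (the empty string).
    f = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(m + 1):
            # Skip x[i]: whatever worked with x[:i] still works with x[:i+1].
            f[i + 1][j] = max(f[i + 1][j], f[i][j])
            # Use x[i] to extend a match that ended just before y[j].
            if j < m and x[i] == y[j]:
                f[i + 1][j + 1] = max(f[i + 1][j + 1], f[i][j] + 1)
    # The answer is the best value in the last row; since the match is a
    # suffix of y[:j], its length is enough to slice it out of y.
    end = max(range(m + 1), key=lambda j: f[n][j])
    return y[end - f[n][end]:end]


print(longest_by_dp("ABCD", "BACDBDCD"))          # ACD
print(longest_by_substrings("ABCD", "BACDBDCD"))  # ACD
```

Row i + 1 is only ever updated from row i, so by the time row i is read it is already final. That is why plain increasing loops over i and j are enough.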

kraskevich
  • The first one is exactly what I did. For the second one, I don't get your code, how do you compute f(i+1, j) exactly? – Skiptomylu Oct 10 '14 at 01:45
  • @ChuntaoLu The first one is not what you did: you found all subsequences of X (there can be an exponential number of them), but I found all substrings of Y (there is always a polynomial number of substrings). The second one: I just iterate over all possible i and j in increasing order and apply the two formulas for f. – kraskevich Oct 10 '14 at 03:21
  • Sorry, you are right, your first solution is different, and I can see the improvement. For the second one, f(i + 1, j) appears on both sides of the assignment, how does that work? – Skiptomylu Oct 10 '14 at 12:42
  • @ChuntaoLu Initially, `f` is `0` for all `i` and `j`. Then it is updated according to the formula (if the new value is greater). It can be rewritten as `if f(i + 1, j) < f(i, j) then f(i + 1, j) = f(i, j)`, so there is no problem here. – kraskevich Oct 10 '14 at 12:47
  • This answer is hard to follow because it does not present a recurrence in standard form. We should define a table entry T[i,j] in terms of entries that have previously been completed. – Alex Leibowitz Apr 26 '23 at 23:20
0

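In Python you can try this without dynamic programming. The code below greedily matches each suffix of the first string against the second string and keeps the longest match. Note that the matched characters are not required to be contiguous in the second string, so the result is a common subsequence rather than a substring of it: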

current_len=0
subseq_len = 0
subseq_data=''
array1 = "ABCBDAB"
array2 = "BDCABA"
#array1="MICHAELANGELO"
#array2="HELLO"
m=len(array1)
n=len(array2)
#loop over first string array1 
#and increment index k to form new substrings of len-1
for k in range(0,m):
    start=0
    current_len = 0
    cur_seq =''
    #substring starting at k to m of array1
    for i in range(k,m):
        for j in range(start,n):
            if array1[i]==array2[j]:
                #increment length of matched subsequence
                current_len +=1
                #move forward index to point to remaining sub string array2
                start=j+1
                cur_seq = cur_seq+array1[i]
                break
        #print(k)
        #print(":"+cur_seq)
    #Check if current iteration for k produced longer match
    if subseq_len < current_len:
        subseq_len = current_len
        subseq_data = cur_seq

print(subseq_data)
print(subseq_len)
Tomerikoo
0

Here is a link to a solution on GeeksforGeeks (GFG): "Find length of longest subsequence of one string which is substring of another string"

  • While this link may answer the question, it is better to include the essential parts of the answer here and provide the link for reference. Link-only answers can become invalid if the linked page changes. - [From Review](/review/late-answers/31293829) – Muhammad Mohsin Khan Mar 17 '22 at 11:03