A main DNA sequence (a string, call it string1) is given, along with another string to search for (call it string2). You have to find the minimum-length window in string1 that contains string2 as a subsequence.
string1 = "abcdefababaef"
string2 = "abf"

Approaches I have thought of, none of which seems to work:
1. Use the longest common subsequence (LCS) approach and check whether the length of the LCS equals the length of string2. This tells me whether string2 is present in string1 as a subsequence, but not the smallest window.
2. The KMP algorithm, but I am not sure how to modify it.
3. Build a map of {character: positions of that character} for the characters of string1 that occur in string2, like:
{ a : 0,6,8,10
  b : 1,7,9
  f : 5,12 }
Then use some approach to find the minimum window while still maintaining the order of "abf".
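
One way approach 3 could be completed is sketched below. It builds the position map from the question, then for every occurrence of string2's first character it jumps to the next usable position of each following character with a binary search. The function name minWindowByPositions is hypothetical, and this is only a sketch under the assumption that an empty string2 or a missing match should yield an empty result.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: shortest window of s1 that contains s2 as a
// subsequence, or "" if there is none (or if s2 is empty).
std::string minWindowByPositions(const std::string& s1, const std::string& s2) {
    if (s2.empty()) return "";

    // character -> sorted list of its positions in s1
    std::map<char, std::vector<int>> pos;
    for (int i = 0; i < static_cast<int>(s1.size()); ++i) pos[s1[i]].push_back(i);
    for (char ch : s2)
        if (pos.find(ch) == pos.end()) return "";  // some character never occurs

    int bestStart = -1;
    int bestLen = static_cast<int>(s1.size()) + 1;
    for (int start : pos[s2[0]]) {
        int last = start;  // position matched for the previous character of s2
        bool ok = true;
        for (std::size_t k = 1; k < s2.size(); ++k) {
            const std::vector<int>& p = pos[s2[k]];
            // first position of s2[k] strictly after the previous match
            auto it = std::upper_bound(p.begin(), p.end(), last);
            if (it == p.end()) { ok = false; break; }
            last = *it;
        }
        if (!ok) break;  // if this start fails, every later start fails too
        if (last - start + 1 < bestLen) {
            bestLen = last - start + 1;
            bestStart = start;
        }
    }
    return bestStart < 0 ? "" : s1.substr(bestStart, bestLen);
}
```

For a fixed start, always taking the earliest next position gives the shortest window beginning there, so trying each start once is enough. With the strings from the question, minWindowByPositions("abcdefababaef", "abf") gives "abaef".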

I am not sure whether I am thinking in the right direction or am totally off.
Is there a known algorithm for this, or can anyone suggest an approach?
Thanks in advance.

Shweta

3 Answers

You can run LCS and then enumerate every occurrence of String2 as a subsequence of String1 by recursing over the DP table of the LCS result. Compute the window length of each occurrence and take the minimum. You can also abandon a branch as soon as it exceeds the size of the smallest window found so far.

See the section on reading out all LCSs:

http://en.wikipedia.org/wiki/Longest_common_subsequence_problem
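
A rough sketch of that idea follows, assuming a suffix-based LCS table (lcs[i][j] is the LCS length of a[i..] and b[j..]) and a recursive walk that either matches or skips each character of the main string. The names Search, walk and minWindowViaLcs are hypothetical. The walk may still visit many embeddings on unfavourable inputs, so treat it as an illustration rather than a tuned solution.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical search state shared by the recursive walk.
struct Search {
    const std::string& a;               // main string
    const std::string& b;               // string to find as a subsequence
    std::vector<std::vector<int>> lcs;  // lcs[i][j] = LCS length of a[i..] and b[j..]
    int bestStart;                      // start of the best window, -1 if none yet
    int bestLen;                        // length of the best window
};

// Try to match b[j..] inside a[i..]; start is where b[0] was matched.
void walk(Search& s, int i, int j, int start) {
    const int n = static_cast<int>(s.b.size());
    if (j == n) {  // all of b matched; the window is a[start..i-1]
        if (i - start < s.bestLen) { s.bestLen = i - start; s.bestStart = start; }
        return;
    }
    if (s.lcs[i][j] < n - j) return;  // b[j..] is not a subsequence of a[i..]
    // Prune: even matching the rest back to back cannot beat the best window.
    if (j > 0 && i - start + (n - j) >= s.bestLen) return;
    if (s.a[i] == s.b[j]) walk(s, i + 1, j + 1, j == 0 ? i : start);
    walk(s, i + 1, j, start);  // skip a[i]
}

// Returns the shortest window of a containing b as a subsequence, or "" if none.
std::string minWindowViaLcs(const std::string& a, const std::string& b) {
    const int m = static_cast<int>(a.size());
    const int n = static_cast<int>(b.size());
    if (n == 0) return "";
    Search s{a, b, std::vector<std::vector<int>>(m + 1, std::vector<int>(n + 1, 0)), -1, m + 1};
    for (int i = m - 1; i >= 0; --i)
        for (int j = n - 1; j >= 0; --j)
            s.lcs[i][j] = (a[i] == b[j]) ? 1 + s.lcs[i + 1][j + 1]
                                         : std::max(s.lcs[i + 1][j], s.lcs[i][j + 1]);
    walk(s, 0, 0, 0);
    return s.bestStart < 0 ? "" : a.substr(s.bestStart, s.bestLen);
}
```

The table is used to cut off any branch from which the rest of b can no longer be matched, and the length check implements the early stop described above. For the question's input, minWindowViaLcs("abcdefababaef", "abf") gives "abaef".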

Vikram Bhat

Dynamic programming! Here is a C++ implementation:

#include <iostream>
#include <string>
#include <vector>

using namespace std;

int main() {
    string a, b;
    cin >> a >> b;

    int m = a.size(), n = b.size();
    int inf = 100000000;

    // dp[i][j] = length of the shortest window a[j...k] starting at j
    // such that b[i...] is a subsequence of a[j...k]
    vector<vector<int>> dp(n + 1, vector<int>(m + 1, inf));
    // b[n...] = "", so dp[n][j] = 0 for each j
    dp[n] = vector<int>(m + 1, 0);

    for (int i = n - 1; i >= 0; --i) {
        for (int j = m - 1; j >= 0; --j) {
            if(b[i] == a[j])    dp[i][j] = 1 + dp[i+1][j+1];
            else                dp[i][j] = 1 + dp[i][j+1];
        }
    }

    int l = 0, r = 0, min_len = inf; // best window is a[l...r-1]

    for (int i = 0; i < m; ++i) {
        if(dp[0][i] < min_len) {
            min_len = dp[0][i];
            l = i, r = i + min_len;
        }
    }

    if(min_len == inf) {
        cout << "no solution!\n";
    } else {
        for (int i = l; i < r; ++i) {
            cout << a[i];
        }
        cout << '\n';
    }

    return 0;
}
Corei13

I found a similar interview question on CareerCup; the only difference is that it uses an array of integers instead of characters. I borrowed an idea and made a few changes. Let me know if you have any questions after reading this C++ code.

What I am trying to do here: the for loop in the main function goes over all elements of the given array and looks for positions where the first element of the subsequence occurs. Each time one is found, I call find_subsequence, which recursively matches the remaining elements of the subsequence against the array while preserving their order. Finally, find_subsequence returns the position of the last matched element (or -1 if the match cannot be completed), and from it I calculate the size of the window.

Please excuse my English, wish I could explain it better.

#include <climits>
#include <iostream>
#include <vector>
using namespace std;

class Solution {
public:
    // Given that c[subArrayStart] is matched at s[arrayStart], returns the index
    // in s where the last element of c gets matched, or -1 if c cannot be completed.
    int find_subsequence(const vector<int>& s, const vector<int>& c, int arrayStart, int subArrayStart) {
        if (subArrayStart == (int)c.size() - 1) return arrayStart;  // all of c matched
        if (arrayStart + 1 >= (int)s.size()) return -1;             // ran out of array

        if (s[arrayStart + 1] == c[subArrayStart + 1])
            return find_subsequence(s, c, arrayStart + 1, subArrayStart + 1);
        else
            return find_subsequence(s, c, arrayStart + 1, subArrayStart);
    }
};

int main()
{
    vector<int> v = { 1,5,3,5,6,7,8,5,6,8,7,8,0,7 };
    vector<int> c = { 5,6,8,7 };
    Solution s;
    int size = INT_MAX;  // length of the smallest window found so far
    int j = -1;          // start index of that window
    for (int i = 0; i < (int)v.size(); i++) {
        if (v[i] == c[0]) {
            // v[i] already matches c[0], so continue matching from there
            int x = s.find_subsequence(v, c, i, 0);
            if (x > -1) {
                if (x - i + 1 < size) {
                    size = x - i + 1;
                    j = i;
                }
                if (size == (int)c.size())  // cannot get any shorter
                    break;
            }
        }
    }
    if (j == -1)
        cout << "no window found";
    else
        cout << size << "  " << j;
    return 0;
}
Sujith Shivaprakash