C++ Longest Common Subsequence Implementation errors O(n*m)

Question

I'm going through some dynamic programming articles on GeeksforGeeks and ran across the Longest Common Subsequence problem. I did not come up with an implementation of the exponential naive solution on my own, but after working out some examples of the problem on paper I came up with what I thought was a successful implementation of an O(n*m) version. An online judge (OJ) proved me wrong: my algorithm fails with the input strings:

"LRBBMQBHCDARZOWKKYHIDDQSCDXRJMOWFRXSJYBLDBEFSARCBYNECDYGGXXPKLORELLNMPAPQFWKHOPKMCO"
"QHNWNKUEWHSQMGBBUQCLJJIVSWMDKQTBXIXMVTRRBLJPTNSNFWZQFJMAFADRRWSOFSBCNUVQHFFBSAQXWPQCAC"

My thought process for the algorithm is as follows. I want to maintain a DP array dpA whose length is the length of string a, where a is the shorter of the two input strings. dpA[i] would be the length of the longest common subsequence ending at a[i]. To do this I iterate through string a from index 0 to length-1 and check whether a[i] exists in b. If it does, pos is the position of its last occurrence in b (found with find_last_of).

1. First mark dpA[i] as 1 if dpA[i] was 0.
2. To know whether a[i] extends an existing subsequence, we go backwards through a and find the first character before i that matches a character in b before pos. Call the indices of these matching characters j and k respectively. The character at j is guaranteed to be one we've already processed, since we've covered all of a[0...i-1] and filled out dpA[0...i-1]. When we find the first match, dpA[i] = dpA[j]+1, because we're extending the previous subsequence that ends at a[j]. Rinse and repeat.

Obviously this method is flawed, or I wouldn't be asking this question, but I can't see where the algorithm goes wrong. I've been looking at it so long I can hardly think about it anymore, so any ideas on how to fix it would be greatly appreciated!

int longestCommonSubsequenceString(const string& x, const string& y) {
  string a = (x.length() < y.length()) ? x : y;
  string b = (x.length() >= y.length()) ? x : y;

  vector<int> dpA(a.length(), 0);

  int pos;
  bool breakFlag = false;

  for (int i = 0; i < a.length(); ++i) {

    pos = b.find_last_of(a[i]);

    if (pos != string::npos) {

      if (!dpA[i]) dpA[i] = 1;

      for (int j = i-1; j >= 0; --j) {
        for (int k = pos-1; k >= 0; --k) {
          if (a[j] == b[k]) {
            dpA[i] = dpA[j]+1;
            breakFlag = true;
            break;
          }
          if (breakFlag) break;
        }
      }
    }

    breakFlag = false;
  }

  return *max_element(dpA.begin(), dpA.end());
}

EDIT

I think the complexity might actually be O(n*n*m), since the loops over i, j, and k are nested.
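
For comparison, here is a minimal sketch of the textbook O(n*m) recurrence, which tracks the LCS length for every pair of prefixes rather than one value per character of a. It is not the approach described above, and the function name `lcsLength` is only an illustrative choice. It assumes plain `std::string` inputs and keeps just two rows of the table to save memory.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (name chosen for illustration): returns the length of
// the longest common subsequence of x and y using the standard prefix DP.
int lcsLength(const std::string& x, const std::string& y) {
  // prev[j] = LCS length of the prefix of x handled in the previous outer
  // iteration and the first j characters of y.
  // curr[j] = the same value for the prefix of x ending at the current row.
  std::vector<int> prev(y.size() + 1, 0);
  std::vector<int> curr(y.size() + 1, 0);

  for (std::size_t i = 1; i <= x.size(); ++i) {
    for (std::size_t j = 1; j <= y.size(); ++j) {
      if (x[i - 1] == y[j - 1]) {
        // Matching characters extend the LCS of both shorter prefixes.
        curr[j] = prev[j - 1] + 1;
      } else {
        // Otherwise drop one character from either x or y and keep the best.
        curr[j] = std::max(prev[j], curr[j - 1]);
      }
    }
    std::swap(prev, curr);  // the row just filled becomes the previous row
  }
  return prev[y.size()];
}
```

Calling `lcsLength("aba", "bab")` gives 2 under this recurrence, because each cell considers every way of aligning the two prefixes rather than committing to a single occurrence of a character in b.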

Your algorithm fails for `aba` and `bab`, so step through this in the debugger. It "thinks" the length of the LCS ending in the last `b` in `bab` is 3. — svinja, Jul 11 '16 at 15:52
@svinja Gotcha..Yeah I think this method may not be possible because you can't really specify which `b` it should find (for that example). It seems you can either find the first or last occurrence of it and that doesn't work so well. I also tried `pos = b.find(a[i], i)` which failed as well — Dominic Farolino, Jul 11 '16 at 16:48

0 Answers