I was reading 'Algorithms Unlocked' by Thomas H. Cormen and came across an approach to finding the Longest Common Subsequence (LCS) that runs in O(n * m). The same approach is described in this article: https://www.geeksforgeeks.org/printing-longest-common-subsequence/

This seems to be the most popular and fastest approach, but it felt a bit convoluted to me, so I tried to write a different one, also using dynamic programming.

My approach is recursive: in the worst case it branches twice per call, but it uses memoization to save the results of previously seen inputs. I understand that std::map takes O(log n) time for find, but I assume this code can easily be altered to use a hash map instead (see the sketch after the code below).

#include <map>
#include <string>
#include <utility>

using namespace std;

// Returns the LCS of x[0..i] and y[0..j], memoizing results keyed by (i, j)
string lcs_helper(int i, int j, const string& x, const string& y, map<pair<int, int>, string>& memoizedSolutions)
{
    // Return the cached result if this (i, j) pair has already been solved
    auto it = memoizedSolutions.find({ i, j });
    if (it != memoizedSolutions.end()) return it->second;

    if (i < 0 || j < 0) return ""; // base case: ran past the start of either string

    // Characters match: extend the LCS of the two shorter prefixes
    if (x[i] == y[j])
    {
        string lcs = lcs_helper(i - 1, j - 1, x, y, memoizedSolutions);
        lcs.push_back(x[i]);
        memoizedSolutions[{i, j}] = lcs;

        return lcs;
    }

    // Characters differ: solve both smaller subproblems and keep the longer result
    string lcs1 = lcs_helper(i - 1, j, x, y, memoizedSolutions);
    string lcs2 = lcs_helper(i, j - 1, x, y, memoizedSolutions);

    if (lcs1.size() >= lcs2.size()) // compare lengths, not lexicographic order (the corrected bug)
    {
        memoizedSolutions[{i, j}] = lcs1;
        return lcs1;
    }

    memoizedSolutions[{i, j}] = lcs2;
    return lcs2;
}

string lcs(const string& x, const string& y)
{
    map<pair<int, int>, string> memoizedSolutions;

    // Start from the indices of the last characters (-1 for an empty string hits the base case)
    return lcs_helper(static_cast<int>(x.size()) - 1, static_cast<int>(y.size()) - 1, x, y, memoizedSolutions);
}
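
If it helps, here is a minimal sketch of the hash-map swap I have in mind (PairHash and Memo are just illustrative names, not part of the code above): std::unordered_map has no default hash for std::pair, so a small hash functor is needed before map can be replaced.

#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical hash functor so std::pair<int, int> can be used as an unordered_map key
struct PairHash
{
    std::size_t operator()(const std::pair<int, int>& p) const noexcept
    {
        // Pack both ints into one 64-bit value and hash that
        long long packed = (static_cast<long long>(p.first) << 32) ^ static_cast<unsigned int>(p.second);
        return std::hash<long long>{}(packed);
    }
};

// Drop-in replacement for the memoization table's type
using Memo = std::unordered_map<std::pair<int, int>, std::string, PairHash>;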

I am struggling to generalise the time complexity of my approach and want to know whether, and why, it is slower than the usual approach.

I tried to understand the time complexity by drawing a recursion tree for an example but am still struggling to generalise it: https://i.stack.imgur.com/6wg3p.jpg (This is the tree I created; red = return "" and the other colours represent inputs that can be memoized. I have omitted the subtrees for memoized results.)

60q

1 Answer

Even if you use a hashmap for memoization, and correct the bug, the time complexity is O(m*n*l), where l is the length of the LCS; this is also upper-bounded by O(m*n*min(m, n)).

First, the bug. lcs_helper returns a string:

string lcs1 = lcs_helper(i - 1, j, x, y, memoizedSolutions);
string lcs2 = lcs_helper(i, j - 1, x, y, memoizedSolutions);

if (lcs1 >= lcs2)

This if statement should compare the lengths of the strings; it actually compares them lexicographically.
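
A tiny standalone example (mine, not taken from the question's code) of why that matters: operator>= on std::string is lexicographic, so a shorter string can still compare as greater.

#include <cassert>
#include <string>

int main()
{
    std::string lcs1 = "b";  // length 1
    std::string lcs2 = "ab"; // length 2
    assert(lcs1 >= lcs2);                   // true: 'b' > 'a' lexicographically
    assert(!(lcs1.size() >= lcs2.size()));  // but lcs2 is the longer candidate
}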


To find the time complexity of a dynamic-programming algorithm, a handy formula is (number of subproblems) * (time to solve one subproblem).
Here, you have the same number of subproblems as the regular DP solution: m*n, one for each (i, j) pair.

The time to solve a subproblem, in your case, is not O(1). Since your function returns a string, and must cache strings for memoization, it will do up to length(LCS) work per subproblem to build the string and copy the memory.
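
A small illustration of that cost (my own snippet, with made-up index values): assigning a string into the memo table copies every character, so the per-subproblem work grows with the length of the cached partial LCS instead of being constant.

#include <map>
#include <string>
#include <utility>

int main()
{
    std::map<std::pair<int, int>, std::string> memo;
    std::string partial(1000, 'a');       // pretend this is a long partial LCS
    memo[{3, 7}] = partial;               // copies all 1000 characters into the map
    std::string returned = memo[{3, 7}];  // reading it back by value copies them again
}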

This may be why the normal DP algorithm can look convoluted: if you store the actual LCS strings, the runtime becomes cubic in the worst case. By memoizing only the lengths of the LCS, and having lcs_helper return a length, you'll need to backtrack at the end to construct the string, but you maintain an O(m*n) runtime.
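
For completeness, here is a minimal sketch of that length-based approach (written bottom-up rather than recursively, and not the code from the question):

#include <algorithm>
#include <string>
#include <vector>

std::string lcs_by_lengths(const std::string& x, const std::string& y)
{
    const int m = static_cast<int>(x.size());
    const int n = static_cast<int>(y.size());
    // dp[i][j] = length of the LCS of x[0..i) and y[0..j); the table stores ints, not strings
    std::vector<std::vector<int>> dp(m + 1, std::vector<int>(n + 1, 0));
    for (int i = 1; i <= m; ++i)
        for (int j = 1; j <= n; ++j)
            dp[i][j] = (x[i - 1] == y[j - 1]) ? dp[i - 1][j - 1] + 1
                                              : std::max(dp[i - 1][j], dp[i][j - 1]);

    // One backtracking pass over the table rebuilds the actual string in O(m + n)
    std::string result;
    for (int i = m, j = n; i > 0 && j > 0; )
    {
        if (x[i - 1] == y[j - 1]) { result.push_back(x[i - 1]); --i; --j; }
        else if (dp[i - 1][j] >= dp[i][j - 1]) --i;
        else --j;
    }
    return std::string(result.rbegin(), result.rend());
}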

kcsquared
  • 'The time to solve a subproblem, in your case, is not O(1). Since your function returns a string, and must cache strings for memoization, it will do up to length(LCS) work per subproblem to build the string and copy the memory.' Could I get some more clarification on this: why does returning the string take length(LCS) work? Furthermore, assuming inserting into the hashmap is constant, what if I used a string reference instead and built up my LCS at each subproblem? Would this achieve O(m * n)? – 60q Feb 21 '22 at 23:37
  • @60q If inserting into the map is constant, and you returned a reference, then yes, it would be `O(m*n)` (also, assuming you use unordered_map instead of map). Maybe I'm not enough of a C++ expert, but I don't see how you could do constant-time inserts that easily. Who would own the string that your hashmap values point to? – kcsquared Feb 21 '22 at 23:52
  • @60q If there is some kind of return value optimization happening, returning the LCS could possibly take less than the length of that string. I was assuming that different calls of the lcs_helper function would have to copy the string, since it's being both cached and modified. – kcsquared Feb 21 '22 at 23:55
  • The string would be created in the lcs function as opposed to lcs_helper, and its reference would be passed to lcs_helper as an argument. lcs_helper would then append to this string reference, eliminating the need for returns. However, I think I understand your point, in that to cache the string, a copy of the string must be made, and the time taken for this copy would increase with the size of the string rather than being constant like I assumed. – 60q Feb 22 '22 at 00:14