Of course! There are so few possible operations with so many 0s. I mean, the answer is at most 200.
Take a look at
10001010000000001
vs ||||||
10111010100000010
Look at all the zeroes with pipes. Does it matter which one out of those you end up deleting? Nope. That's the key.
Solution 1
Let's consider the normal n*m solution:
dp(int i, int j) {
// memo & base case
if( str1[i-1] == str1[j-1] ) {
return dp(i-1, j-1);
}
return 1 + min( dp(i-1, j), dp(i-1, j-1), dp(i, j-1) );
}
If almost every single character was a 0, what would hog the most amount of time?
if( str1[i-1] == str1[j-1] ) { // They will be equal so many times, (99900)^2 times!
return dp(i-1, j-1);
}
I could imagine that trickling down for tens of thousands of recursions. All you actually need logic for are the ~200 critical points. You can ignore the rest. A simple modification would be
if( str1[i-1] == str1[j-1] ) {
if( str1[i-1] == 1 )
return dp(i-1, j-1); // Already hit a critical point
// rightmost location of a 1 in str1 or str2, that is <= i-1
best = binarySearch(CriticalPoints, i-1);
return dp(best + 1, best + 1); // Use that critical point
// Important! best+1 because we still want to compute the answer at best
// Without it, we would skip over in a case where str1[best] is 1, and str2[best] is 0.
}
CriticalPoints would be the array containing the index of every 1 in either str1 or str2. Make sure that it's sorted before you binary search. Keep in mind those gochya's. My logic was: Okay I need to make sure to calculate the answer at the index best
itself, so let's go with best + 1
as the parameter. But, if best == i - 1
, we get stuck in a loop. I'll handle that with a quick str1[i-1] == 1
check. Done, phew.
You can do a quick check for correctness by noting that at worst case you will hit all 200*100000 combinations of i and j that make critical points, and when those critical points call min(a, b, c)
, it only makes three recursive function calls. If any of those functions are critical points, then it's part of those 200*100000 we already counted and we can ignore it. If it's not, then in O(log(200)) it falls into a single call on another critical point (Now, it's something we know is part of the 200*100000 we already counted). Thus, each critical point takes at worst 3*log(200)
time excluding calls to other critical points. Similarly, the very first function call will fall into a critical point in log(200)
time. Thus, we have an upper bound of O(200*100000*3*log(200) + log(200)).
Also, make sure your memo table is a hashmap, not an array. 10^10 memory will not fit on your computer.
Solution 2
You know the answer is at most 200, so just prevent yourself from computing more than that many operations deep.
dp(int i, int j) { // O(100000 * 205), sounds good to me.
if( abs(i - j) > 205 )
return 205; // The answer in this case is at least 205, so it's irrelevant to calculating the answer because when min is called, it wont be smallest.
// memo & base case
if( str1[i-1] == str1[j-1] ) {
return dp(i-1, j-1);
}
return 1 + min( dp(i-1, j), dp(i-1, j-1), dp(i, j-1) );
}
This one is very simple, but I leave it for solution two because this solution seems to have come out from thin air, as opposed to analyzing the problem and figuring out where the slow part is and how to optimize it. Keep this in your toolbox though, since you should be coding this solution.
Solution 3
Just like Solution 2, we could have done it like this:
dp(int i, int j, int threshold = 205) {
if( threshold == 0 )
return 205;
// memo & base case
if( str1[i-1] == str1[j-1] ) {
return dp(i-1, j-1);
}
return 1 + min( dp(i-1, j, threshold - 1), dp(i-1, j-1, threshold - 1), dp(i, j-1, threshold - 1) );
}
You might be worried about dp(i-1, j-1) trickling down, but the threshold keeps i and j close together so it calculates a subset of Solution 2. This is because the threshold gets decremented every time i and j get farther apart. dp(i-1, j-1, threshold)
would make it identical to Solution 2 (Thus, this one is slightly faster).
Space
These solutions will give you the answer very quickly, but if you want a space-optimizing solution as well, it would be easy to replace str1[i]
with (i in Str1CriticalPoints) ? 1 : 0
, using a hashmap. This would give a final solution that is still very fast (Though will be 10x slower), and also avoids keeping the long strings in memory (To the point where it could run on an Arduino). I don't think this is necessary though.
Note that the original solution does not use 10^10 space. You mention "even better, less than 10^10 space", with an implication that 10^10 space would be acceptable. Unfortunately, even with enough RAM, iterating though that space takes 10^10 time, which is definitely not acceptable. None of my solutions use 10^10 space: only 2 * 10^5 to hold the strings - which can be avoided as discussed above. 10^10 Bytes it 10 GB for reference.
EDIT: As maniek notes, you only need to check abs(i - j) > 105
, as the remaining 100 insertions needed to equate i
and j
will pull the number of operations above 200.