
I know the LCS problem needs ~O(mn) time, where m and n are the lengths of the two sequences X and Y respectively. But my problem is a little easier, so I expect a faster algorithm than O(mn).

Here is my problem:

Input:

a positive integer Q, and two sequences X = x1, x2, x3, ..., xn and Y = y1, y2, y3, ..., yn, both of length n.

Output:

True, if the length of the LCS of X and Y is at least n - Q;

False, otherwise.

The well-known algorithm costs O(n^2) here, but we can actually do better than that, because as soon as more than Q elements of either sequence have been dropped without finding a common element, the answer must be False. Someone said there should be an algorithm as good as O(Q*n), but I cannot figure it out.

UPDATE: Already found an answer!

I was told I can just compute the diagonal band of the table c[i,j]: if |i-j| > Q, there are already more than Q unmatched elements in one of the two sequences, so we only need to compute c[i,j] when |i-j| <= Q.
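To make that concrete, here is a minimal Python sketch of the banded DP (my own code, not from the thread; the function name and 0-based indexing are assumptions). Only cells with |i - j| <= Q are filled, so the work is O(Q*n):

def lcs_at_least_banded(X, Y, Q):
    """Return True iff LCS(X, Y) >= n - Q, filling only cells with |i - j| <= Q."""
    n = len(X)
    NEG = float("-inf")          # stands in for cells outside the band
    c = {}                       # (i, j) -> LCS length of X[:i] and Y[:j], band cells only

    def cell(i, j):
        if i == 0 or j == 0:
            return 0
        return c.get((i, j), NEG)

    for i in range(1, n + 1):
        for j in range(max(1, i - Q), min(n, i + Q) + 1):
            if X[i - 1] == Y[j - 1]:
                c[(i, j)] = cell(i - 1, j - 1) + 1
            else:
                c[(i, j)] = max(cell(i - 1, j), cell(i, j - 1))
    return cell(n, n) >= n - Q

A dictionary keeps the sketch short; a pair of arrays of width 2Q + 1 would give the same O(Q*n) bound without the hashing overhead.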

Valar Morghulis
  • So in essence you just need to find a common subsequence of length n-Q, and so we know that it must start in one of the first Q+1 positions. That's bound to help. – Ulrich Schwarz Oct 14 '14 at 18:33

2 Answers


Here is one possible way to do it:
1. Let f(prefix_len, deleted_cnt) be the leftmost position in Y such that prefix_len elements of X have already been processed and exactly deleted_cnt of them were deleted. There are only O(N * Q) states, because deleted_cnt cannot exceed Q.
2. The base case is f(0, 0) = 0 (nothing has been processed, thus nothing has been deleted).
3. Transitions:
a) Remove the current element: f(i + 1, j + 1) = min(f(i + 1, j + 1), f(i, j)).
b) Match the current element with the leftmost element of Y that is equal to it and located after f(i, j) (call its index pos): f(i + 1, j) = min(f(i + 1, j), pos).
4. So the only remaining question is how to find the leftmost matching element to the right of a given position. Let's precompute, for each pair (position in Y, element of X), the leftmost occurrence in Y of an element equal to that element of X to the right of that position, and put these pairs into a hash table. That looks like O(n^2), but it is not: for a fixed position in Y, we never need to look more than Q + 1 positions to its right. Why? If we go further, we skip more than Q elements! So we only examine O(N * Q) pairs, which gives the desired time complexity. Once we have this hash table, finding pos in step 3 is a single lookup. Here is a sketch of this step:

nearest = {}                                    # (position in Y, value) -> leftmost later match
for i in range(n):
    for j in range(i + 1, min(n, i + q + 2)):   # look at most q + 1 positions ahead
        nearest.setdefault((i, Y[j]), j)        # the first j seen is already the leftmost

Unfortunately, this solution uses hash tables, so the O(N * Q) time complexity holds only on average rather than in the worst case, but it should be fast enough in practice.
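For reference, a compact Python sketch of the whole procedure described above (my own code and naming, not the answerer's; f is an array indexed by the number of deletions, holding the number of Y elements consumed so far, and the lookup table also covers the start position -1 so the first element of Y can be matched):

import math

def lcs_at_least_dp(X, Y, Q):
    """True iff at most Q elements of X need to be deleted to get a subsequence of Y."""
    n = len(X)
    # nearest[(i, v)] = leftmost j in (i, i + Q + 1] with Y[j] == v
    nearest = {}
    for i in range(-1, n):                      # i = -1 covers matches before consuming any of Y
        for j in range(i + 1, min(n, i + Q + 2)):
            nearest.setdefault((i, Y[j]), j)    # first j seen is already the leftmost
    # f[d] = leftmost number of Y elements consumed after processing a prefix of X
    #        with exactly d deletions (math.inf = unreachable state)
    f = [0] + [math.inf] * Q
    for x in X:
        g = [math.inf] * (Q + 1)
        for d in range(Q + 1):
            if f[d] == math.inf:
                continue
            if d + 1 <= Q:                       # transition a): delete x
                g[d + 1] = min(g[d + 1], f[d])
            pos = nearest.get((f[d] - 1, x))     # transition b): match x in Y after f[d]
            if pos is not None:
                g[d] = min(g[d], pos + 1)
        f = g
    return any(v != math.inf for v in f)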

kraskevich
  • Thanks! This method is great. However, I was told I can just compute the diagonal band of the table c[i,j], because if |i-j| > Q, it means there are already more than Q unmatched elements in the sequences. So we only need to compute c[i,j] when |i-j| <= Q. – Valar Morghulis Oct 14 '14 at 19:17
  • @ValarMorghulis Oh, yeah. It is really simple! My solution looks like an overkill. – kraskevich Oct 14 '14 at 19:20

You can also say that the cost of making the two strings equal must not be greater than Q; if it is greater than Q, the answer must be False (this is the edit distance problem).

Suppose the size of string x is m and the size of string y is n. Then we create a two-dimensional array d[0..m][0..n], where d[i][j] denotes the edit distance between the i-length prefix of x and the j-length prefix of y.

The computation of array d is done using dynamic programming, which uses the following recurrence:

d[i][0] = i , for i <= m
d[0][j] = j , for j <= n

d[i][j] = d[i - 1][j - 1], if x[i] == y[j],
d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + 1), otherwise.

The answer for the LCS, if m > n, is m - d[m][m - n].
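A minimal Python sketch of the table computed by this recurrence (my own code and naming, 0-based string indices under the hood):

def edit_distance_table(x, y):
    """d[i][j] = edit distance between the i-length prefix of x and the j-length prefix of y."""
    m, n = len(x), len(y)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i                  # delete all i characters of the prefix of x
    for j in range(n + 1):
        d[0][j] = j                  # insert all j characters of the prefix of y
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                d[i][j] = d[i - 1][j - 1]
            else:
                d[i][j] = 1 + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d

The band-limiting idea from the accepted update applies here as well: since d[i][j] >= |i - j|, any cell with |i - j| above the threshold can be skipped when only the decision answer is needed.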

ayusha