In the Longest Common Subsequence (LCS) problem, we have two strings s1 and s2 of lengths m and n, and we want to find the longest subsequence common to both s1 and s2, where a subsequence is not required to occupy consecutive positions within the original sequences. The common solution to this problem is dynamic programming, which takes O(mn) time. However, I want an approximation of the LCS for genome sequences, which consist only of the letters A, C, G and T. I searched a lot about this, but I only found algorithms that run in O(mn log log n / log^2 n) time. The best approximation I found is here, which is a parallel version of LCS that is not appropriate for me.
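For reference, the O(mn) dynamic program mentioned above can be sketched in Python as follows. This is a minimal illustration (the name `lcs_length` is hypothetical): it keeps only two rows of the (m+1) x (n+1) table, so memory drops to O(n), but the running time is still O(mn).

```python
def lcs_length(s1: str, s2: str) -> int:
    """Length of the longest common subsequence via the classic O(mn) DP.

    Only two rows of the (m+1) x (n+1) table are kept, so memory is O(n).
    """
    n = len(s2)
    prev = [0] * (n + 1)  # row i-1 of the DP table (row 0 is all zeros)
    for i in range(1, len(s1) + 1):
        curr = [0] * (n + 1)  # row i; curr[0] = 0 for the empty prefix of s2
        for j in range(1, n + 1):
            if s1[i - 1] == s2[j - 1]:
                # Matching characters extend the LCS of both shorter prefixes.
                curr[j] = prev[j - 1] + 1
            else:
                # Otherwise drop the last character of one prefix or the other.
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return prev[n]


print(lcs_length("AGCAT", "GAC"))  # → 2
```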
-
Did you read about the Method of Four Russians on Wikipedia? – גלעד ברקן Nov 22 '16 at 19:36
-
@גלעדברקן As I read it, the time complexity of LCS with the Four Russians method would be O(N^2/log N). Am I wrong about that? – user137927 Nov 22 '16 at 19:50
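A technique in the same spirit as Four Russians (handling many table cells with one machine operation), and much simpler to implement, is the bit-parallel LCS-length recurrence of Allison and Dix, later refined by Crochemore et al. and Hyyrö. It is swapped in here for illustration and is not the Four Russians method itself: it is still exact and needs about n * ceil(m/w) word operations for word size w, so it speeds up the quadratic algorithm rather than approximating it. The sketch assumes Python, whose unbounded integers act as a multi-word bit vector; the function name is hypothetical.

```python
def lcs_length_bitparallel(a: str, b: str) -> int:
    """Exact LCS length using the bit-vector recurrence (Allison-Dix / Hyyro style).

    Bit i of v corresponds to position i of `a`. Each character of `b` updates a
    whole DP column with a few big-integer operations.
    """
    m = len(a)
    if m == 0 or not b:
        return 0
    mask = (1 << m) - 1
    # match[c] has bit i set iff a[i] == c (for DNA there are at most 4 masks).
    match = {}
    for i, ch in enumerate(a):
        match[ch] = match.get(ch, 0) | (1 << i)
    v = mask  # all ones: no matches used yet
    for ch in b:
        u = v & match.get(ch, 0)
        # Update V' = (V + U) | (V - U), truncated to m bits (U is a subset of V).
        v = ((v + u) | (v - u)) & mask
    # Each zero bit left in v accounts for one unit of the LCS length.
    return m - bin(v).count("1")


print(lcs_length_bitparallel("ABCBDAB", "BDCABA"))  # → 4
```

With a four-letter alphabet the match table holds only four masks, so the preprocessing is trivial.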
-
Can you tell us more about what kind of approximation you're looking for? – templatetypedef Nov 22 '16 at 20:25
-
@templatetypedef The order of the approximation is important to me. I thought about filling in just the diagonal of the LCS dynamic-programming table, and I also thought about using an array to keep the last position of every letter in the strings and using it to find the LCS, but neither worked out, and I couldn't approximately compute the LCS of two strings in linear time. I also searched for it but couldn't find anything. – user137927 Nov 22 '16 at 21:19
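The idea of filling only the diagonal of the table can be made concrete as a banded DP: compute only cells with |i - j| <= k for some band half-width k (a hypothetical tuning parameter, not something from this thread). The sketch below assumes cells outside the band count as 0, which makes the result a guaranteed lower bound that is exact whenever some optimal alignment stays inside the band, in O(mk) time and O(k) memory. It gives no constant-factor guarantee in general: if the matching parts of the two strings are shifted by much more than k positions, the estimate can fall far below the true LCS.

```python
def lcs_length_banded(s1: str, s2: str, k: int) -> int:
    """Lower bound on the LCS length using only DP cells with |i - j| <= k.

    Cells outside the diagonal band are treated as 0 (the empty subsequence),
    so every computed value is the length of a real common subsequence of two
    prefixes and the result never exceeds the true LCS.
    Time O(len(s1) * k), memory O(k).
    """
    if k < 0:
        raise ValueError("band half-width k must be non-negative")
    m, n = len(s1), len(s2)
    width = 2 * k + 1

    def get(row, i, j):
        # Cell (i, j) is stored at offset j - (i - k) in a row of width 2k + 1;
        # anything outside the band (or before column 0) counts as 0.
        off = j - (i - k)
        if j < 0 or off < 0 or off >= width:
            return 0
        return row[off]

    prev = [0] * width  # row 0: an empty prefix of s1 has LCS 0 with anything
    best = 0
    for i in range(1, m + 1):
        curr = [0] * width
        for j in range(max(1, i - k), min(n, i + k) + 1):
            if s1[i - 1] == s2[j - 1]:
                val = get(prev, i - 1, j - 1) + 1  # extend a diagonal match
            else:
                val = max(get(prev, i - 1, j), get(curr, i, j - 1))
            curr[j - (i - k)] = val
            best = max(best, val)
        prev = curr
    # A common subsequence of two prefixes is also common to s1 and s2, so the
    # best banded value is a valid lower bound even if (m, n) is outside the band.
    return best


print(lcs_length_banded("ABCBDAB", "BDCABA", 7))  # → 4 (band covers the whole table)
```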
-
To clarify, can you talk about what sort of approximation you're going for? You can get wildly inaccurate answers pretty quickly if you'd like, but it depends on what you're trying to do and how much error you can tolerate. – templatetypedef Nov 22 '16 at 21:20
-
@templatetypedef A constant-factor approximation of the exact solution would be very good. I should have said that only the maximum length is needed; the sequence itself is not necessary. – user137927 Nov 23 '16 at 07:05
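If only the length is needed, there is a simple linear-time constant-factor approximation for small alphabets (a folklore counting argument, not taken from this thread). Let cnt1(c) and cnt2(c) be the number of occurrences of letter c in each string. Repeating the single best letter gives a common subsequence of length max_c min(cnt1(c), cnt2(c)), and no common subsequence can use letter c more than min(cnt1(c), cnt2(c)) times, so LCS <= sum_c min(cnt1(c), cnt2(c)) <= 4 * max_c min(cnt1(c), cnt2(c)) for A, C, G, T. The best single letter is therefore a 1/4-approximation. The sketch below (hypothetical function name) returns both bounds.

```python
from collections import Counter
from typing import Tuple


def lcs_length_bounds(s1: str, s2: str) -> Tuple[int, int]:
    """Return (lower, upper) bounds on the LCS length in O(m + n) time.

    lower: for the best single letter c, c repeated min(count1[c], count2[c])
           times is a common subsequence.
    upper: a common subsequence uses each letter c at most
           min(count1[c], count2[c]) times.
    Since upper <= sigma * lower for sigma distinct letters, `lower` is a
    1/sigma approximation of the LCS length (1/4 for A, C, G, T).
    """
    c1, c2 = Counter(s1), Counter(s2)
    per_letter = [min(c1[ch], c2[ch]) for ch in c1]  # Counter returns 0 if missing
    lower = max(per_letter, default=0)
    upper = sum(per_letter)
    return lower, upper


print(lcs_length_bounds("AGCAT", "GAC"))  # → (1, 3); the exact LCS is 2
```

The factor of 4 can actually occur, for example for two identical strings of the form ACGTACGT..., so this is a guaranteed floor rather than a close estimate.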