How to calculate the number of longest common subsequences

Question

I'm trying to calculate the number of longest common subsequences of two strings.

e.g. String X = "efgefg"; String Y = "efegf";

output: The number of longest common subsequences is: 3 (i.e. efeg, efef, efgf; the algorithm doesn't need to produce these, they are just shown here for demonstration)
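To make the expected output concrete, here is a brute-force sketch that enumerates the distinct longest common subsequences of the two example strings. The function name `distinct_lcs_bruteforce` is made up for illustration, and the approach is exponential in |X|, so it is only meant as a check on small inputs, not as the algorithm being asked about.

```python
from itertools import combinations


def distinct_lcs_bruteforce(x: str, y: str) -> set[str]:
    """Return the set of distinct longest common subsequences of x and y.

    Exponential time: tries every subsequence of x, longest first.
    """

    def is_subsequence(s, t) -> bool:
        # Consuming the iterator enforces left-to-right order in t.
        it = iter(t)
        return all(ch in it for ch in s)

    for k in range(min(len(x), len(y)), -1, -1):
        found = {"".join(c) for c in combinations(x, k) if is_subsequence(c, y)}
        if found:
            return found
    return {""}  # not reached: k == 0 always yields the empty subsequence


print(sorted(distinct_lcs_bruteforce("efgefg", "efegf")))
# ['efef', 'efeg', 'efgf']
```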

I've managed to do this in O(|X|*|Y|) time using dynamic programming, based on the general idea here: Cheapest path algorithm.

Can anyone think of a way to do this calculation more efficiently?

--Edited in response to Jason's comment.

These look to be subsequences and not substrings. Please clarify. — jason, Feb 11 '10 at 15:16
I am not sure I understand what you are calculating. What is the rule that makes efeg, efef, efgf all valid solutions? I suppose you can't rearrange order of chars, but only remove some? Are the two strings supposed to be completely generic, so that you may have "X=AAAAAAAAAAAAAAAAAAAAAAAAA" and "Y=B" for example, and in this case the answer would be 0? — p.marino, Feb 11 '10 at 15:25
@p.marino: correct. You can't rearrange the order, but you can remove letters. The answer would be 0 in your example. — Meir, Feb 11 '10 at 15:31
For X=AAAAAAAAAAAAAAAAA and Y=B, shouldn't the amount of longest common subsequences be 1? There is one common subsequence of length 0, which is the longest one. — rettvest, Feb 11 '10 at 20:58
See http://en.wikipedia.org/wiki/Longest_common_subsequence_problem#Complexity, http://en.wikipedia.org/wiki/Longest_common_subsequence_problem#Computing_the_length_of_the_LCS — Beni Cherniavsky-Paskin, Feb 26 '10 at 10:43

score 1 · Answer 1 · answered Mar 01 '10 at 14:45

The longest common subsequence problem is a well-studied CS problem.

You may want to read up on it here: http://en.wikipedia.org/wiki/Longest_common_subsequence_problem

KaptajnKold

score 0 · Answer 2 · answered Feb 14 '10 at 18:59

I don't know, but here are some attempts at thinking aloud:

The worst case I was able to construct has an exponential number of longest common subsequences, 2**(0.5 * |X|):

X = "aAbBcCdD..."
Y = "AaBbCcDd..."

where each longest common subsequence includes exactly one of {A, a}, exactly one of {B, b} and so forth... (nitpicking: if your alphabet is limited to 256 chars, this breaks down eventually, but 2**128 is already huge.)

However, you don't necessarily have to generate all subsequences to count them. If you've got O(|X| * |Y|), you are already better than that! What we learn from this is that any algorithm better than yours must not attempt to generate the actual subsequences.
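As a rough sketch of counting without generating, here is one O(|X| * |Y|) dynamic program that counts distinct longest common subsequences. It is not necessarily the asker's algorithm; the name `count_distinct_lcs` and the table names are my own, and I am assuming the empty subsequence counts as one LCS when the strings share no characters (as rettvest suggests in the comments). On a match, every LCS of the two prefixes must end in the matched character, so the count is inherited from the diagonal; on a mismatch, the two neighbouring sets are combined with inclusion-exclusion.

```python
def count_distinct_lcs(x: str, y: str) -> int:
    """Count distinct longest common subsequences of x and y in O(|x|*|y|)."""
    n, m = len(x), len(y)
    # length[i][j]: LCS length of x[:i] and y[:j]
    length = [[0] * (m + 1) for _ in range(n + 1)]
    # count[i][j]: number of distinct LCS strings of x[:i] and y[:j];
    # an empty prefix has exactly one LCS, the empty string.
    count = [[1] * (m + 1) for _ in range(n + 1)]

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if x[i - 1] == y[j - 1]:
                # Every LCS here ends with the matched character.
                length[i][j] = length[i - 1][j - 1] + 1
                count[i][j] = count[i - 1][j - 1]
            else:
                best = max(length[i - 1][j], length[i][j - 1])
                length[i][j] = best
                total = 0
                if length[i - 1][j] == best:
                    total += count[i - 1][j]
                if length[i][j - 1] == best:
                    total += count[i][j - 1]
                if length[i - 1][j - 1] == best:
                    # Strings counted in both neighbours were counted twice.
                    total -= count[i - 1][j - 1]
                count[i][j] = total
    return count[n][m]


print(count_distinct_lcs("efgefg", "efegf"))        # 3
print(count_distinct_lcs("aAbBcCdD", "AaBbCcDd"))   # 16 == 2**(0.5 * 8)
```

Python integers are arbitrary precision, so the exponential counts from the construction above do not overflow.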

score 0 · Answer 3 · edited Jan 09 '17 at 07:36

First of all, we do know that finding any longest common subsequence of two sequences of length n cannot be done in O(n^(2-ε)) time unless the Strong Exponential Time Hypothesis fails; see: https://arxiv.org/abs/1412.0348

This pretty much implies that you cannot count the number of ways to align longest common subsequences to the input sequences in O(n^(2-ε)) time. On the other hand, it is possible to count the number of such alignments in O(n²) time. It is also possible to count them in O(n²/log(n)) time with the so-called four-Russians speed-up.

Now the real question is whether you really intended to calculate this, or whether you want to find the number of different subsequences. I am afraid that the latter is a #P-complete counting problem. At least, we do know that counting the number of sequences of a given length that a regular grammar can generate is #P-complete:
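To illustrate the O(n²) alignment count, here is a minimal sketch. I am assuming "ways to align" means the number of order-preserving pairings of equal characters whose size equals the LCS length; the function name `count_lcs_alignments` is hypothetical. Maximum pairings are split by whether they leave the last character of x unused, leave the last character of y unused, or pair the two last characters with each other.

```python
def count_lcs_alignments(x: str, y: str) -> int:
    """Count maximum-size order-preserving pairings between x and y.

    Two pairings differ if they use different index pairs, even when
    they spell out the same subsequence.
    """
    n, m = len(x), len(y)
    length = [[0] * (m + 1) for _ in range(n + 1)]   # LCS lengths
    ways = [[1] * (m + 1) for _ in range(n + 1)]     # empty pairing = 1 way

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = x[i - 1] == y[j - 1]
            if match:
                best = length[i - 1][j - 1] + 1
            else:
                best = max(length[i - 1][j], length[i][j - 1])
            length[i][j] = best

            total = 0
            if length[i - 1][j] == best:      # pairings not using x[i-1]
                total += ways[i - 1][j]
            if length[i][j - 1] == best:      # pairings not using y[j-1]
                total += ways[i][j - 1]
            if length[i - 1][j - 1] == best:  # counted twice: using neither
                total -= ways[i - 1][j - 1]
            if match:                         # pairings that pair x[i-1] with y[j-1]
                total += ways[i - 1][j - 1]
            ways[i][j] = total
    return ways[n][m]


print(count_lcs_alignments("aa", "a"))   # 2: one subsequence "a", two alignments
```

The example shows how the two notions diverge: "aa" and "a" have a single longest common subsequence, but it can be aligned in two ways.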

S. Kannan, Z. Sweedyk, and S. R. Mahaney. Counting and random generation of strings in regular languages. In ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 551–557, 1995

This is a similar problem in the sense that counting the number of ways a regular grammar can generate sequences of a given length is a trivial dynamic programming algorithm. However, if you do not want to distinguish between derivations that produce the same sequence, then the problem turns from easy to extremely hard. My natural conjecture is that this should be the case for sequence alignment problems, too (longest common subsequence, edit distance, shortest common superstring, etc.).

So if you would like to calculate the number of different longest common subsequences of two sequences, then very likely your current algorithm is wrong, and no algorithm can calculate it in polynomial time unless P = NP (and more...).

score 0 · Answer 4 · answered Jul 03 '17 at 16:56

The best explanation (with code) I found:

Count all LCS

Jay Patel

Please expound upon your answer here, as opposed to simply including an external link. – kjones Jul 03 '17 at 17:14
Whilst this may theoretically answer the question, [it would be preferable](//meta.stackoverflow.com/q/8259) to include the essential parts of the answer here, and provide the link for reference. – GhostCat Jul 03 '17 at 18:48