I want to find the longest common subsequence (LCS) of N strings. I have the dynamic programming algorithm for 2 strings, but if I extend it to N strings it needs an N-dimensional array, so memory grows exponentially with N. That is not an option.
In the common case (90%), almost all of the strings will be identical.
If I split my N strings into N/2 pairs and run the two-string LCS separately on each pair, I get N/2 subsequences. I can remove the duplicates and repeat this process until only one subsequence remains, and that subsequence is common to all strings in the input.
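To make the idea concrete, here is a minimal sketch of the pairwise reduction in Python. The names `lcs2` and `pairwise_reduce` are just ones I made up for illustration, and the tie-breaking inside `lcs2` (which of several equally long subsequences it returns) is arbitrary.

```python
def lcs2(a: str, b: str) -> str:
    """Return one longest common subsequence of a and b (standard DP)."""
    m, n = len(a), len(b)
    # table[i][j] holds the LCS length of the suffixes a[i:] and b[j:].
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    # Walk the table from the start to recover one (of possibly many) LCS.
    out = []
    i = j = 0
    while i < m and j < n:
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return "".join(out)


def pairwise_reduce(strings: list[str]) -> str:
    """Heuristic: reduce pairs with lcs2 and drop duplicates until one string is left.

    The result is a common subsequence of all inputs, but it is not
    guaranteed to be the longest one.
    """
    current = list(dict.fromkeys(strings))  # remove duplicates, keep order
    if not current:
        return ""
    while len(current) > 1:
        reduced = [lcs2(current[k], current[k + 1])
                   for k in range(0, len(current) - 1, 2)]
        if len(current) % 2 == 1:
            reduced.append(current[-1])  # odd one out moves to the next round
        current = list(dict.fromkeys(reduced))
    return current[0]


print(pairwise_reduce(["abcde", "abcde", "abxde", "abcde"]))  # prints "abde"
```

Since most of my strings are identical, the duplicate removal collapses the input quickly, so only a few `lcs2` calls actually run.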
Is there something that I am missing? It doesn't look like a solution to an NP-hard problem...
I know that each pairwise LCS call may have more than one subsequence as a solution. If I keep only one of them as input to the next call, my final subsequence may not be the longest possible, but it may still be good enough for my needs.
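A tiny example of that loss, using a brute-force helper that is only usable on very short strings (the names `subsequences` and `all_lcs` are my own, for illustration only). The pair "ab" and "ba" has two equally long solutions, and only one of them survives the next step against "b":

```python
from itertools import combinations


def subsequences(s: str) -> set[str]:
    """All subsequences of s, including the empty one (exponential in len(s))."""
    return {"".join(c) for r in range(len(s) + 1) for c in combinations(s, r)}


def all_lcs(strings: list[str]) -> list[str]:
    """Every longest common subsequence of the given strings, sorted."""
    common = set.intersection(*(subsequences(s) for s in strings))
    best = max(len(c) for c in common)
    return sorted(c for c in common if len(c) == best)


# The pair has two valid answers: "a" and "b".
print(all_lcs(["ab", "ba"]))

# Carrying "a" forward leaves nothing in common with "b";
# carrying "b" forward keeps the true answer.
for candidate in all_lcs(["ab", "ba"]):
    print(repr(candidate), "->", all_lcs([candidate, "b"]))

# The real LCS of all three strings.
print(all_lcs(["ab", "ba", "b"]))
```

So depending on which tie the pairwise step happens to pick, the final result here is either the empty string or "b".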
If I instead keep all possible solutions for one pair and combine them with all possible solutions from the other pairs (each of which may also have more than one), I may end up with exponential time. Am I right?