I'm trying to figure out how to compute the shortest sequence that contains each of a given set of sequences as a subsequence. For example, given:
abcd
bcdgh
cdef
the answer should be abcdefgh.
I was thinking about first computing the longest common subsequence (LCS) of all the strings and then going string by string and adding what's missing. However, I want to run this on an input of about 5-10 sequences, each 50-100 items long, and computing the LCS of all of them at once would take on the order of O(100^10) steps, which is far too time-consuming.
Would the following approach give the near-optimal answer for most inputs?
- Compute LCS of string 1 and 2
- Add missing items from string 1 and 2
- Compute LCS of result with string 3
- Add missing items from string 3
- ... and so on
(I assume not, because it is ambiguous where to add the missing items after each step.)
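The pairwise steps above can be sketched as follows. This problem is commonly called the shortest common supersequence problem. The sketch merges two sequences optimally with a dynamic-programming table (equivalent to the LCS step plus "add missing items") and then folds the remaining sequences in one at a time. The names merge_pair, greedy_supersequence and is_subsequence are made up for illustration. The result is optimal for each pair but only a heuristic for the whole set, and ties are broken arbitrarily, which is exactly the ambiguity mentioned above.

```python
from functools import reduce


def merge_pair(a, b):
    """Return one shortest common supersequence of a and b, as a list."""
    n, m = len(a), len(b)
    # dp[i][j] = length of a shortest supersequence of a[i:] and b[j:]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n, -1, -1):
        for j in range(m, -1, -1):
            if i == n:
                dp[i][j] = m - j          # only b[j:] is left
            elif j == m:
                dp[i][j] = n - i          # only a[i:] is left
            elif a[i] == b[j]:
                dp[i][j] = 1 + dp[i + 1][j + 1]
            else:
                dp[i][j] = 1 + min(dp[i + 1][j], dp[i][j + 1])

    # Walk the table forward to build one optimal merge.
    out = []
    i = j = 0
    while i < n and j < m:
        if a[i] == b[j]:
            out.append(a[i])              # shared item, emitted once
            i += 1
            j += 1
        elif dp[i + 1][j] <= dp[i][j + 1]:
            out.append(a[i])              # tie-break: prefer the item from a
            i += 1
        else:
            out.append(b[j])
            j += 1
    out.extend(a[i:])
    out.extend(b[j:])
    return out


def greedy_supersequence(seqs):
    """Heuristic: fold the sequences together one pair at a time."""
    return reduce(merge_pair, seqs, [])


def is_subsequence(small, big):
    """True if small can be obtained from big by deleting items."""
    it = iter(big)
    return all(x in it for x in small)


if __name__ == "__main__":
    seqs = ["abcd", "bcdgh", "cdef"]
    result = greedy_supersequence(seqs)
    print("".join(result), len(result))
    print(all(is_subsequence(s, result) for s in seqs))
```

For the three example strings this yields a supersequence of length 8, which is the minimum here because there are 8 distinct items, although the order of the tied items may differ from abcdefgh. Each merge fills a table of (current length) x (next length) cells, so 10 sequences of 100 items need at most a few hundred thousand cells in total, far fewer than 100^10. Whether that fits in a few milliseconds depends on the implementation language, and trying several merge orders and keeping the shortest result is a cheap way to reduce the effect of bad tie-breaks.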
I'm looking for a fast computation (a few milliseconds) and am ready to accept occasional non-optimal solutions if an efficient deterministic algorithm is not possible.
I'm sure people have thought about this; I would be glad if someone could point me in the right direction.
Thanks,
Martin
Looking up literature on LCS and related problems