I've run into a problem that can be summarized as follows:
Given a set of weighted sequences, each extracted from an n-length sequence, I need to find the n-length sequence that is compatible with a subset of the given sequences and has the maximum summed weight.
For example, given the sequences below, find a sequence of length 6 with maximum weight, built from a compatible subset of the given sequences (compatible means they have the same character at every overlapping position).
1. a,b,c,d,e weight 1
2. b,e weight 2
3. c,d weight -1
4. a,b weight 0
5. d,e,f weight 3
In this example the answer should be a,b,e,d,e,f, which has a weight of 5 (2 + 0 + 3): sequences 2, 4 and 5 are compatible with each other (at every position they either have the same character or at least one of them is empty).
The only solution I have found is to convert the problem into a graph with one vertex per given sequence (5 vertices here), where an edge represents compatibility between a pair of sequences, and then find the maximum-weight clique. That problem is NP-hard, though, so the performance with more than 200 sequences is too poor to be usable.
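To make the formulation concrete, here is a minimal sketch of the compatibility check and an exhaustive search on the example. The start offsets are an assumption (they are not given in the list; I inferred them from the expected answer a,b,e,d,e,f), the names `FRAGMENTS`, `compatible`, `best_subset` and `merge` are only illustrative, and the brute-force loop stands in for a real maximum-weight clique solver. Ties are broken in favour of the larger subset so that the zero-weight sequence 4 is kept.

```python
from itertools import combinations

# (start offset, characters, weight). Offsets are 0-based and ASSUMED:
# they are inferred from the expected answer a,b,e,d,e,f.
FRAGMENTS = [
    (0, "abcde", 1),   # sequence 1
    (1, "be", 2),      # sequence 2
    (2, "cd", -1),     # sequence 3
    (0, "ab", 0),      # sequence 4
    (3, "def", 3),     # sequence 5
]

def compatible(f, g):
    """True if f and g agree at every position that both of them cover."""
    pos_f = {f[0] + i: ch for i, ch in enumerate(f[1])}
    return all(pos_f.get(g[0] + i, ch) == ch for i, ch in enumerate(g[1]))

def best_subset(fragments):
    """Try every subset (exponential time, for illustration only)."""
    n = len(fragments)
    ok = [[compatible(fragments[i], fragments[j]) for j in range(n)]
          for i in range(n)]
    best_weight, best = 0, ()
    for size in range(1, n + 1):
        for subset in combinations(range(n), size):
            # A subset is usable iff it is a clique in the compatibility graph.
            if all(ok[i][j] for i, j in combinations(subset, 2)):
                weight = sum(fragments[i][2] for i in subset)
                # Prefer higher weight, then more fragments on ties.
                if (weight, size) > (best_weight, len(best)):
                    best_weight, best = weight, subset
    return best_weight, best

def merge(fragments, subset, length):
    """Overlay the chosen fragments; '?' marks positions nobody covers."""
    out = ["?"] * length
    for i in subset:
        start, chars, _ = fragments[i]
        for k, ch in enumerate(chars):
            out[start + k] = ch
    return "".join(out)

weight, chosen = best_subset(FRAGMENTS)
print(weight, [i + 1 for i in chosen], merge(FRAGMENTS, chosen, 6))
```

Pairwise compatibility is enough here because if every pair agrees at a position, all of the chosen sequences agree at that position, so any clique can be merged into a single sequence.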
Is there a better algorithm for this problem?