I encountered a problem that can be summarized as follows:

Given a set of weighted sequences, each extracted from some n-length sequence, I need to find the n-length sequence that is compatible with a subset of the given sequences such that the summed weight of that subset is maximal.

E.g., given the sequences below, find a sequence of length 6 with maximum weight from a compatible subset of the given sequences (compatible meaning they have the same character at overlapping positions).
1. a,b,c,d,e     weight 1
2.   b,e         weight 2
3.     c,d       weight -1
4. a,b           weight 0
5.       d,e,f   weight 3

In the example, the answer should be a,b,e,d,e,f, which has a weight of 5: sequences 2, 4 and 5 are compatible with each other (at each position they have the same character or are empty).
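To make the compatibility rule concrete, here is a minimal sketch in Python that encodes the five example sequences and checks the claimed answer. The names `FRAGMENTS`, `compatible` and `merge` are made up for illustration, and start positions are 0-based here (the example counts positions from 1).

```python
# Each fragment: (0-based start index, characters, weight).
# Keys are the sequence numbers used in the example.
FRAGMENTS = {
    1: (0, "abcde", 1),
    2: (1, "be", 2),
    3: (2, "cd", -1),
    4: (0, "ab", 0),
    5: (3, "def", 3),
}

def compatible(f, g):
    """True if f and g agree at every position that both of them cover."""
    (sf, cf, _), (sg, cg, _) = f, g
    lo = max(sf, sg)                          # first shared position
    hi = min(sf + len(cf), sg + len(cg))      # one past the last shared position
    return all(cf[i - sf] == cg[i - sg] for i in range(lo, hi))

def merge(ids, n=6, blank="?"):
    """Overlay the chosen fragments onto an n-length sequence."""
    out = [blank] * n
    for k in ids:
        start, chars, _ = FRAGMENTS[k]
        for offset, ch in enumerate(chars):
            out[start + offset] = ch
    return ",".join(out)

chosen = [2, 4, 5]
total = sum(FRAGMENTS[k][2] for k in chosen)
print(merge(chosen), total)  # prints: a,b,e,d,e,f 5
```

Because every position carries a single character, a subset whose members are pairwise compatible is also compatible as a whole, so checking pairs is enough.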

The only solution I have found is to convert the problem into a graph with one vertex per sub-sequence (5 vertices in the example), where an edge represents compatibility between a pair of sub-sequences, and then find the maximum-weight clique. But that problem is NP-hard, so performance with more than 200 sub-sequences is too poor to be usable.
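For reference, the graph formulation described above could be sketched as follows. This is only an illustration of the clique approach on the example data, not a faster algorithm: the search is a plain exhaustive recursion and is exponential in the worst case. All names (`FRAGMENTS`, `build_graph`, `max_weight_clique`) are hypothetical, and start positions are 0-based.

```python
from itertools import combinations

# Each fragment: (0-based start index, characters, weight).
FRAGMENTS = {
    1: (0, "abcde", 1),
    2: (1, "be", 2),
    3: (2, "cd", -1),
    4: (0, "ab", 0),
    5: (3, "def", 3),
}

def compatible(f, g):
    """True if f and g agree at every position that both of them cover."""
    (sf, cf, _), (sg, cg, _) = f, g
    lo = max(sf, sg)
    hi = min(sf + len(cf), sg + len(cg))
    return all(cf[i - sf] == cg[i - sg] for i in range(lo, hi))

def build_graph(frags):
    """Adjacency sets: an edge joins two fragments that are compatible."""
    adj = {k: set() for k in frags}
    for a, b in combinations(frags, 2):
        if compatible(frags[a], frags[b]):
            adj[a].add(b)
            adj[b].add(a)
    return adj

def max_weight_clique(frags, adj):
    """Exhaustively enumerate cliques and keep the heaviest non-empty one."""
    best = (float("-inf"), [])

    def extend(clique, weight, candidates):
        nonlocal best
        if clique and weight > best[0]:
            best = (weight, list(clique))
        for i, v in enumerate(candidates):
            # Only vertices adjacent to v can still join the clique.
            rest = [u for u in candidates[i + 1:] if u in adj[v]]
            extend(clique + [v], weight + frags[v][2], rest)

    extend([], 0, sorted(frags))
    return best

graph = build_graph(FRAGMENTS)
print(max_weight_clique(FRAGMENTS, graph))  # prints: (5, [2, 4, 5])
```

Note that the subset {2, 5} also reaches weight 5 on this input; the sketch simply reports the first maximum it encounters.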

Is there a better algorithm for this problem?

vincent
  • Does each sequence "with weights which are extracted from a n-length sequence" need to be such that its indices are consecutive? (For example, can `a,c` be "extracted" from `a,b,c`?) What does "is compatible with" mean? –  Dec 13 '16 at 10:37
  • The sub-sequences must be consecutive: a,c is not a sub-sequence of a,b,c. Each sub-sequence also has a start position; you can see that I used spaces to align each sub-sequence with its start position. Compatible means that, at every position, two sub-sequences either have the same character or at least one of them is empty. In the example, 1 and 2 are not compatible because at position 3, 1 has c but 2 has e. But 3 and 5 are compatible: their only common position, 4, has the same character d. – vincent Dec 13 '16 at 11:32
