4

I'm looking for an algorithm that addresses the LCS problem for two strings with the following conditions:

Each string consists of English characters and each character has a weight. For example:

sequence 1 (S1): "ABBCD" with weights [1, 2, 4, 1, 3]

sequence 2 (S2): "TBDC" with weights [7, 5, 1, 2]

Suppose that MW(s, S) is defined as the maximum weight of the sub-sequence s in string S with respect to the associated weights. The heaviest common sub-sequence (HCS) is defined as:

HCS = argmin(MW(s, S1), MW(s, S2))

The algorithm output should be the indexes of HCS in both strings and the weight. In this case, the indexes will be:

I_S1 = [2, 4] --> MW("BD", "ABBCD") = 7

I_S2 = [1, 2] --> MW("BD", "TBDC") = 6

Therefore HCS = "BD", and weight = min(MW(s, S1), MW(s, S2)) = 6.

Wilfred Hughes
  • 29,846
  • 15
  • 139
  • 192
  • 1
    What have you done to solve the task? – Michel_T. May 13 '19 at 23:05
  • 2
    If the weights are non-negative you can use the same approach as the normal LCS; however, in the recursive formula you should use the sum of the weights to minimize instead of the number of characters. I'm not sure about the case with negative weights though. – A. Mashreghi May 14 '19 at 01:11
  • Yes the weights are non-negative. Do you happen to know which LCS algorithm I can use to modify and solve my own problem? – Amin Edraki May 14 '19 at 01:54

1 Answers1

1

The table that you need to build will have this.

for each position in sequence 1
    for each position in sequence 2
        for each extreme pair of (weight1, weight2)
            (last_position1, last_position2)

Where an extreme pair is one where it is not possible to find a subsequence to that point whose weights in sequence 1 and weights in sequence 2 are both >= and at least one is >.

There may be multiple extreme pairs, where one sequence is higher than the other.

The rule is that at the (i, -1) or (-1, j) positions, the only extreme pair is the empty set with weight 0. At any other we merge the extreme pairs for (i-1, j) and (i, j-1). And then if seq1[i] = seq2[j], then add the options where you went to (i-1, j-1) and then included the i and j in the respective subsequences. (So add weight1[i] and weight2[j] to the weights then do a merge.)

For that merge you can sort by weight1 ascending, all of the extreme values for both previous points, then throw away all of the ones whose weight2 is less than or equal to the best weight2 that was already posted earlier in the sequence.

When you reach the end you can find the extreme pair with the highest min, and that is your answer. You can then walk the data structure back to find the subsequences in question.

btilly
  • 43,296
  • 3
  • 59
  • 88