I have a problem related to episode mining in which I need to find the maximal utility of an episode given its occurrences in an event sequence. I am presenting the question in a different form so that it is easier to explain.
There is a long string, S, in which each character has a positive score. Given another string, T, find a match of T in S, that is, a set of occurrences of T's sequence of characters in S, such that:
- The occurrences are non-overlapping.
- The characters of each occurrence must appear in S in the same order as in T, but they need not be contiguous.
- Each occurrence must lie within a window of a given size.
The total score of a match is found by simply adding the scores of the characters at each occurrence. The problem is to find the match with the maximum score among all possible matches.
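To make the rules above concrete, here is a minimal sketch of a checker that validates a match and returns its score. The function name `score_match` and its parameters are hypothetical. It assumes 1-based positions, that "non-overlapping" means the position ranges of two occurrences do not intersect, and that an occurrence fits the window when its last position minus its first position is at most the window size.

```python
def score_match(s, scores, t, match, window):
    """Return the total score of `match`, or raise ValueError if it is invalid.

    `match` is a list of occurrences; each occurrence is a list of 1-based
    positions in `s`, one position per character of `t`.
    """
    total = 0
    prev_end = 0  # last position used by the previous occurrence
    for occ in sorted(match):
        if len(occ) != len(t):
            raise ValueError("occurrence must have one position per character of T")
        if any(p < 1 or p > len(s) for p in occ):
            raise ValueError("position out of range")
        if any(a >= b for a, b in zip(occ, occ[1:])):
            raise ValueError("positions must be strictly increasing")
        if any(s[p - 1] != c for p, c in zip(occ, t)):
            raise ValueError("characters do not spell T")
        # Assumption: the window bounds (last position - first position).
        if occ and occ[-1] - occ[0] > window:
            raise ValueError("occurrence does not fit in the window")
        # Assumption: occurrences may not share or interleave positions.
        if occ and occ[0] <= prev_end:
            raise ValueError("occurrences overlap")
        if occ:
            prev_end = occ[-1]
        total += sum(scores[p - 1] for p in occ)
    return total
```

If the intended meaning of "non-overlapping" or of the window is different, only the two checks marked as assumptions would need to change.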
Example (the score of each character is given in parentheses):
String S - a(2) b(3) e(1) d(10) d(7) c(1) a(5) d(8) b(5) d(6)
String T - a b d
Window size - 5
Two matches of string T are:
- [1,2,4], [7,9,10]. Score - [2+3+10] + [5+5+6] = 31
- [1,2,5], [7,9,10]. Score - [2+3+7] + [5+5+6] = 28
The score is highest for the first match, so it is the required answer.
We did not consider the occurrences [1,2,8] or [1,2,10] because they do not fit in the given window: (8-1) > 5 and (10-1) > 5.
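As a reference point, here is a minimal dynamic-programming sketch, not necessarily the most efficient approach. The name `max_match_score` is hypothetical. It assumes that occurrences occupy disjoint position ranges and that an occurrence fits the window when its last position minus its first position is at most the window size. For every start position it scans at most `window + 1` characters, so it runs in roughly O(n * window * len(T)) time. It returns only the best score; recovering the occurrences themselves would need back-pointers.

```python
NEG = float("-inf")

def max_match_score(s, scores, t, window):
    """Best total score of non-overlapping, window-bounded occurrences of t in s."""
    n, m = len(s), len(t)
    # best[i] = best total score using only positions 1..i (1-based).
    best = [0] * (n + 1)
    for start in range(1, n + 1):
        # Finalize best[start - 1]: skipping a position never lowers the score.
        if start >= 2:
            best[start - 1] = max(best[start - 1], best[start - 2])
        base = best[start - 1]
        # h[k] = best score for matching the first k characters of t
        # using positions in start..j only.
        h = [0] + [NEG] * m
        for j in range(start, min(n, start + window) + 1):
            c, w = s[j - 1], scores[j - 1]
            # Go through k in descending order so position j is used at most once.
            for k in range(m, 0, -1):
                if t[k - 1] == c and h[k - 1] != NEG:
                    h[k] = max(h[k], h[k - 1] + w)
            if h[m] != NEG:
                # One occurrence inside start..j plus the best of 1..start-1.
                best[j] = max(best[j], base + h[m])
    if n >= 1:
        best[n] = max(best[n], best[n - 1])
    return best[n]

S = "abeddcadbd"
SCORES = [2, 3, 1, 10, 7, 1, 5, 8, 5, 6]
print(max_match_score(S, SCORES, "abd", 5))  # prints 31
```

On the example above this gives 31, which corresponds to the occurrences [1,2,4] and [7,9,10].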
So I would like to know whether there is an efficient way to find the match, that is, the set of occurrences, that gives the maximum score.