
You are given a string and can change at most Q letters in it. You are also given a list of substrings (each two characters long), each with a corresponding score. Every occurrence of a substring within the string adds its score to your total. What is the maximum attainable score?

String length <= 150, Q <= 100, Number of Substrings <= 700


Example:

String = bpdcg

Q = 2

Substrings:

bz - score: 2

zd - score: 5

dm - score: 7

ng - score: 10

In this example, you can achieve the maximum score by changing the "p" in the string to a "z" and the "c" to an "n". Thus, your new string is "bzdng", which has a score of 2 + 5 + 10 = 17.
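The arithmetic in this example can be checked with a few lines of Python. This is a minimal sketch (`pair_score` is a hypothetical helper name, not from the question): because every scoring substring has length 2, it simply sums the score of each overlapping adjacent pair.

```python
def pair_score(s, scores):
    """Sum the scores of every overlapping two-character window of s.

    Pairs that are not in the scores dictionary contribute 0.
    """
    return sum(scores.get(s[k:k + 2], 0) for k in range(len(s) - 1))


scores = {'bz': 2, 'zd': 5, 'dm': 7, 'ng': 10}
print(pair_score('bzdng', scores))  # 17
print(pair_score('bpdcg', scores))  # 0
```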

I know that, given a string with the letters already changed, the score can be checked in linear time using a dictionary-matching algorithm such as Aho-Corasick (or, with slightly worse complexity, Rabin-Karp). However, trying every possible combination of substitutions and then checking each resulting string would take far too long.
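For tiny inputs the naive approach is easy to write down, and it is handy as a reference when testing a faster method. The sketch below is our own (the name `brute_force` is hypothetical): it tries every set of at most Q positions and every assignment of letters drawn from the scoring pairs, since changing a letter to anything else can never beat leaving it alone. With string length n and k such letters, the number of candidates grows roughly like the sum over q <= Q of C(n, q) * k^q, which is hopeless for n = 150 and Q = 100. Note also that with length-2 patterns, scoring one candidate needs only a dictionary lookup per adjacent pair.

```python
from itertools import combinations, product


def brute_force(s, Q, scores):
    """Exhaustively try every way of changing at most Q positions of s.

    Exponential in Q: only usable for checking very small inputs.
    """
    def pair_score(t):
        # Each overlapping two-character window scores its dictionary value (or 0).
        return sum(scores.get(t[k:k + 2], 0) for k in range(len(t) - 1))

    # Only letters that occur in some scoring pair are worth substituting in.
    alphabet = sorted(set(''.join(scores)))
    best = pair_score(s)  # zero substitutions
    for q in range(1, Q + 1):
        for positions in combinations(range(len(s)), q):
            for letters in product(alphabet, repeat=q):
                t = list(s)
                for pos, ch in zip(positions, letters):
                    t[pos] = ch
                best = max(best, pair_score(''.join(t)))
    return best


print(brute_force('bpdcg', 2, {'bz': 2, 'zd': 5, 'dm': 7, 'ng': 10}))  # 17
```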

Another possible method I thought of was to work backwards: construct the ideal string from the given substrings, then check whether it differs from the original string in at most Q characters. However, I am not sure how to do this, and even if it could be done, I think it would also take too long.
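The final check in this backwards approach is cheap: it is just the Hamming distance between two equal-length strings, so the hard part is generating good candidate strings, not verifying them. A small sketch, with the helper name `within_q` being hypothetical:

```python
def within_q(original, candidate, Q):
    """True if candidate is reachable from original by changing at most Q letters."""
    if len(original) != len(candidate):
        return False  # substitutions never change the length
    # Hamming distance: number of positions where the two strings differ.
    changes = sum(1 for a, b in zip(original, candidate) if a != b)
    return changes <= Q


print(within_q('bpdcg', 'bzdng', 2))  # True
print(within_q('bpdcg', 'bzdmg', 1))  # False
```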

What is the best way to go about this?

1110101001
  • What is the maximum length of the String and Q? – Pham Trung Oct 28 '14 at 02:55
  • @PhamTrung The string can be at max 150 characters and Q can be at max 100 – 1110101001 Oct 28 '14 at 03:11
  • This looks like a knapsack problem. – n. m. could be an AI Oct 28 '14 at 04:13
  • @n.m. In what way? I see how the points assigned to each substring could be like the items, and how the knapsack is the length of the string, but how do you account for the fact that you can make only at most two alterations to the original string? – 1110101001 Oct 28 '14 at 04:16
  • Are all the substrings length 2? – Paul Hankin Oct 28 '14 at 04:18
  • You need to select items with maximum total price and total weight no more than Q. Granted, when Q==2 you have a rather easy knapsack problem. – n. m. could be an AI Oct 28 '14 at 04:28
  • @Anonymous Yes, all substrings are of length two. I forgot to add that to the opening post. Fixed. – 1110101001 Oct 28 '14 at 04:52
  • @n.m. So if I understood correctly, you knapsack on the list of substrings, trying to maximize your score, with the total weight (where each substring has a weight of 1) not exceeding Q? I don't see how that will always work though. It is not guaranteed that all of the provided substrings can be made by less than Q substitutions into the string. – 1110101001 Oct 28 '14 at 05:00
  • Suppose symbols in all the replacement strings are different both from each other and from all the symbols in the source string. Also the length of the source is large enough. Then the problem is exactly the knapsack problem, with substitution string length being the weight. Presence of identical symbols may or may not make the problem easier (likely to make it harder actually). – n. m. could be an AI Oct 28 '14 at 05:19

1 Answer

An efficient way to solve this is to use dynamic programming.

Let L be the set of letters that start any of the length-2 scoring substrings, together with a special letter "*" that stands for any letter other than these.

Let S(i, j, c) be the maximum score possible on the string up to index i using at most j substitutions, where the character at index i is c (with c in L).

The recurrence relations are a bit messy (or at least, I didn't find a particularly beautiful formulation of them), but here's some code that computes the largest score possible:

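infinity = 100000000

def S1(L1, L2, s, i, j, c, scores, cache):
    # Best score for s[0..i] with at most j substitutions, where index i holds c.
    # c == '*' stands for a letter whose pair with the next character scores
    # nothing; it may be any letter of L2 or the original s[i].
    key = (i, j, c)
    if key not in cache:
        if i == 0:
            # Base case: putting a specific letter c != s[0] at index 0 costs one substitution.
            if c != '*' and s[0] != c:
                v = 0 if j >= 1 else -infinity
            else:
                v = 0 if j >= 0 else -infinity
        else:
            v = -infinity
            # d is the character at index i-1.
            for d in L1:
                # Candidate letters for index i.
                for c2 in [c] if c != '*' else L2 + s[i]:
                    jdiff = 1 if s[i] != c2 else 0
                    score = S1(L1, L2, s, i-1, j-jdiff, d, scores, cache)
                    score += scores.get(d+c2, 0)
                    v = max(v, score)
        cache[key] = v
    return cache[key]

def S(s, Q, scores):
    # L1: first letters of the scoring pairs, plus the wildcard '*'.
    L1 = ''.join(sorted(set(w[0] for w in scores))) + '*'
    # L2: second letters of the scoring pairs.
    L2 = ''.join(sorted(set(w[1] for w in scores)))
    # The sentinel '.' makes the top-level call maximise over the last real character.
    return S1(L1, L2, s + '.', len(s), Q, '.', scores, {})

print(S('bpdcg', 2, {'bz': 2, 'zd': 5, 'dm': 7, 'ng': 10}))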
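Written out, the recursion in S1 amounts to the recurrence below. This is our own transcription of the code rather than something stated in the answer: s_i is the i-th character of the original string, N = len(s), L_1 and L_2 are the letter sets built in S (L_1 includes the wildcard), w(xy) means scores.get(xy, 0) (so any pair starting with the wildcard scores 0), and [P] is 1 when P holds and 0 otherwise. The sentinel '.' in S only serves to take the final maximum over d.

```latex
S(0, j, c) =
\begin{cases}
  0       & \text{if } j \ge [\, c \ne \ast \,\wedge\, s_0 \ne c \,] \\
  -\infty & \text{otherwise}
\end{cases}

S(i, j, c) = \max_{d \in L_1} \; \max_{c' \in C_i(c)}
  \Big( S\big(i - 1,\; j - [\, s_i \ne c' \,],\; d\big) + w(d\,c') \Big),
  \qquad 1 \le i \le N - 1

C_i(c) =
\begin{cases}
  \{c\}            & \text{if } c \ne \ast \\
  L_2 \cup \{s_i\} & \text{if } c = \ast
\end{cases}

\text{best score} = \max_{d \in L_1} S(N - 1, Q, d)
```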

There's some room for optimisation:

  • the computation isn't terminated early if j goes negative
  • when given a choice, every value of L2 is tried, whereas only letters that can complete a scoring word from d need trying.

Overall, if there are k different letters in the scoring words and N is the length of the string, the algorithm runs in O(QNk^2) time. With the second optimisation above, this can be reduced to O(QNw), where w is the number of scoring words.
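Both optimisations can also be folded into a bottom-up table. The sketch below is our own variant, not the answer's code (the name `max_pair_score` and its table layout are hypothetical). It uses a slightly different state: the actual letter at index i, drawn from the letters of the scoring pairs or the original s[i]. Budgets that would go negative are pruned immediately. For each letter c, only predecessors p with p+c a scoring pair are enumerated, and a per-row maximum stands in for every non-scoring predecessor. That makes the work O(NQ(k + w)) with no recursion-depth concerns.

```python
NEG = float('-inf')


def max_pair_score(s, Q, scores):
    """Max total pair score after changing at most Q letters of s (bottom-up DP)."""
    if not s:
        return 0
    letters = set(''.join(scores))  # every letter occurring in some scoring pair
    # preds[c]: (p, w) for each scoring pair p+c worth w.
    preds = {}
    for pair, w in scores.items():
        preds.setdefault(pair[1], []).append((pair[0], w))

    def candidates(i):
        # Letters worth placing at index i: any letter from a scoring pair, or the
        # original s[i] (any other replacement is dominated by keeping s[i]).
        return letters | {s[i]}

    # prev[j][c]: best score for s[0..i] with at most j changes, where index i holds c.
    prev = [{c: (0 if j >= int(c != s[0]) else NEG) for c in candidates(0)}
            for j in range(Q + 1)]
    for i in range(1, len(s)):
        # top[j]: best score for s[0..i-1] with at most j changes, any last letter.
        top = [max(row.values()) for row in prev]
        cur = []
        for j in range(Q + 1):
            row = {}
            for c in candidates(i):
                jj = j - int(c != s[i])  # budget left for s[0..i-1]
                if jj < 0:
                    row[c] = NEG  # pruned at once: not enough substitutions left
                    continue
                best = top[jj]  # any predecessor, counting the pair as unscored
                for p, w in preds.get(c, ()):
                    # p occurs in a scoring pair, so it is always a key of prev[jj].
                    best = max(best, prev[jj][p] + w)
                row[c] = best
            cur.append(row)
        prev = cur
    return max(prev[Q].values())


scores = {'bz': 2, 'zd': 5, 'dm': 7, 'ng': 10}
print(max_pair_score('bpdcg', 2, scores))  # 17
```

Keeping "at most j" semantics in every cell means the answer is read from row Q directly, with no extra maximum over j.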

Paul Hankin