1

I'm trying to solve the String Function Calculation problem from Hackerrank. In this problem, we're given a string as input and asked to print a number that represents the maximum of the following function, among all substrings of the input string:

f(s, t) = number of times the substring 's' appears in string 't' * length of substring 's'

I submitted the following as an answer:

import Data.List

main :: IO()
main = do
    stringInput <- getLine
    print $ solution stringInput

solution :: String -> Int
solution input = maximum $ map sum $ map (map length) $ group $ sort $ substrings input

substrings :: String -> [String]
substrings s = tail . inits =<< tails s

The idea was to:

  1. Get all substrings of s. let s = "aaaaaa"; substrings s = ["a","aa","aaa","aaaa","aaaaa","aaaaaa","a","aa","aaa","aaaa","aaaaa","a","aa","aaa","aaaa","a","aa","aaa","a","aa","a"]

  2. Sort it. ["a","a","a","a","a","a","aa","aa","aa","aa","aa","aaa","aaa","aaa","aaa","aaaa" ,"aaaa","aaaa","aaaaa","aaaaa","aaaaaa"]

  3. Group it. [["a","a","a","a","a","a"],["aa","aa","aa","aa","aa"],["aaa","aaa","aaa","aaa"],["aaaa","aaaa","aaaa"],["aaaaa","aaaaa"],["aaaaaa"]]

  4. Get the individual lengths of each substring. [[1,1,1,1,1,1],[2,2,2,2,2],[3,3,3,3],[4,4,4],[5,5],[6]]

  5. Sum the resulting lists. [6,10,12,12,10,6].

  6. Get the maximum. 12.

This passes the preliminary tests. However, when I submit it, it fails every other test with 'Runtime Error'.

Test case no. 2, the first to fail, takes 1.47 seconds to run and has the following input:

"aacbbabbabbbbbaaaaaaabbbbcacacbcabaccaabbbcaaabbccccbbbcbccccbbcaabaaabcbaacbcbaccaaaccbccbcaacbaccbaacbbabbabbbbbaaaaaaabbbbcacacbcabaccaabbbcaaabbccccbbbcbccccbbcaabaaabcbaacbcbaccaaaccbccbcaacbaccbaacbbabbabbbbbaaaaaaabbbbcacacbcabaccaabbbcaaabbccccbbbcbccccbbcaabaaabcbaacbcbaccaaaccbccbcaacbaccbaacbbabbabbbbbaaaaaaabbbbcacacbcabaccaabbbcaaabbccccbbbcbccccbbcaabaaabcbaacbcbaccaaaccbccbcaacbaccbaacbbabbabbbbbaaaaaaabbbbcacacbcabaccaabbbcaaabbccccbbbcbccccbbcaabaaabcbaacbcbaccaaaccbccbcaacbaccb"

Could you help me figure out what I'm doing wrong, or what's going on?

  • 2
    The `sort` is what's killing you. `String`s don't have an O(1) comparison, so sorting the 125,250 strings (with a total length of 20,958,500 characters) is sloow. – rampion Dec 10 '15 at 19:27
  • 3
    It's the `sort`. It really isn't that slow, it runs well within the time limit, but the `sort` is forcing every single substring to be in memory at the same time, so you're getting an out-of-memory exception. If you can count the substrings as they come in, without sorting them first, it should use a lot less memory. – DarthFennec Dec 10 '15 at 20:06

2 Answers

1

This won't work. `sort` gets extremely expensive memory-wise, because all the
intermediate substrings stay in memory at the same time and they are large. It's a
memory error.

A better approach is to build a suffix array in O(n log² n), then compute the
longest common prefix (LCP) array with Kasai's algorithm in O(n), and use the LCP
array for the remainder of the problem.

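Compare LCP[i] and LCP[i+1]. If they are equal, there are two equal substrings of that length. More generally, a run of k consecutive LCP entries that are all at least m means some substring of length m occurs at least k + 1 times. Proceed this way through the array.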
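
The approach above can be sketched roughly as follows. This is only a minimal illustration, not a tuned submission: the function names `buildSuffixArray`, `buildLcp` and `maxStringFunction` are made up for this example, the suffix array uses plain prefix doubling with `std::sort`, and the "remainder of the problem" is assumed to be a largest-rectangle-in-histogram scan over the LCP array with a stack.

```cpp
#include <algorithm>
#include <numeric>
#include <string>
#include <vector>

// Suffix array by prefix doubling with std::sort: O(n log^2 n).
std::vector<int> buildSuffixArray(const std::string& s) {
    const int n = static_cast<int>(s.size());
    std::vector<int> sa(n), rank(n), tmp(n);
    std::iota(sa.begin(), sa.end(), 0);
    for (int i = 0; i < n; ++i) rank[i] = static_cast<unsigned char>(s[i]);
    for (int k = 1;; k <<= 1) {
        // Order suffixes by their first 2k characters via (rank[i], rank[i + k]).
        auto cmp = [&](int a, int b) {
            if (rank[a] != rank[b]) return rank[a] < rank[b];
            int ra = a + k < n ? rank[a + k] : -1;
            int rb = b + k < n ? rank[b + k] : -1;
            return ra < rb;
        };
        std::sort(sa.begin(), sa.end(), cmp);
        if (n > 0) tmp[sa[0]] = 0;
        for (int i = 1; i < n; ++i)
            tmp[sa[i]] = tmp[sa[i - 1]] + (cmp(sa[i - 1], sa[i]) ? 1 : 0);
        rank = tmp;
        if (n == 0 || rank[sa[n - 1]] == n - 1) break;  // all ranks distinct
    }
    return sa;
}

// Kasai's algorithm: lcp[i] is the longest common prefix of the suffixes
// sa[i - 1] and sa[i]; lcp[0] is 0. Runs in O(n).
std::vector<int> buildLcp(const std::string& s, const std::vector<int>& sa) {
    const int n = static_cast<int>(s.size());
    std::vector<int> rank(n), lcp(n, 0);
    for (int i = 0; i < n; ++i) rank[sa[i]] = i;
    int h = 0;
    for (int i = 0; i < n; ++i) {
        if (rank[i] == 0) { h = 0; continue; }
        int j = sa[rank[i] - 1];
        while (i + h < n && j + h < n && s[i + h] == s[j + h]) ++h;
        lcp[rank[i]] = h;
        if (h > 0) --h;
    }
    return lcp;
}

// Maximum of (occurrences * length) over all substrings of s.
long long maxStringFunction(const std::string& s) {
    const int n = static_cast<int>(s.size());
    if (n == 0) return 0;
    std::vector<int> sa = buildSuffixArray(s);
    std::vector<int> lcp = buildLcp(s, sa);
    lcp.push_back(0);            // sentinel that flushes the stack at the end
    long long best = n;          // the whole string occurs once
    std::vector<int> st;         // indices whose lcp values are non-decreasing
    for (int i = 1; i <= n; ++i) {
        while (!st.empty() && lcp[st.back()] > lcp[i]) {
            long long len = lcp[st.back()];
            st.pop_back();
            int left = st.empty() ? 0 : st.back();
            long long width = i - left - 1;  // LCP entries in the run
            best = std::max(best, len * (width + 1));
        }
        st.push_back(i);
    }
    return best;
}
```

For the question's example, `maxStringFunction("aaaaaa")` returns 12. The result is a `long long` because the product can exceed the 32-bit range for long inputs.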

isopropylcyanide
0
#include<iostream>
#include<algorithm>
#include<cstring>
using namespace std;

#define nb nexta
#define head height
#define rank b

const int maxn = 100010;
char s[maxn];
int n, id[maxn], height[maxn], b[maxn], nexta[maxn];

bool cmp(const int& i, const int& j)
{
    return s[i] < s[j];
}

void SuffixSort()
{
    int i, j, k, h;
    for (i = 0; i < n; i++) id[i] = i;
    sort(id, id + n, cmp);
    for (i = 0; i < n; i++)
    {
        if (i == 0 || s[id[i]] != s[id[i - 1]])
            b[id[i]] = i;
        else b[id[i]] = b[id[i - 1]];
    }
    for (h = 1; h < n; h <<= 1)
    {
        for (i = 0; i < n; i++)
            head[i] = nexta[i] = -1;
        for (i = n - 1; i >= 0; i--)
        {
            if (id[i])
            {
                j = id[i] - h;
                if (j < 0) j += n;
                nexta[j] = head[b[j]];
                head[b[j]] = j;
            }
        }
        j = n - h;
        nexta[j] = head[b[j]];
        head[b[j]] = j;
        for (i = k = 0; i < n; i++)
            if (head[i] >= 0)
                for (j = head[i]; j >= 0; j = nexta[j])
                    id[k++] = j;
        for (i = 0; i < n; i++)
            if (i>0 && id[i] + h < n&&id[i - 1] + h < n&&b[id[i]] == b[id[i - 1]] && b[id[i] + h] == b[id[i - 1] + h])
                nb[id[i]] = nb[id[i - 1]];
            else
                nb[id[i]] = i;
        for (i = 0; i < n; i++)
            b[i] = nb[i];
    }
}

void GetHeight()
{
    int i, j, h; height[0] = 0;
    for (i = 0; i < n; i++)
        rank[id[i]] = i;
    for (h = 0, i = 0; i < n; i++)
    {
        if (rank[i] > 0)
        {
            j = id[rank[i] - 1];
            while (s[i + h] == s[j + h])++h;
            height[rank[i]] = h;
            if (h>0) --h;
        }
    }
}

int st[maxn], top;

int main()
{
    cin >> s;
    n = strlen(s);
    top = 0;
    SuffixSort();
    GetHeight();
    height[n] = 0;
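    // The answer can reach about n*n/4 (roughly 2.5e9 for n = 100000),
    // which overflows a 32-bit int, so accumulate in long long.
    long long best = n;
    st[top++] = 0;
    for (int i = 1; i < n + 1; i++)
    {
        //cout << height[i] << " ";
        while (top != 0 && height[i] < height[st[top - 1]])
        {
            long long val = height[st[top - 1]];
            top--;
            best = max(best, val * (top == 0 ? i : i - st[top - 1]));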
        }

        if (top == 0 || height[i] >= height[st[top - 1]])
            st[top++] = i;
    }
    cout << best << endl;
    return 0;
}
DoesEatOats