Find the most frequently occurring letter in log(n) time?

Question

Given a string such as "aaabbccc", how would you output 'a', since it occurs just as frequently as 'c' but occurs first?

I did it in O(n) time, but I can't figure out how you would do this in O(log n) time, whether in Java or C++.

EDIT: This was an interview question.

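#include <iostream>
#include <string>
using std::string;
using std::cout;
using std::cin;
using std::endl;

// Returns the most frequent letter in str, or '\0' if str contains no letters
// in the scanned range. Ties go to the letter with the lower ASCII code, which
// matches "occurs first" only when the string is sorted.
char findFreqChar(const string& str) {
    int count;
    int maxOccur = 0;
    char maxChar = '\0';  // initialized so the result is defined for empty input
    // Try every candidate from 'A' through 'z' inclusive.
    for (char i = 'A'; i <= 'z'; i++) {
        count = 0;

        for (std::size_t j = 0; j < str.length(); j++) {
            if (i == str[j])
                count++;
        }
        // Strict '>' keeps the earlier candidate on a tie.
        if (count > maxOccur) {
            maxOccur = count;
            maxChar = i;
        }
    }
    return maxChar;
}

int main() {
    cout << "Enter String: ";
    string str;
    std::getline(cin, str);
    cout << findFreqChar(str) << endl;
    cin.get();
}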

This question is better suited for http://cs.stackexchange.com/. — jean, Apr 02 '15 at 16:35
I am pretty sure that's not how the interview question was formulated. — Marc Glisse, Apr 02 '15 at 17:31

score 2 · Answer 1 · answered Apr 02 '15 at 16:50

There is no way to find the most frequent letter in less than O(n) time because you can't determine that information without checking every character in the string!
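For reference, the O(n) bound is reachable with a frequency table rather than one scan per candidate letter. The following is a minimal sketch, not the asker's code: the name `mostFrequentChar` is hypothetical, and it assumes single-byte characters and that "occurs first" means first position in the string.

```cpp
#include <array>
#include <cstddef>
#include <string>

// Hypothetical helper: returns the most frequent character in s.
// Ties go to the character that appears first in the string.
// Returns '\0' for an empty string.
char mostFrequentChar(const std::string& s) {
    std::array<std::size_t, 256> counts{};  // zero-initialized tally per byte value
    for (unsigned char c : s) {
        ++counts[c];
    }
    char best = '\0';
    std::size_t bestCount = 0;
    // Second pass in string order so the earliest character wins ties.
    for (unsigned char c : s) {
        if (counts[c] > bestCount) {
            bestCount = counts[c];
            best = static_cast<char>(c);
        }
    }
    return best;
}
```

Both passes touch each character once, so the total work is linear in the length of the string, and it works on unsorted input.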

Mark B

score 0 · Answer 2 · answered Apr 02 '15 at 17:08

If you can guarantee that the letters are sorted, as in your example, then you can use binary searches to find the ends of each contiguous run of a letter. Each binary search is O(log n); in the worst case you'll need 25 of them to find all of the boundaries between 26 letters, but "25 x constant x log(n)" is still O(log n), I suppose.

If you take the binary search approach, there is scope for doing it smartly: when successive probes in the same binary search return the same letter, you know a minimum size for that run and can abandon any other range that cannot be longer. Chances are you'd do better to code it up with separate searches, though. Or, probably better still, just take the O(n) scan solution: do you really need this to be O(log n)?
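A minimal sketch of the separate-searches version, assuming the string is sorted in non-decreasing order (the function name `mostFrequentSorted` is hypothetical). It uses `std::upper_bound` to jump from the start of each run to the start of the next one:

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical helper. Assumes s is sorted in non-decreasing order,
// as in "aaabbccc". Returns the character with the longest run;
// ties go to the earlier run. Returns '\0' for an empty string.
char mostFrequentSorted(const std::string& s) {
    char best = '\0';
    std::size_t bestLen = 0;
    auto runStart = s.begin();
    while (runStart != s.end()) {
        // Binary search for the first position holding a larger character.
        auto runEnd = std::upper_bound(runStart, s.end(), *runStart);
        std::size_t len = static_cast<std::size_t>(runEnd - runStart);
        if (len > bestLen) {  // strict '>' keeps the earlier run on a tie
            bestLen = len;
            best = *runStart;
        }
        runStart = runEnd;
    }
    return best;
}
```

The loop runs once per distinct character, and each iteration is one binary search, so with a fixed alphabet the total is a constant number of O(log n) searches. On unsorted input the result is not meaningful.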

I don't think the interviewer mentioned a sorted string, like the example I gave, but this seems to be the most viable way of doing it. — user3821306, Apr 02 '15 at 17:25

score 0 · Answer 3 · answered Apr 02 '15 at 17:41

If you need to accomplish this in O(log(n)) time, then that suggests that you need to develop some type of divide-and-conquer algorithm. I am assuming (based on the example you gave us), that all occurrences of one letter are contiguous. Therefore, we can do the following:

1) Split the array in half and recursively call the algorithm. The sub-algorithm must return 4 values:
- The most frequent value occurring in the array
- Its frequency
- Number of contiguous characters ending at the rightmost character
- Number of contiguous characters ending at the leftmost character

So the recursive call when called against "aabbbbcc" would return: (b, 4, 2, 2)

2) Combine the two sub-arrays and return the result for the (now larger) array. First, we need to calculate the most frequent character in the combined array. This is either the longest sequence in the left half, the longest sequence in the right half, or a sequence that spans the split point (which is why we needed the last 2 values from the recursive call). This can all be done in constant time. We also return the lengths of the contiguous runs at the leftmost and rightmost characters of the combined array, extending them across the split point when one half consists of a single run.

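Note that recursing into both halves unconditionally gives T(n) = 2T(n/2) + O(1), which is O(n), not O(lg n). To get below that, stop recursing as soon as a sub-array's first and last characters match: under the contiguity assumption, such a sub-array is a single run and its result is known in constant time. Only sub-arrays that contain a boundary between two letters recurse further, and there are at most k - 1 boundaries for k distinct letters, so the total work is O(k lg n), which is O(lg n) for a fixed alphabet.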

There are quite a few boundary cases to handle and you need to figure out how to handle when the recursion "bottoms-out", but this should be enough to get you writing the code.
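A minimal sketch of this divide-and-conquer, assuming all occurrences of a letter are contiguous. The names `RunInfo`, `solve`, and `mostFrequentContiguous` are hypothetical. It relies on an early exit when a segment's first and last characters match (under the contiguity assumption the segment is then a single run), which also serves as the point where the recursion bottoms out.

```cpp
#include <cstddef>
#include <string>

// Hypothetical result type holding the four values returned per segment.
struct RunInfo {
    char best;              // most frequent character in the segment
    std::size_t bestLen;    // its frequency
    std::size_t prefixLen;  // length of the run at the leftmost character
    std::size_t suffixLen;  // length of the run at the rightmost character
};

// Solves s[lo, hi), requiring lo < hi.
// Assumes all occurrences of each letter are contiguous.
RunInfo solve(const std::string& s, std::size_t lo, std::size_t hi) {
    const std::size_t n = hi - lo;
    // Early exit: equal endpoints mean the whole segment is one run.
    if (s[lo] == s[hi - 1]) {
        return {s[lo], n, n, n};
    }
    const std::size_t mid = lo + n / 2;  // n >= 2 here, so lo < mid < hi
    const RunInfo L = solve(s, lo, mid);
    const RunInfo R = solve(s, mid, hi);
    const bool joined = (s[mid - 1] == s[mid]);

    // Candidates in left-to-right order; strict '>' keeps the earliest on ties.
    RunInfo out = L;
    if (joined && L.suffixLen + R.prefixLen > out.bestLen) {
        out.best = s[mid];
        out.bestLen = L.suffixLen + R.prefixLen;
    }
    if (R.bestLen > out.bestLen) {
        out.best = R.best;
        out.bestLen = R.bestLen;
    }
    // Extend the edge runs across the split when a half is a single run.
    out.prefixLen = L.prefixLen;
    if (joined && L.prefixLen == mid - lo) {
        out.prefixLen += R.prefixLen;
    }
    out.suffixLen = R.suffixLen;
    if (joined && R.suffixLen == hi - mid) {
        out.suffixLen += L.suffixLen;
    }
    return out;
}

// Returns '\0' for an empty string.
char mostFrequentContiguous(const std::string& s) {
    if (s.empty()) return '\0';
    return solve(s, 0, s.size()).best;
}
```

Without the early exit, both halves are always visited and the running time is linear. With it, only segments that straddle a boundary between two letters recurse further, so the number of calls grows with the number of distinct letters times lg n.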
