How can I use this for x amounts of numbers ? for my approach, its hard coded to 2 substrings. Also is there a better way with less time complexity? There may be a loop hole over here which needs to be fixed about the number of num im passing as I am not using the um parameter at all.
2 Answers
Your current approach has a few problems, including a hard-coded maximum number of ngrams, and the fixed ngram size. In addition, your short variable names and lack of comments do not help explain the code to whoever is reading it.
A simpler solution is to use a map
to count the number of times each ngram occurs, and then find the one with the highest count. That would give rougly N.logN
time complexity. Alternatively unordered_map
would be closer to linear time complexity.
There will of course be an edge case where more than one ngram occurs the same highest count. You would need to decide which of a variety of strategies should be used to resolve that. In my example, I take advantage of intrinsic ordering of std::map
to select the ngram with the lowest sort order. If using unordered_map
, you'd need a different strategy for resolving contention in a deterministic way.
#include <algorithm>
#include <iostream>
#include <map>
#include <string>
std::string ngram(const std::string &input, int num)
{
if (num <= 0 || num > input.size()) return "";
// Count ngrams of size 'num'
std::map<std::string, int> ngram_count;
for(size_t i = 0; i <= input.size() - num; i++)
{
++ngram_count[input.substr(i, num)];
}
// Select ngram with highest count
std::map<std::string, int>::iterator highest = std::max_element(
ngram_count.begin(), ngram_count.end(),
[](const std::pair<std::string, int>& a, const std::pair<std::string, int>& b)
{
return a.second < b.second;
});
// Return ngram with highest count, otherwise empty string
return highest != ngram_count.end() ? highest->first : "";
}
int main()
{
std::cout << ngram("engineering", 2) << std::endl;
std::cout << ngram("engineering", 3) << std::endl;
return 0;
}

- 60,864
- 6
- 61
- 103
I did it a bit different than paddy, so I thought I would post it. Use std::set
. He explains the issue so should get the credit for your answer.
struct test {
test(const std::string& str) :val(str), cnt(0) {}
test(const test& thet) { *this = thet; }
std::string val;
int cnt;
friend bool operator < (const test& a, const test& b) { return a.val < b.val; }
};
using test_set_type = std::set<test>;
const test ngram(std::string A, int num) {
test_set_type set;
for (auto it = A.begin(); it < A.end() - num + 1; ++it)
{
auto found = set.find(std::string(it, it + num));
if (found != set.end())
++const_cast<test&>(*found).cnt;
else
set.insert(std::string(it, it + num));
}
int find = -1;
test_set_type::iterator high = set.begin();
for (auto it = set.begin(); it != set.end(); ++it)
if(it->cnt > find)
++find, high= it;
return *high;
}
int main() {
int num = 2;
std::string word("engineering");
std::cout << ngram(word, num).val << std::endl;
return 0;
}

- 1,859
- 2
- 16
- 21