I would like to know how a min heap is used here to solve the following problem.

My idea was to use a hash table to store the count of each number, but I don't know how to use the min heap to continue solving the problem.

Given a non-empty array of integers, return the k most frequent elements.

For example, given [1,1,1,2,2,3] and k = 2, return [1,2].

Note: You may assume k is always valid, 1 ≤ k ≤ number of unique elements. Your algorithm's time complexity must be better than O(n log n), where n is the array's size.

vector<int> topKFrequent(vector<int>& nums, int k) {
    unordered_map<int, int> counts;
    priority_queue<int, vector<int>, greater<int>> max_k;
    for(auto i : nums) ++counts[i];
    for(auto & i : counts) {
        max_k.push(i.second);
        // Size of the min heap is maintained at equal to or below k
        while(max_k.size() > k) max_k.pop();
    }
    vector<int> res;
    for(auto & i : counts) {
        if(i.second >= max_k.top()) res.push_back(i.first);
    }
    return res;
}
ahmed andre
  • Does the code you posted work? If so, that's it -- `std::priority_queue` with `greater<int>` as its comparator is a min-heap. – Quentin May 02 '16 at 15:35
  • @Quentin I'm a beginner in data structures. Can you explain how the heap is used to get the top K elements? – ahmed andre May 02 '16 at 15:36
  • To me your question is unclear. Are you asking about getting the code to work, or are you asking how the code works? – Support Ukraine May 02 '16 at 15:40
  • @4376427 I am asking how the code works. – ahmed andre May 02 '16 at 15:40
  • Note that this solution is not correct. If a lot of elements have exactly the same frequency, it may return more than `k` elements. Say, for `[1, 2, 3, 4], 3` it returns `[1, 2, 3, 4]`. The problem is not well-defined for such cases, though. (Should it just pick elements randomly or what?) – Sergei Tachenov May 02 '16 at 16:05

1 Answer

The code works like this:

for(auto i : nums) ++counts[i];  // Use a map to count how many times each
                                 // individual number is present in the input

priority_queue<int, vector<int>, greater<int>> max_k;  // Use a priority_queue
                                                       // which has the smallest
                                                       // number at the top

for(auto & i : counts) {
    max_k.push(i.second);                 // Put the number of times each number occurred
                                          // into the priority_queue

    while(max_k.size() > k) max_k.pop();  // If the queue contains more than
                                          // k elements, remove the smallest
                                          // value. This is done because
                                          // you only need to track the k
                                          // highest counts
}

vector<int> res;                                         // Find the input numbers
for(auto & i : counts) {                                 // which are among the most
    if(i.second >= max_k.top()) res.push_back(i.first);  // frequent numbers
                                                         // by comparing their
                                                         // count to the lowest of
                                                         // the k highest counts.
                                                         // Return numbers whose
                                                         // frequencies are among
                                                         // the top k
}
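To see the heap step in isolation, here is a minimal sketch that returns only the threshold, that is, the value `max_k.top()` holds once every count has been pushed. The function name `kthLargestCount` is a hypothetical helper made up for illustration, and it assumes 1 ≤ k ≤ number of distinct values, as the problem guarantees.

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <unordered_map>
#include <vector>

// Hypothetical helper (not part of the original code): returns the k-th
// largest frequency in nums. Assumes 1 <= k <= number of distinct values.
int kthLargestCount(const std::vector<int>& nums, std::size_t k) {
    // Count how often each number occurs.
    std::unordered_map<int, int> counts;
    for (int n : nums) ++counts[n];

    // Min-heap: top() is always the smallest count currently stored.
    std::priority_queue<int, std::vector<int>, std::greater<int>> heap;
    for (const auto& entry : counts) {
        heap.push(entry.second);
        // Keep at most k counts; the one discarded is the smallest,
        // so the heap ends up holding the k largest counts.
        if (heap.size() > k) heap.pop();
    }
    // The smallest of the k largest counts is the k-th largest count.
    return heap.top();
}
```

For `[1,1,1,2,2,3]` the counts are 3, 2 and 1, so with k = 2 the heap ends up holding {2, 3} and the threshold is 2. Every number whose count is at least 2 is then reported, which gives 1 and 2.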

EDIT

As pointed out by @SergeyTachenov in the comments on the question, your result vector may contain more than k elements. Maybe you can fix that by doing:

for(auto & i : counts) {
    if(i.second >= max_k.top()) res.push_back(i.first);
    if (res.size() == k) break; // Stop when k numbers are found
}
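One caveat with stopping early: `unordered_map` iterates in an unspecified order, so the loop may collect k numbers that are tied at the threshold before it reaches a number with a higher count. A possible alternative, sketched below, is to store (count, value) pairs in the min heap so that the heap itself holds the answer. The name `topKFrequentExact` is hypothetical, and breaking ties by value is an assumption of this sketch, since the problem does not say how ties should be resolved.

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical variant (not the original code): returns exactly k values.
// Assumes 1 <= k <= number of distinct values.
std::vector<int> topKFrequentExact(const std::vector<int>& nums, std::size_t k) {
    std::unordered_map<int, int> counts;
    for (int n : nums) ++counts[n];

    // Min-heap of (count, value); pairs compare by count first, then value,
    // so ties on count are broken by value (an arbitrary choice).
    using Entry = std::pair<int, int>;
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;
    for (const auto& kv : counts) {
        heap.emplace(kv.second, kv.first);
        if (heap.size() > k) heap.pop();  // drop the least frequent entry
    }

    // The heap now holds the k most frequent entries; pop them in
    // ascending order of frequency.
    std::vector<int> res;
    while (!heap.empty()) {
        res.push_back(heap.top().second);
        heap.pop();
    }
    return res;
}
```

With m distinct values this runs in O(n + m log k), which stays within the required bound. The result comes out least frequent first; reverse it if the most frequent value should lead.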

Another small comment

You don't really need a while-statement here:

while(max_k.size() > k) max_k.pop();

an if-statement would do, because each loop iteration pushes only one element, so the size can exceed k by at most one.

Support Ukraine
  • You should probably also correct the last comment because “Return the k most frequent” is not exactly right. More like “return numbers whose frequencies are among the top `k`”. – Sergei Tachenov May 02 '16 at 16:11