2

Suppose I have an array of integers:

[ 1,2,3,4,5,6,1,2,3,1,2... ]

I want to know the K most frequent elements. The phrase "K most frequent" immediately makes me think of a Max Heap data structure, so I've decided to create a custom object to both count and prioritize elements:

public class countedInts implements Comparable<countedInts>{
    public int theInt, count;
    public countedInts(int a, int b) {
        this.theInt = a;
        this.count = b;
    }
    @Override
    public int compareTo(countedInts o) {
        return this.count - o.count;
    }
}

This object is essentially two ints, paired together. Simple.

Okay: now for the main code:

public int[] topKFreq(int[] arr, int k) {

    PriorityQueue<countedInts> maxHeap = new PriorityQueue<>(Collections.reverseOrder());

    for( int i=0; i<arr.length; i++ ) {
        // If arr[i] is not tracked with a countedInts object:
            maxHeap.offer( new countedInts(arr[i], 1) );
        // But if arr[i] is already stored within a countedInts object...
            countedInts tmp = maxHeap.get( ??? );
            tmp.count++;
            maxHeap.offer( tmp );
    }
}

You see the problem. As I consider the elements in arr, I need a way to check maxHeap to see whether a countedInts object is already tracking the current element. That means looking at a member of the objects stored within the PriorityQueue. Is there a way to do this? Or is there a better strategy for this problem?

FULL DISCLOSURE: Yes, this is a LeetCode problem. I always like to research a problem before I give up and look at the solution. That's a better way to learn.

=======================================================================

[EDIT] :: A user suggested the below strategy, which worked. Posted in case it can help others...

public class countedInts implements Comparable<countedInts>{
    public int theInt, count;
    public countedInts(int a, int b) {
        this.theInt = a;
        this.count = b;
    }
    @Override
    public int compareTo(countedInts o) {
        return this.count - o.count;
    }
}


public int[] topKFrequent(int[] arr, int k) {

    // Edge cases
    if( arr == null ) {
        return null;
    }
    else if( arr.length == 0 ) {
        return arr;
    }
    
    int[] ret = new int[k];
    HashMap<Integer,Integer> myMap = new HashMap<>();
    PriorityQueue<countedInts> maxHeap = new PriorityQueue<>(Collections.reverseOrder());

    // Populate HashMap
    for( int i=0; i<arr.length; i++ ) {
        if( !myMap.containsKey(arr[i]) ) {
            myMap.put(arr[i], 1);
        }
        else {
            myMap.put(arr[i], myMap.get(arr[i])+1);
        }
    }

    // Transfer data into MaxHeap
    for( Map.Entry<Integer, Integer> glork : myMap.entrySet() ) {
        maxHeap.offer( new countedInts(glork.getKey(), glork.getValue()) );
    }

    // Pick out K-most values
    for( int i=0; i<k; i++ ) {
        countedInts tmp = maxHeap.poll();
        ret[i] = tmp.theInt;
    }

    return ret;
}
Pete
  • 1
    No, looking through the heap means traversing it and defeating the purpose of its structure, worsening your time complexity. Use a balanced binary search tree or order statistics tree instead. – Ryan Zhang Jun 12 '22 at 17:14
  • 1
    Are there any restrictions regarding runtime and space complexity? – Eritrean Jun 12 '22 at 17:21
  • @Eritrean No particular restrictions, but I'd like a reasonable solution - no O(n^2) solution, if possible. I'm imagining that this is a coding interview question. – Pete Jun 12 '22 at 17:32
  • 1
    I don't see what the problem has to do with a priority queue. You can just group by the elements, then sort by size of resulting collection. – daniu Jun 12 '22 at 17:36
  • @daniu Yeah, you're prob right. The "K most frequent" phrasing in the problem statement made me think that this was a MaxHeap problem. – Pete Jun 12 '22 at 17:50
  • 1
    @daniu When `k` is much less than the array length, there would be a significant difference in performance between sorting and utilizing `PriorityQueue` & `Map`. And since it's an algorithmic question - performance matters. – Alexander Ivanchenko Jun 12 '22 at 20:38
  • 1
    @AlexanderIvanchenko I'd argue readability > performance until an actual problem comes up, but long discussions have been led about that elsewhere. Either way, even if I were to optimize performance, I'd rather write my own `Collector` rather than using a `PriorityQueue` since I'd find that misleading. – daniu Jun 13 '22 at 07:31
  • 1
    @daniu *Collector rather than using a PriorityQueue* Judging by your comment that interposes `PriorityQueue` and `Collector` you have a very poor understanding of how Collectors work. Collector is **not** a Data structure implementation! Streams and Collectors are not a substitution for algorithms - google the difference between the algorithm and implementation. Every collector is based on a mutable object like a Collection, StringBuilder, array, etc. And the *accumulator* function defines how this mutable container has to be populated. – Alexander Ivanchenko Jun 13 '22 at 14:36
  • 1
    @daniu If you think that if you cram a complex algorithm into a collector it would become more readable, you misunderstand the purpose of collectors, which is to facilitate mutable reduction which is **frequently** performed with **streams**. And you probably never tried to write a custom collector (if you did, you would never claim that). I kindly advise you to get familiar with the documentation and the standard collectors built into the JDK. – Alexander Ivanchenko Jun 13 '22 at 14:36
  • 1
    @AlexanderIvanchenko I've written enough Collectors to know that in the end, they are exactly that, data structures optimized towards the specific use case, offering access/aggregation methods compatible with the `Collector.of` signature. And as I've stated above, one should only do this with a good reason which would render readability less important. Which is very rarely the case, since your original point about performance is only relevant in very extreme cases with millions of array entries - and even then, it would require measurements to evaluate runtime behavior. – daniu Jun 14 '22 at 07:38
  • 1
    @daniu Collectors are not Data structure implementations. Take the time to get familiar with the basic definition of the [`Data structure`](https://en.wikipedia.org/wiki/Data_structure). Data structures are meant to provide extensive support for managing the data: add a new element, remove existing, set existing element, perform contains check, etc. You **can't** do any of these operations on a Collector, because Collector is a **one-time-use means of reduction** which is intended to be used only internally by a stream pipeline, and not a Data structure implementation by any means. – Alexander Ivanchenko Jun 14 '22 at 09:28

3 Answers

2

An approach using streams, which I think is fairly self-documenting:

Stream over your array, group by identity and map each value to its frequency using Collectors.counting, stream over the entries of the resulting map, sort them by value in reverse order, limit the stream to k elements, and map each entry to its key:

public int[] topKFrequent(int[] nums, int k) {
    return Arrays.stream(nums)
                 .boxed()
                 .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
                 .entrySet().stream()
                 .sorted(Map.Entry.<Integer, Long>comparingByValue().reversed())
                 .limit(k)
                 .map(Map.Entry::getKey)
                 .mapToInt(Integer::intValue)
                 .toArray();
}
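
To make the intermediate step of the pipeline above easier to picture, here is a minimal sketch that runs only the grouping half and returns the frequency map. The class name FrequencyMapDemo and the helper countOccurrences are hypothetical names introduced for illustration; they are not part of the answer.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class FrequencyMapDemo {

    // Hypothetical helper: only the first half of the pipeline,
    // i.e. box the ints, group equal values, and count each group.
    static Map<Integer, Long> countOccurrences(int[] nums) {
        return Arrays.stream(nums)
                     .boxed()
                     .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    public static void main(String[] args) {
        Map<Integer, Long> freq = countOccurrences(new int[]{1, 1, 1, 2, 2, 3});
        // Each distinct value maps to how many times it occurs:
        // 1 occurs three times, 2 twice, 3 once.
        System.out.println(freq);
    }
}
```

Note that Collectors.counting produces Long values, which is why the comparator in the pipeline is written as Map.Entry.<Integer, Long>comparingByValue().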
Eritrean
1

You can first use a HashMap to count the frequencies of all the numbers in the given array.

Then iterate through the hashmap to create the CountedInts objects and insert those objects into the priority queue.

Haoliang
1

As described in the answer by @Haoliang, you can generate a Map containing the frequency of each number in the source array, and then populate a PriorityQueue with the entries of this map.

This approach would be more performant than sorting all the entries, especially when k is much lower than the number of elements in the array.

This is how the code might look:

public static int[] topKFrequent(int[] nums, int k) {
    int[] result = new int[k];
    
    Map<Integer, Integer> freq = getFrequencies(nums);
    
    Queue<Map.Entry<Integer, Integer>> entries = populateQueue(freq);
    
    for (int i = 0; i < result.length; i++) {
        result[i] = entries.remove().getKey();
    }
    return result;
}

The Java 8 method merge() is used to make the code that generates the frequency map more concise:

public static Map<Integer, Integer> getFrequencies(int[] arr) {
    Map<Integer, Integer> hist = new HashMap<>();
    for (int next: arr) {
        hist.merge(next, 1, Integer::sum);
    }
    return hist;
}

The method provided below creates a Max Heap using PriorityQueue and populates it with entries of the map:

public static Queue<Map.Entry<Integer, Integer>> populateQueue(Map<Integer, Integer> hist) {
    Queue<Map.Entry<Integer, Integer>> entries =
        new PriorityQueue<>(Map.Entry.<Integer, Integer>comparingByValue().reversed());

    entries.addAll(hist.entrySet());
    return entries;
}

main()

public static void main(String[] args) {
    int[] nums = {1, 1, 1, 2, 2, 3};
    System.out.println(Arrays.toString(topKFrequent(nums, 2)));
    
    nums = new int[]{1};
    System.out.println(Arrays.toString(topKFrequent(nums, 1)));
}

Output:

[1, 2]
[1]
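
As a possible variation on the code above (a sketch, not something the thread itself proposes), the heap can be capped at k entries by using a min-heap and evicting the least frequent entry whenever the size exceeds k. With m distinct values this keeps the heap work at roughly O(m log k) instead of holding all m entries. The class name TopKBoundedHeap is hypothetical, and unlike the code above this version returns a shorter array when there are fewer than k distinct values.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

public class TopKBoundedHeap {

    // Hypothetical variant: a min-heap that never holds more than k entries.
    public static int[] topKFrequent(int[] nums, int k) {
        Map<Integer, Integer> freq = new HashMap<>();
        for (int n : nums) {
            freq.merge(n, 1, Integer::sum);
        }

        // Min-heap ordered by count: the head is the least frequent kept entry.
        PriorityQueue<Map.Entry<Integer, Integer>> heap =
            new PriorityQueue<>(Map.Entry.<Integer, Integer>comparingByValue());

        for (Map.Entry<Integer, Integer> e : freq.entrySet()) {
            heap.offer(e);
            if (heap.size() > k) {
                heap.poll(); // evict the least frequent entry seen so far
            }
        }

        // The heap drains from least to most frequent,
        // so fill the result from back to front.
        int[] result = new int[heap.size()];
        for (int i = result.length - 1; i >= 0; i--) {
            result[i] = heap.poll().getKey();
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(topKFrequent(new int[]{1, 1, 1, 2, 2, 3}, 2)));
    }
}
```

For the sample input {1, 1, 1, 2, 2, 3} with k = 2 this yields the same [1, 2] as the max-heap version. The order among values with equal counts is not defined in either version.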
Alexander Ivanchenko