Algorithm that finds the N-th most frequent number in the array

Question

I want to write an algorithm that finds the n-th most frequent number in an array. I have a solution, but it is not optimal: it re-tests numbers that have already been tested. Is there a more efficient approach? Here is my solution:

#include <stdio.h>

/* Returns the most frequent value in a[0..n-1], skipping entries marked -1
   (so -1 cannot appear as data). Returns -1 if no entry is left. */
int most_freq_element(const int a[], int n){
    int final_cnt = 0, final_freq_num = -1;
    for (int i = 0; i < n; i++)
    {
        if (a[i] != -1){
            int curr_freq_num = a[i];
            int curr_cnt = 1;   /* reset for every candidate */
            for (int j = i+1; j < n; j++){
                if (curr_freq_num == a[j]){
                    curr_cnt++;
                }
            }
            if (final_cnt < curr_cnt){
                final_cnt = curr_cnt;
                final_freq_num = curr_freq_num;
            }
        }
    }
    printf("Num = %d and times = %d\n", final_freq_num, final_cnt);
    return final_freq_num;
}



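/* Returns the k-th most frequent value; overwrites entries of a with -1. */
int nth_most_frequent_element(int a[], int n, int k){
    /* Mark the k-1 most frequent values as removed. */
    for (int r = 0; r < k-1; r++){
        int most_freq_num = most_freq_element(a,n);

        for (int i = 0; i < n; i++){
            if (a[i]==most_freq_num){
                a[i]=-1;
            }
        }
    }
    /* The most frequent value that remains is the k-th overall. */
    return most_freq_element(a,n);
}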

Possible duplicate of [Find the N-th most frequent number in the array](https://stackoverflow.com/questions/10965952/find-the-n-th-most-frequent-number-in-the-array) — Patrick Roberts, Jul 06 '17 at 02:27

Araymer · Answer 1 · 2017-07-07T15:00:03.740

1

I would probably build a hash map in which each number is a key and its value is the number of times that number occurs, incrementing the count every time the number is seen again. When the pass is done, collect the entries into a list sorted by count and take the n-th element. The counting pass runs in O(n), which is about as good as it gets.

Edit: Actually, the sorting would cost O(n log n), so that is the overall complexity.
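
The approach described above could be sketched in C roughly as follows. This is only an illustration, not the answerer's own code: the name `kth_most_frequent`, the `Slot` struct, the small open-addressing table with linear probing, the multiplicative hash constant, and the tie-breaking rule (smaller value first) are all assumptions made for the example.

```c
#include <stdio.h>
#include <stdlib.h>

/* One hash-table slot: a distinct value and how often it occurred. */
typedef struct { int value; int count; int used; } Slot;

/* Order slots by descending count; break ties by ascending value (assumed rule). */
static int by_count_desc(const void *x, const void *y)
{
    const Slot *p = x, *q = y;
    if (p->count != q->count)
        return (q->count > p->count) - (q->count < p->count);
    return (p->value > q->value) - (p->value < q->value);
}

/* Stores the k-th most frequent value of a[0..n-1] in *out.
   Returns 0 on success, -1 if k is out of range or allocation fails. */
int kth_most_frequent(const int a[], int n, int k, int *out)
{
    if (n <= 0 || k < 1) return -1;

    size_t cap = 1;
    while (cap < 2 * (size_t)n) cap *= 2;   /* power of two, load factor <= 0.5 */
    Slot *table = calloc(cap, sizeof *table);
    if (!table) return -1;

    /* Counting pass: expected O(n) with linear probing. */
    size_t distinct = 0;
    for (int i = 0; i < n; i++) {
        size_t h = ((unsigned)a[i] * 2654435761u) & (cap - 1);
        while (table[h].used && table[h].value != a[i])
            h = (h + 1) & (cap - 1);
        if (!table[h].used) {
            table[h].used = 1;
            table[h].value = a[i];
            distinct++;
        }
        table[h].count++;
    }

    /* Compact the used slots to the front, then sort them: O(d log d). */
    size_t w = 0;
    for (size_t r = 0; r < cap; r++)
        if (table[r].used) table[w++] = table[r];
    qsort(table, distinct, sizeof *table, by_count_desc);

    int rc = -1;
    if ((size_t)k <= distinct) { *out = table[k - 1].value; rc = 0; }
    free(table);
    return rc;
}
```

Unlike the code in the question, this sketch leaves the input array unchanged and reports an out-of-range k through its return value instead of printing.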

edited Jul 07 '17 at 15:00

answered Jul 06 '17 at 03:54

Araymer


1

`aggregate it to a sorted list` would make it O(nlog(n)) – Courage Jul 06 '17 at 03:58
You don't count the N hash map key lookups. – Michaël Roy Jul 06 '17 at 05:00
They're insignificant compared to nlogn. As the operations approach infinity the only term that matters is nlogn. You don't see many algorithms including every insignificant term. They just pose the term of most significance, unless there are multiple terms of equal significance – Araymer Jul 06 '17 at 15:19
Sum{n=1..N-1}(log(N-n)) is not insignificant, since it's larger than N. – Michaël Roy Jul 07 '17 at 12:16
It's one iteration through the linked list for the collision count and constant-time hashmap inserts for O(n), then creating a sorted list from the entries O(nlogn + n) -> O(nlogn) and then grabbing the nth element for a max of O(n) or O(1) for a sorted array. So it's O(3n + nlogn) -> O(nlogn). I don't know what you're referring to. – Araymer Jul 07 '17 at 14:54
Wait, are you under the impression that hash lookups aren't constant time? A lookup on a hash table is based on offsets given by the hash. They're constant time, so all the lookups are O(n) – Araymer Jul 07 '17 at 15:03
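
For reference, the sum quoted in the comments can be evaluated in closed form. This is a standard identity (substituting m = N - n, then Stirling's approximation), added here as a worked step rather than something either commenter wrote out:

```latex
\sum_{n=1}^{N-1} \log(N-n)
  \;=\; \sum_{m=1}^{N-1} \log m
  \;=\; \log\bigl((N-1)!\bigr)
  \;=\; \Theta(N \log N)
```

That sum describes N lookups costing a logarithmic amount each, as in a tree-based map. With expected constant-time hash lookups the counting pass is O(N) instead. Either way, the total including the sort stays at O(N log N).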

Michaël Roy · Accepted Answer · 2017-07-11T03:24:01.867

How about this? The worst-case complexity is O(2·N·log N + k·min(k, d)), where d is the number of unique values in a.

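#include <stdio.h>
#include <stdlib.h>

// Ascending comparison for qsort
static int cmp_int(const void *x, const void *y)
{
  int p = *(const int *)x, q = *(const int *)y;
  return (p > q) - (p < q);
}

static void swap(int *x, int *y) { int t = *x; *x = *y; *y = t; }

// Prints the k-th most frequent value of a[0..n-1]; sorts a in place
void most_freq_element(int a[], int n, int k)
{
  if (n < 1 || k < 1) return;
  int *count = malloc((k + 1) * sizeof *count);   // k + 1 elements
  int *value = malloc((k + 1) * sizeof *value);
  int i, j, l;
  if (!count || !value) { free(count); free(value); return; }

  qsort(a, n, sizeof a[0], cmp_int);

  j = 0;   // slot of the value whose run is being counted
  l = 0;   // last slot in use, capped at k
  value[0] = a[0];
  count[0] = 1;
  for (i = 1; i < n; ++i)
  {
     if (a[i] != value[j])
     {
        // new run: take the next free slot, or recycle slot k (the least frequent)
        if (++l > k) l = k;
        j = l;
        value[j] = a[i];
        count[j] = 0;
     }

     ++count[j];

     // keep the slots ordered by descending count
     while (j > 0 && count[j] > count[j - 1])
     {
        swap(&count[j], &count[j - 1]);
        swap(&value[j], &value[j - 1]);
        --j;
     }
  }
  if (k - 1 <= l)
     printf("Num = %d and times = %d\n", value[k - 1], count[k - 1]);
  else
     printf("Fewer than %d distinct values\n", k);
  free(count);
  free(value);
}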
