Algorithm to compute mode

Question

I'm trying to devise an algorithm in the form of a function that accepts two parameters, an array and the size of the array. I want it to return the mode of the array and, if there are multiple modes, return their average. My strategy was to first sort the array and then count the occurrences of each number: while that number keeps occurring, add one to a counter, then store that count in an array m. So m holds all the counts, and another array q holds the value that each count belongs to.

For example: if my list is {1, 1, 1, 1, 2, 2, 2}, then I would have m[0] = 4 and q[0] = 1, and then m[1] = 3 and q[1] = 2.

So the mode is q[0] = 1.

Unfortunately I have had no success thus far. Hoping someone could help.

float mode(int x[],int n)
{
    //Copy array and sort it
    int y[n], temp, k = 0, counter = 0, m[n], q[n];

    for(int i = 0; i < n; i++)
        y[i] = x[i];

    for(int pass = 0; pass < n - 1; pass++)
        for(int pos = 0; pos < n; pos++)
            if(y[pass] > y[pos]) {
                temp = y[pass];
                y[pass] = y[pos];
                y[pos] = temp;
            }

    for(int i = 0; i < n;){
        for(int j = 0; j < n; j++){
            while(y[i] == y[j]) {
                counter++;
                i++;
            }
        }
        m[k] = counter;
        q[k] = y[i];
        i--; //i should be 1 less since it is referring to an array subscript
        k++;
        counter = 0;
    }

}

You are not returning anything from your function. It is quite unclear to me what you mean by *mode* and/or what the result of the function should be. If it should be the average of all values, you could just `return std::accumulate(x, x + n, 0.0) / n;`. BTW, C++ does not have variable-sized arrays. You could, however, use `std::vector<int> y(n);` instead. – Dietmar Kühl, Aug 11 '13 at 23:22
@DietmarKühl The function wasn't completed. By mode I mean the value that occurs the most often in the array. I'm not using a variable sized array since the size of the array is the parameter n. – Amber Roxanna, Aug 11 '13 at 23:24
You probably want to look at `std::map` or `std::unordered_map` to count the number of times each value occurs. The obvious alternative would be to use a Boost [`bimap`](http://www.boost.org/doc/libs/release/libs/bimap/doc/html/index.html) instead. – Jerry Coffin, Aug 11 '13 at 23:36

Jerry Coffin · Accepted Answer · 2013-08-12T14:41:48.263

5

Even though you have some good answers already, I decided to post another. I'm not sure it really adds a lot that's new, but I'm not at all sure it doesn't either. If nothing else, I'm pretty sure it uses more standard headers than any of the other answers. :-)

#include <vector>
#include <algorithm>
#include <unordered_map>
#include <map>
#include <iostream>
#include <utility>
#include <functional>
#include <numeric>
#include <iterator>

int main() {
    std::vector<int> inputs{ 1, 1, 1, 1, 2, 2, 2 };

    std::unordered_map<int, size_t> counts;
    for (int i : inputs)
        ++counts[i];

    std::multimap<size_t, int, std::greater<size_t> > inv;
    for (auto p : counts)
        inv.insert(std::make_pair(p.second, p.first));

    auto e = inv.upper_bound(inv.begin()->first);

    double sum = std::accumulate(inv.begin(),
        e,
        0.0,
        [](double a, std::pair<size_t, int> const &b) {return a + b.second; });

    std::cout << sum / std::distance(inv.begin(), e);
}

Compared to @Dietmar's answer, this should be faster if you have a lot of repetition in the numbers, but his will probably be faster if the numbers are mostly unique.
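
For anyone who wants the same counting idea behind the signature from the question, here is a minimal sketch. The name `mode_average` and the choice to return 0 for an empty array are my assumptions, not part of the answer above. It skips the `multimap` and tracks the highest count and the tied values in a single pass over the counts.

```cpp
#include <cstddef>
#include <unordered_map>

// Hypothetical helper (name and signature are assumptions): returns the mode
// of x[0..n), or the average of all modes when several values tie.
double mode_average(const int x[], std::size_t n) {
    if (n == 0) return 0.0; // assumption: an empty array yields 0

    // Count how often each value occurs.
    std::unordered_map<int, std::size_t> counts;
    for (std::size_t i = 0; i != n; ++i)
        ++counts[x[i]];

    std::size_t best = 0; // highest count seen so far
    double sum = 0.0;     // sum of the values that have that count
    std::size_t ties = 0; // number of values that have that count
    for (const auto& p : counts) {
        if (p.second > best) {
            best = p.second;
            sum = p.first;
            ties = 1;
        } else if (p.second == best) {
            sum += p.first;
            ++ties;
        }
    }
    return sum / ties;
}
```

Called with the array from the question and `n = 7`, this returns 1; for `{1, 1, 2, 2}` the two modes tie and it returns 1.5.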

edited Aug 12 '13 at 14:41

answered Aug 12 '13 at 04:27

Jerry Coffin


Nice. One little improvement would be to replace the 2nd argument to `std::accumulate()` with the `e` you have already computed. – j_random_hacker Aug 12 '13 at 11:51
@JerryCoffin This is very impressive! Can you suggest any books on the standard library? It seems like I could solve a lot of my problems if I had knowledge of the tools you've used. The problem is that most books I've encountered read more like a reference manual than a tutorial. I need something that will let me practice these tools but also explain what class of problems these tools solve and when to use them. Let me know if you have anything in mind! – Amber Roxanna Aug 12 '13 at 20:07
Three books occur to me: *Effective STL* (Scott Meyers), *The C++ Standard Library: A Tutorial and Reference (2nd Edition)*, (Nicolai Josuttis) and *STL Tutorial and Reference Guide: C++ Programming with the Standard Template Library (paperback) (2nd Edition)* (Musser, Saini and...some guy whose name I don't remember). Of those, Josuttis is the most reference-oriented, and Meyers probably the least. – Jerry Coffin Aug 12 '13 at 20:15

Dietmar Kühl · Answer 2 · 2013-08-12T00:05:51.027

4

Based on the comment, it seems you need to find the values which occur most often and, if several values occur the same number of times, produce the average of these. This can easily be done with std::sort() followed by a traversal that finds where the values change while keeping a few running counts:

#include <algorithm>
#include <iterator>
#include <vector>

template <int Size>
double mode(int const (&x)[Size]) {
    std::vector<int> tmp(x, x + Size);
    std::sort(tmp.begin(), tmp.end());
    int    size(0);  // size of the largest set so far
    int    count(0); // number of largest sets
    double sum(0);    // sum of largest sets
    for (auto it(tmp.begin()); it != tmp.end(); ) {
        auto end(std::upper_bound(it, tmp.end(), *it));
        if (size == std::distance(it, end)) {
            sum += *it;
            ++count;
        }
        else if (size < std::distance(it, end)) {
            size = std::distance(it, end);
            sum = *it;
            count = 1;
        }
        it = end;
    }
    return sum / count;
}

edited Aug 12 '13 at 00:05

answered Aug 11 '13 at 23:41

Dietmar Kühl


I know it's terrible to think of it now, but you're really only using the upper bound, so `upper_bound` is probably a better fit. Sorry I didn't read more carefully the first time. – Jerry Coffin Aug 12 '13 at 00:03
@JerryCoffin: You are right and I should have noticed that myself. That said, using `std::find_if()` would yield a linear algorithm for the pass after the `std::sort()`, while using `std::equal_range()` or `std::upper_bound()` results in `O(n log n)` worst-case behavior. Of course, the `std::sort()` is already `O(n log n)`, i.e. the overall complexity doesn't get worse. – Dietmar Kühl Aug 12 '13 at 00:07
The basic question is whether you expect to see an individual value repeated more than log(N) times on average. If it's repeated fewer than log(N) times, we can expect fewer comparisons using `find_if`. If it's more than log(N), we can expect fewer with `upper_bound`. – Jerry Coffin Aug 12 '13 at 00:14
I think you can make `upper_bound` linear overall (or something on that order) though. Each time you find the end of a range, you supply the next spot beyond that as the beginning of the next search. For each search, N decreases, so you're taking logarithms of a smaller number after each search. – Jerry Coffin Aug 12 '13 at 00:19
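
The linear pass discussed in these comments might look roughly like the sketch below. It is not the code from the answer: the name `mode_by_runs`, the `std::vector` parameter, and returning 0 for empty input are assumptions made for illustration. Each run of equal values is walked exactly once with `std::find_if`, so the pass after the sort is linear.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: sorts a copy of the input, then scans each run of
// equal values once and averages the values with the longest runs.
double mode_by_runs(std::vector<int> v) {
    if (v.empty()) return 0.0; // assumption: empty input yields 0
    std::sort(v.begin(), v.end());

    std::ptrdiff_t best = 0; // length of the longest run so far
    double sum = 0.0;        // sum of the values with that run length
    int ties = 0;            // number of values with that run length
    for (auto it = v.begin(); it != v.end(); ) {
        const int value = *it;
        // Linear scan to the first element that differs from the current value.
        auto end = std::find_if(it, v.end(),
                                [value](int y) { return y != value; });
        const std::ptrdiff_t len = end - it;
        if (len > best) {
            best = len;
            sum = value;
            ties = 1;
        } else if (len == best) {
            sum += value;
            ++ties;
        }
        it = end;
    }
    return sum / ties;
}
```

Whether this beats `upper_bound` in practice depends on how long the runs are, as noted above; the sort dominates the overall cost either way.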

Felix Glas · Answer 3 · 2013-08-12T00:03:08.570

If you simply wish to count the number of occurrences then I suggest you use a std::map or std::unordered_map.

If you're mapping a counter to each distinct value then it's an easy task to count occurrences using std::map, as each key can only be inserted once. To list the distinct numbers in your list, simply iterate over the map.

Here's an example of how you could do it:

#include <cstddef>
#include <iostream>
#include <iterator>
#include <map>
#include <numeric>

std::map<int, int> getOccurences(const int arr[], const std::size_t len) {
    std::map<int, int> m;
    for (std::size_t i = 0; i != len; ++i) {
        m[arr[i]]++;
    }
    return m;
}

int main() {
    int list[7]{1, 1, 1, 1, 2, 2, 2};
    auto occurences = getOccurences(list, 7);
    for (auto e : occurences) {
        std::cout << "Number " << e.first << " occurs ";
        std::cout << e.second << " times" << std::endl;
    }
    auto average = std::accumulate(std::begin(list), std::end(list), 0.0) / 7;
    std::cout << "Average is " << average << std::endl;
}

Output:

Number 1 occurs 4 times
Number 2 occurs 3 times
Average is 1.42857
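
Note that the average printed above is the mean of all seven values, not the mode. To get from the occurrence map to the mode (or the average of tied modes), something like the following sketch could be added. The helper name `averageOfModes` and its behaviour for an empty map are assumptions for illustration; it takes a value-to-count map of the kind `getOccurences` returns.

```cpp
#include <algorithm>
#include <map>
#include <utility>

// Hypothetical helper: given a map from value to count, average the values
// whose count equals the highest count in the map.
double averageOfModes(const std::map<int, int>& occurrences) {
    if (occurrences.empty()) return 0.0; // assumption: empty map yields 0

    using Entry = std::pair<const int, int>;
    // Find the highest count by comparing the mapped counts.
    const int best = std::max_element(
        occurrences.begin(), occurrences.end(),
        [](const Entry& a, const Entry& b) { return a.second < b.second; })->second;

    double sum = 0.0; // sum of the values that occur `best` times
    int ties = 0;     // how many values occur `best` times
    for (const Entry& e : occurrences) {
        if (e.second == best) {
            sum += e.first;
            ++ties;
        }
    }
    return sum / ties;
}
```

For the map built from `{1, 1, 1, 1, 2, 2, 2}` this gives 1, since 1 is the only value with the highest count.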

score 1 · Answer 4 · answered Aug 11 '13 at 23:38

Here's a working version of your code. m stores the values in the array and q stores their counts. At the end it runs through all the values to get the maximal count, the sum of the modes, and the number of distinct modes.

float mode(int x[],int n)
{
    //Copy array and sort it
    int y[n], temp, j = 0, k = 0, m[n], q[n];

    for(int i = 0; i < n; i++)
        y[i] = x[i];

    // Exchange sort: after each pass, y[pass] holds the smallest remaining value
    for(int pass = 0; pass < n - 1; pass++)
        for(int pos = pass + 1; pos < n; pos++)
            if(y[pass] > y[pos]) {
                temp = y[pass];
                y[pass] = y[pos];
                y[pos] = temp;
            }   

    for(int i = 0; i < n;){
        j = i;
        while (j < n && y[j] == y[i]) { // stop at the end of the array
          j++;
        }   
        m[k] = y[i];
        q[k] = j - i;
        k++;
        i = j;
    }   

    int max = 0;
    int modes_count = 0;
    int modes_sum = 0;
    for (int i=0; i < k; i++) {
        if (q[i] > max) {
            max = q[i];
            modes_count = 1;
            modes_sum = m[i];
        } else if (q[i] == max) {
            modes_count += 1;
            modes_sum += m[i];
        }   
    }   

    // Cast first so that tied modes such as 1 and 2 average to 1.5, not 1
    return static_cast<float>(modes_sum) / modes_count;
}
