4

I am trying to understand the running time of counting sort. In my notes, it says, assuming the size of the array A is n, and k is the number of times each number occurs,

 Counting-Sort(A,k) {
   for (i=1; i <= k; i++) // initialize number counters to 0
       times[i] = 0;

   for (j=1; j <= length[A]; j++) // decide how many times each
       times[A[j]]++;                  // number appears in the input

  // form the sorted array
  m=1;
    for ( i=1; i <= k; i++)    // consider each number in the range
         for ( j=1; j <= times[ i ]; j++) { // generate that number in 
            A[m]=i;                   // the output as many times as
            m++;                      // it occurs in the input
         }
  }

Let ti denote the number of times the inner loop iterates for each i. When we look at the nested for loops at the bottom of the code, note that, every time the inner loop is iterated, we are placing a new number into its correct place in the output array. Hence: sum ti (from i=1 to k) = n.

I don't understand why this sum is equal to n. The outer loop iterates k times, and inner loop may iterate at most n times, so it must be O(nk). Can somebody explain? Thanks

  • Create a small example array and populate `times` by hand or with the program and it should be clear that the inner-most loop will execute at most `n` times in total, not `nk` times. – Bernhard Barker Mar 21 '13 at 12:14
  • It actually takes 2n+k time, O(n+k), check: http://en.wikipedia.org/wiki/Counting_sort – Khaled.K Mar 22 '13 at 12:13

3 Answers3

3

Algorithm

Counting Sort, known also as Histogram Sort

n = length of input Array

k = number of unique symbols that appear in the input Array
  • Initialization takes k time

  • Counting takes n time

  • Enumeration takes Sum { Count(i) } = n time

Complexity

Time = k + n + n = 2n+k

Time ~ O(2n+k) = O(n+k)

Space = n + k ~ O(n+k)
Khaled.K
  • 5,828
  • 1
  • 33
  • 51
0

The notes aren't entirely correct. k is the highest number that shows up in your array (i.e. it has numbers from 1 to k). So, let's step through the algorithm:

  1. Initialize the k "bins": O(k)
  2. Count how often each number occurs: O(n)
    • The sum of the values in all bins is exactly n, because that's how many entries we had in total in the array
  3. Iterate over the bins: O(k)
    • Set just as many elements in the result array as the bin represents: seemingly O(n)

The important thing here is that we know that even though we iterate over k bins and there could, in general, be up to n values represented by each bin, we set up the bins so that, across all the iterations of the outer loop, the inner loop still runs for a total of n times. Hence, the total complexity of this step is actually O(n).

If we ignored our extra knowledge about the contents of the bins, you'd be absolutely right.

Now there's a final twist: if k > n, it's actually in O(k) rather than O(n)! Because now the outer loop is the thing that "runs more often".

I hope this makes some sense.

Jan Krüger
  • 17,870
  • 3
  • 59
  • 51
  • k can't be bigger than n, k is the size of times array, which store the histogram of unique values in the input array A, where n is the size of the input array A .. check: http://en.wikipedia.org/wiki/Counting_sort – Khaled.K Mar 22 '13 at 07:19
  • @KhaledAKhunaifer `k` can definitely be larger than `n`. Consider the array `[1, 2, 100]`. `n` is 3, but `k` is 100. – gsgx Nov 04 '13 at 05:48
  • @gsingh2011 `k` is not the maximum value in the array, it's number of unique values in the array – Khaled.K Nov 04 '13 at 05:55
  • @KhaledAKhunaifer From the link you posted: "the input to counting sort consists of a collection of n items, each of which has a non-negative integer key whose maximum value is at most k". This says that `k` defines the upper bound, or that the max value is `k`. In counting sort, you don't know how many unique elements there are. For `[1, 2, 100]` you would use an output array of size 100 (the maximum value), not 3 (the number of unique values). – gsgx Nov 04 '13 at 06:01
  • @gsingh2011 you're right, but also notice that `k` should not appear in Time complexity, unless we're accounting for space management .. since, we're processing the array, not the whole frequency list – Khaled.K Nov 04 '13 at 07:58
0

As you can see for the loop

m=1;
for ( i=1; i <= k; i++)    // consider each number in the range
     for ( j=1; j <= times[ i ]; j++) { // generate that number in 
        A[m]=i;                   // the output as many times as
        m++;                      // it occurs in the input
     }

its complexity will be equal to how many times the body of inner cycle is executed. When i = 1, it is executed times[1] times, when i = 2 => times[2] times, and so, when i = k, times[k] times. So in total inner body is executed times[1] + times[2] + ... + times[k] times, and that sum is equal to n, as each element is counted once.

artahian
  • 2,093
  • 14
  • 17