Explain and Clarify Haskell Counting Sort

Question

I am working through Cormen et. al., Introduction to Algorithms, 3rd ed., but I also have an interest in Haskell. Section 8.2 (p. 194) covers counting sort. I was interested in how it and many algorithms might be implemented in haskell as they often use array access and destructive update. I took a look at the implementation on RosettaCode (copied below) and I find it very difficult to follow.

import Data.Array

countingSort :: (Ix n) => [n] -> n -> n -> [n]
countingSort l lo hi = concatMap (uncurry $ flip replicate) count
  where count = assocs . accumArray (+) 0 (lo, hi) . map (\i -> (i, 1)) $ l

One of the things I like about haskell is how algorithms can be very clear (e.g. Haskell quicksort examples), at least as a specification that hasn't been optimized. This seems very unclear and I wonder if it's necessarily so or just overdone.

Can someone

clarify what's going on here,
perhaps provide a more instructive and clear implementation, and
tackle whether this is actually implementing counting sort or if non-strictness (lazyness) and immutability mean that this is actually some other sort disguised as counting sort?

user94559 · Accepted Answer · 2016-10-01T20:38:16.990

It is indeed doing a counting sort. Here's a slightly rewritten version that I find easier to understand:

import Data.Array

countingSort :: (Ix n) => [n] -> n -> n -> [n]
countingSort l lo hi = concat [replicate times n | (n, times) <- counts]
  where counts = assocs (accumArray (+) 0 (lo, hi) [(i, 1) | i <- l])

Let's break it down step by step. We'll use the list [5, 3, 1, 2, 3, 4, 5].

*Main> [(i, 1) | i <- [5, 3, 1, 2, 3, 4, 5]]
[(5,1),(3,1),(1,1),(2,1),(3,1),(4,1),(5,1)]

We're just taking every element of the list and turning it into a tuple with 1. This is the basis for our counts. Now we need a way to sum up those counts per element. This is where accumArray comes into play.

*Main> accumArray (+) 0 (1, 5) [(i, 1) | i <- [5, 3, 1, 2, 3, 4, 5]]
array (1,5) [(1,1),(2,1),(3,2),(4,1),(5,2)]

The first parameter to accumArray is the operation to apply during accumulation (just simple addition for us). The second parameter is the starting value, and the third parameter is the bounds. So we end up with an array mapping numbers to their counts in the input.

Next we use assocs to get key/value tuples from the map:

*Main> assocs $ accumArray (+) 0 (1, 5) [(i, 1) | i <- [5, 3, 1, 2, 3, 4, 5]]
[(1,1),(2,1),(3,2),(4,1),(5,2)]

And then replicate to repeat each number based on its count:

*Main> [replicate times n | (n, times) <- assocs $ accumArray (+) 0 (1, 5) [(i, 1) | i <- [5, 3, 1, 2, 3, 4, 5]]]
[[1],[2],[3,3],[4],[5,5]]

Finally, we use concat to turn this list of lists into a single list:

*Main> concat [replicate times n | (n, times) <- assocs $ accumArray (+) 0 (1, 5) [(i, 1) | i <- [5, 3, 1, 2, 3, 4, 5]]]
[1,2,3,3,4,5,5]

In the actual function I wrote above, I used where to break up this one-liner.

I find list comprehension to be easier to deal with than map and concatMap, so those are the main changes I made in my version of the function.

(uncurry $ flip replicate) is a nice trick... flip replicate gives you a version of replicate that takes its arguments in the opposite order. uncurry takes that curried function and turns it into a function that takes a tuple as an argument instead. Together, those produce the same result as my list comprehension, which destructured the tuple and then passed its parameters in reverse order. I'm not familiar enough with Haskell to know if this is a common idiom, but for me, the list comprehension was easier to follow.

You could also skip the `concat` with `[occurrence | (n, times) <- counts, occurrence <- replicate times n]`. Oh, and `uncurry (flip . replicate)` is horrible, IMO. — dfeuer, Oct 01 '16 at 21:31
I accepted because this appears to be an excellent answer, however it is still not clear to me that this is actually doing counting sort (in the style of the CLRS book). I had hoped to give this the time it deserves, but I don't think that's going to happen any time soon. As I understand it, counting sort (CLRS-style) relies on arrays with numeric indexes to *avoid comparisons* and do the sort just by counting. Without having dug into the implementation of accumArray and Data.Array, it looks like accumArray is doing some kind of sort with comparisons (and thus not a counting sort). — joeA, Nov 13 '16 at 20:18
@joeA I'd be very surprised if `accumArray` did any comparisons... why would it? It should basically be "for each tuple (x,y): add x to element y of the array." That's just indexing into an array and doing addition. — user94559, Nov 14 '16 at 19:52
@smarx Perhaps you're right. I don't have a good understanding of Array in haskell and I naively think of them as just lists of pairs, and I thought of accumArray as manipulating the (unordered) list of pairs that is its argument. — joeA, Nov 14 '16 at 23:54
@smarx Having now read [A Gentle Introduction to Haskell/Arrays](https://www.haskell.org/tutorial/arrays.html) and [How are mutable arrays implemented in Haskell?](http://stackoverflow.com/questions/5773070/how-are-mutable-arrays-implemented-in-haskell) (to the degree that I understand it) I guess I should just think of haskell's Data.Array as good old imperative-style arrays. Thanks for the answer. — joeA, Nov 16 '16 at 02:24

score 3 · Answer 2 · edited Oct 01 '16 at 20:14

count through accumArray works as histogram builder. That is, for each number from lo to hi it returns how many times the number occurs in the argument list.

count ::  (Ix i, Num e) => [i] -> i -> i -> [(i, e)]
count l lo hi = assocs . accumArray (+) 0 (lo, hi) . map (\i -> (i, 1)) $ l

count [6,2,1,6] 0 10 ==  
   [(0,0),(1,1),(2,1),(3,0),(4,0),(5,0),(6,2),(7,0),(8,0),(9,0),(10,0)]

Results of count are used to generate original elements back from this specification. That is done by replicateing each fst element of tuple snd number of times . This yields list of lists that are concatenated together.

 f l lo hi = map (uncurry $ flip replicate ) $ count l lo hi
 f [6,2,1,6] 0 10 == [[],[1],[2],[],[],[],[6,6],[],[],[],[]]

Full solution is equivalent to

countingSort l lo hi = concat $ f l lo hi

Explain and Clarify Haskell Counting Sort

2 Answers2