The LINQ library in the .NET Framework has a very useful function called GroupBy, which I use all the time. Its type in Haskell would look like this:

Ord b => (a-> b) -> [a] -> [(b, [a])]

Its purpose is to classify items into buckets according to a given classification function f. Each bucket is a pair (b, l) of a key and the items sharing it, such that for any item x in l, f x == b.
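To make that description concrete, here is a rough sketch of it as a checkable predicate. The name `isGrouping` is hypothetical, and the extra conditions (non-empty buckets, one bucket per key, no items lost) are my assumptions about what a reasonable result looks like rather than anything LINQ documents.

```haskell
import Data.List (nub, sort)

-- Hypothetical checker: does `buckets` satisfy the description for f and xs?
isGrouping :: (Ord a, Eq b) => (a -> b) -> [a] -> [(b, [a])] -> Bool
isGrouping f xs buckets =
       all homogeneous buckets                    -- f x == b for every x in l
    && keys == nub keys                           -- assumed: one bucket per key
    && sort (concatMap snd buckets) == sort xs    -- assumed: nothing lost or duplicated
  where
    keys = map fst buckets
    -- assumed: buckets are never empty
    homogeneous (b, l) = not (null l) && all (\x -> f x == b) l
```

For example, `isGrouping even [1..6] [(False,[1,3,5]),(True,[2,4,6])]` holds, while a result that puts 2 into the False bucket does not.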

In .NET it runs in O(N) because it uses hash tables, but in Haskell I am fine with O(N*log(N)).

I can't find anything similar in the standard Haskell libraries. Also, my implementation in terms of standard functions is somewhat bulky:

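import Data.Function (on)
import Data.List (groupBy, sortBy)
import Data.Ord (comparing)

myGroupBy :: Ord k => (a -> k) -> [a] -> [(k, [a])]
myGroupBy f = map toFst
        . groupBy ((==) `on` fst)
        . sortBy (comparing fst)   -- stable, so each group keeps input order
        . map (\a -> (f a, a))
    where
        -- groupBy never produces an empty group, so this match is total here
        toFst l@((k,_):_) = (k, map snd l)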

This is definitely not something I want to see amongst my problem-specific code.

My question is: how can I implement this function concisely, making the most of the standard libraries?

Also, the apparent absence of such a standard function hints that experienced Haskellers may rarely need it because they know some better way. Is that true? What can be used to implement similar functionality in a better way?

Also, what would be a good name for it, considering groupBy is already taken? :)

Rotsor

2 Answers

GHC.Exts.groupWith

groupWith :: Ord b => (a -> b) -> [a] -> [[a]]

Introduced as part of generalised list comprehensions: http://www.haskell.org/ghc/docs/7.0.2/html/users_guide/syntax-extns.html#generalised-list-comprehensions
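Since groupWith returns only the groups, the key has to be recovered separately. A minimal sketch, assuming a hypothetical wrapper name `groupWithKey` (it is not part of GHC.Exts), could recompute the key from the first element of each group:

```haskell
import GHC.Exts (groupWith)

-- Hypothetical helper (not in GHC.Exts): pair each group with its key.
-- groupWith does not yield empty groups, so the pattern g@(x:_) always
-- matches; f is applied one extra time per group to recover the key.
groupWithKey :: Ord k => (a -> k) -> [a] -> [(k, [a])]
groupWithKey f xs = [ (f x, g) | g@(x:_) <- groupWith f xs ]
```

For example, `groupWithKey length ["a", "bb", "c", "dd", "eee"]` gives `[(1,["a","c"]),(2,["bb","dd"]),(3,["eee"])]`.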

sclv
  • ...note, also, that the paper describing the extension cites LINQ as an inspiration behind it, and that LINQ itself was influenced heavily by Haskell. Round and round it goes! – C. A. McCann May 21 '11 at 04:23
  • Nice! They forgot to return `b` though. :) – Rotsor May 21 '11 at 04:27

Using Data.Map as the intermediate structure:

import Control.Arrow ((&&&))
import qualified Data.Map as M

myGroupBy :: Ord k => (a -> k) -> [a] -> [(k, [a])]
myGroupBy f = M.toList . M.fromListWith (++) . map (f &&& return)

The map operation turns the input list into a list of keys paired with singleton lists containing the elements. M.fromListWith (++) turns this into a Data.Map, concatenating when two items have the same key, and M.toList gets the pairs back out again.

Note that each group's list comes out in reverse input order, so adjust for that if necessary. It is also easy to replace return and (++) with other monoid-like operations if, for example, you only want the sum of the elements in each group.
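Both adjustments mentioned above can be sketched as follows. The names `groupByOrdered` and `sumBy` are hypothetical, chosen only for illustration. The first restores input order within each group by reversing afterwards; the second swaps the singleton list and (++) for the element itself and (+).

```haskell
import Control.Arrow ((&&&), second)
import qualified Data.Map as M

-- Hypothetical name: same grouping, but each group keeps input order.
-- fromListWith (++) prepends each new singleton, so one reverse per
-- group at the end restores the original order.
groupByOrdered :: Ord k => (a -> k) -> [a] -> [(k, [a])]
groupByOrdered f =
    map (second reverse) . M.toList . M.fromListWith (++) . map (f &&& (: []))

-- Hypothetical name: fold each group to its sum instead of collecting it.
sumBy :: (Ord k, Num a) => (a -> k) -> [a] -> [(k, a)]
sumBy f = M.toList . M.fromListWith (+) . map (f &&& id)
```

For example, `groupByOrdered even [1..6]` gives `[(False,[1,3,5]),(True,[2,4,6])]` and `sumBy even [1..6]` gives `[(False,9),(True,12)]`.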

hammar
  • This works! However, there is a slight difference: it reverses the resulting lists. Nice use of arrows btw. They are even starting to make sense to me! – Rotsor May 21 '11 at 04:21
  • Ah, yes. You can remedy that by using `flip (++)` or just postprocess with `map (second reverse)`, another example of arrows :) The latter may be more efficient as you avoid the possible O(n^2) from the list concatenations. – hammar May 21 '11 at 04:27
  • @Rotsor: You can always reverse the result lists after the fact if nothing else. This is also likely to be faster than the `Data.List` version you have, because `Data.Map` sorts as it goes, whereas `groupBy` has only an equality predicate (think about the implications of that...). – C. A. McCann May 21 '11 at 04:28
  • `second`... I am happy! The code I just wrote contained `mapSnd f (a,b) = (a, f b)`. `flip (++)` -- this is a no-go. O(N^2) is not just possible, but seems to be guaranteed this way. – Rotsor May 21 '11 at 04:32
  • @camccann, I thought groupBy has a linear time complexity. Am I wrong here? – Rotsor May 21 '11 at 04:33
  • @Rotsor: True, but whether it's a problem or not depends on how many collisions you expect. But for most uses of groupBy you probably expect a lot. – hammar May 21 '11 at 04:40
  • We can replace `return` with `(:)`, `(++)` with `flip (.)` and post-process it with `map (second ($[]))` to get the original order without the need for list reversal and without the fear of O(N^2). I wonder if it will be slower because of all the functions being passed around. After that, we can replace `(:)` and `[]` as needed to get the folding behaviour. – Rotsor May 21 '11 at 05:21
  • You want to use (flip (++)) anyway because otherwise you're tacking on new elements at the end, which has bad complexity. – augustss May 21 '11 at 09:37
  • @augustss: The arguments to the combining function are in the opposite order of what you might expect, so `(++)` prepends, while `flip (++)` gives the bad complexity. – hammar May 21 '11 at 09:46
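The difference-list variant suggested in the comments can be sketched like this. The name `groupByDL` is hypothetical, and I have not measured whether it is faster or slower than reversing afterwards. It relies on `M.fromListWith f` calling `f newValue oldValue`, so `flip (.)` composes the old prepender first and the new one last.

```haskell
import Control.Arrow ((&&&), second)
import qualified Data.Map as M

-- Hypothetical name: each element becomes a "prepend me" function (x:).
-- fromListWith calls (flip (.)) new old = old . new, so the composed
-- function prepends elements in input order once applied to [].
groupByDL :: Ord k => (a -> k) -> [a] -> [(k, [a])]
groupByDL f =
    map (second ($ [])) . M.toList . M.fromListWith (flip (.)) . map (f &&& (:))
```

For example, `groupByDL even [1..6]` gives `[(False,[1,3,5]),(True,[2,4,6])]`, with no reversal step. The argument order is also visible directly: `M.toList (M.fromListWith (++) [(1,"a"),(1,"b"),(1,"c")])` gives `[(1,"cba")]`, which shows that `(++)` prepends the newer value.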