Haskell Merge Sort

Question

This is an implementation of Mergesort using higher order functions,guards,where and recursion.

However getting an error from compiler 6:26: parse error on input ‘=’

 mergeSort :: ([a] -> [a] -> [a]) -> [a] -> [a]
    mergeSort merge xs
        | length xs < 2 = xs
        | otherwise = merge (mergeSort merge first) (mergeSort merge second)
        where first = take half xs 
              second = drop half xs 
               half = (length xs) `div` 2

I can't see whats wrong? or rather I don't understand the compiler.

btw - if you are using length this is an O(n) operation - so you are producing unnecessary overhead, in general if you are using length indexing a lot with (single-linked) lists you are probably using the wrong data structure. — epsilonhalbe, May 07 '16 at 00:09

L.-T. Chen · Answer 1 · 2019-07-17T13:06:21.977

11

Halving a list is not an O(1) operation but O(n), so the given solutions introduce additional costs compared to the imperative version of merge sort. One way to avoid halving is to simply start merging directly by making singletons and then merging every two consecutive lists:

sort :: (Ord a) => [a] -> [a]
sort = mergeAll . map (:[]) 
  where
    mergeAll [] = []
    mergeAll [t] = t
    mergeAll xs  = mergeAll (mergePairs xs)

    mergePairs (x:y:xs) = merge x y:mergePairs xs
    mergePairs xs = xs

where merge is already given by others.

edited Jul 17 '19 at 13:06

answered Oct 19 '18 at 16:06

L.-T. Chen

381
4
4

A small but nice optimization: instead of breaking the list up into singletons, break it up into sorted lists of two elements. That saves O(n) allocation without much extra code complexity. – dfeuer Jul 17 '19 at 13:51

Redu · Answer 2 · 2019-05-17T13:27:32.487

10

Another msort implementation in Haskell;

merge :: Ord a => [a] -> [a] -> [a]
merge [] ys         = ys
merge xs []         = xs
merge (x:xs) (y:ys) | x < y     = x:merge xs (y:ys)
                    | otherwise = y:merge (x:xs) ys

halve :: [a] -> ([a],[a])
halve xs = (take lhx xs, drop lhx xs)
           where lhx = length xs `div` 2

msort :: Ord a => [a] -> [a]
msort []  = []
msort [x] = [x]
msort  xs = merge (msort left) (msort right)
            where (left,right) = halve xs

edited May 17 '19 at 13:27

answered Mar 21 '17 at 14:41

Redu

25,060
6
56
76

Shouldn't merge be a tail recursive function for efficiency reasons? I haven't benchmarked it but when just running it in repl.it with a list of 30000 values, tail recursion is a bit faster: https://repl.it/@faebl/Sorting – Fabian Schneider Apr 02 '19 at 08:33
1

the msort is missing the equation "msort [] = []". Otherwise the thing will never stop if the array to sort is empty. – Cuban coffee May 17 '19 at 13:21
@Cuban coffee Thanks for the heads up... corrected. – Redu May 17 '19 at 13:28
2

@FabianSchneider, no, it should not. Haskell lists are lazy in both their elements and (what matters here) their tails. So constructing a Haskell list the way the above code does is efficient: the list constructor is allocated *first*, and only later is the tail evaluated. – dfeuer May 17 '19 at 15:58
2

@dfeuer thanks for the info; I thought about it and read a a bit more about laziness and the list constructor and it makes kind of sense. Also "benchmarking" it on repl.it isn't really a reliable thing... – Fabian Schneider May 18 '19 at 19:28
It's likely more efficient to halve a list using a two-pointer algorithm than to take the length and divide by two. And it's definitely prettier. – dfeuer Jul 17 '19 at 13:55
Here's a [two-pointer list halver](https://tio.run/##XU1LCsIwEN3nFI/iIgXxAIUsimtPIFKCRjuaJjGJbT19nCpuuhhm3rxfr9PDWFsKDcHHjL13OXq7a0OwdNaZRiNW1ME7fVk/2xj9BHmlmHItRK/taKBw9@Rw8wKYehMNbzCEnJs51ZAdGizzZqDwNS9cjc2imhMTfwuDjjXyeNryzRWDJqcQIrnM8l9h9eSEV1XKBw). – dfeuer Jul 17 '19 at 14:10

score 9 · Answer 3 · answered May 07 '16 at 00:06

9

Haskell is an indentation sensitive programming language, you simply need to fix that (btw. if you are using tabs change that to using spaces).

mergeSort :: ([a] -> [a] -> [a]) -> [a] -> [a]
mergeSort merge xs
        | length xs < 2 = xs
        | otherwise = merge (mergeSort merge first) (mergeSort merge second)
        where first = take half xs 
              second = drop half xs 
              half = length xs `div` 2

answered May 07 '16 at 00:06

epsilonhalbe

15,637
5
46
74

that seems to have fixed that error but now when I run the function mergeSort, it says merge not in scope. Does that mean, the code is incomplete and I need to further define what merge actually does? – emuterisa May 07 '16 at 00:19
2

@emuterisa this should be an accepted answer if it solved your problem; even if this is from three years ago... – Fabian Schneider May 18 '19 at 19:30

score 3 · Answer 4 · answered Aug 23 '21 at 07:52

None of these solutions is as smart as Haskell's own solution, which runs on the idea that in the worst case scenario's these proposed algorithms is still run Theta (n log n) even if the list to be sorted is already trivially sorted.

Haskell's solution is to merge lists of strictly decreasing (and increasing values). The simplified code looks like:

mergesort :: Ord a => [a] -> [a] 
mergesort xs = unwrap (until single (pairWith merge) (runs xs)) 

runs :: Ord a => [a] -> [[a]]
runs = foldr op [] 
 where op x [] = [[x]]
       op x ((y:xs):xss) | x <= y    = (x:y:xs):xss
                         | otherwise = [x]:(y:xs):xss`

This will run Theta(n)

Haskell's version is smarter still because it will do an up run and a down run.

As usual I am in awe with the cleverness of Haskell!

The version in `Data.List` is indeed clever as a general purpose sort. But not every application will expect elements to be partially sorted already. In such applications, it will generally be faster *not* to try to detect runs of sorted elements. Additionally, there are quite a few applications where it's more important for running time to be *predictable* than for it to be *fast*. In these applications too, it's better not to take advantage of sorted runs. — dfeuer, Nov 14 '21 at 20:17

Haskell Merge Sort

4 Answers4

Linked