Tail-recursive merge sort in OCaml

Question

I’m trying to implement a tail-recursive list-sorting function in OCaml, and I’ve come up with the following code:

let tailrec_merge_sort l =
  let split l = 
    let rec _split source left right =
      match source with
        | [] -> (left, right)
        | head :: tail -> _split tail right (head :: left) 
    in _split l [] []
  in

  let merge l1 l2 = 
    let rec _merge l1 l2 result =
      match l1, l2 with
        | [], [] -> result
        | [], h :: t | h :: t, [] -> _merge [] t (h :: result)
        | h1 :: t1, h2 :: t2 ->
            if h1 < h2 then _merge t1 l2 (h1 :: result)
            else            _merge l1 t2 (h2 :: result)
    in List.rev (_merge l1 l2 [])
  in

  let rec sort = function
    | [] -> []
    | [a] -> [a]
    | list -> let left, right = split list in merge (sort left) (sort right)
  in sort l
;;

Yet it seems that it is not actually tail-recursive, since I encounter a "Stack overflow during evaluation (looping recursion?)" error.

Could you please help me spot the non tail-recursive call in this code? I've searched quite a lot, without finding it. Cout it be the let binding in the sort function?

Norman Ramsey · Answer 1 · 2010-03-27T22:26:00.300

9

Merge sort is inherently not tail-recursive. A sort requires two recursive calls, and in any execution of any function, at most one dynamic call can be in tail position. (split is also called from non-tail position, but since it should use constant stack space that should be OK).

Be sure you are using a recent version of OCaml. In versions 3.08 and older, List.rev could blow the stack. This problem is fixed in version 3.10. Using version 3.10.2, I can sort a list of ten million elements using your code. It takes a couple of minutes, but I don't blow the stack. So I'm hoping your problem is simply that you have an old version of OCaml.

If not, a good next step would be to set the environment variable

OCAMLRUNPARAM=b=1

which will give a stack trace when you blow the stack.

I'd also like to know the length of the arrays you are sorting; although merge sort cannot be tail-recursive, its non-tail nature should cost you logarithmic stack space.

If you need to sort more than 10 million elements, maybe you should be looking at a different algorithm? Or if you want, you could CPS-convert mergesort by hand, which will yield a tail-recursive version at the cost of allocating contiuations on the heap. But my guess is that it won't be necessary.

edited Mar 27 '10 at 22:26

answered Mar 27 '10 at 22:12

Norman Ramsey

198,648
61
360
533

Hmmm, since split is not in last position, does it count? (I mean, as I understand it, the compiler should be able to detect a tail-recursive function and convert it into a loop ; then, only the last call would matter) Furthermore, using continuations should make the function tail-recursive, shouldn't it? – Clément Mar 29 '10 at 18:17
I'm using OCaml v11.0, and I blow the stack when running my code on 10^6 elements. I need to sort between 5 and 10 million elements. – Clément Mar 29 '10 at 18:19
Finally, my problem is that even using continuations I do blow the stack. Any idea why? – Clément Mar 29 '10 at 18:20
By the way, is there a way to get a backtrace when running toplevel programs? – Clément Mar 29 '10 at 18:34
@CFP: Setting the environment variable should get you a backtrace. If not, just compile your test case and run it as a standalone binary. – Norman Ramsey Mar 29 '10 at 19:59

Brian · Accepted Answer · 2010-03-27T15:52:17.247

7

The expression

merge (sort left) (sort right)

is not tail-recursive; it calls both (sort left) and (sort right) recursively while there is remaining work in the call (merge).

I think you can fix it with an extra continuation parameter:

  let rec sort l k =
    match l with
    | [] -> k [] 
    | [a] -> k [a] 
    | list -> let left, right = split list in sort left (fun leftR -> sort right (fun rightR -> k (merge leftR rightR)))
  in sort l (fun x -> x)

edited Mar 27 '10 at 15:52

answered Mar 27 '10 at 14:33

Brian

117,631
17
236
300

Oh, I think I understand; thanks! But then, how can I make my function recursive? – Clément Mar 27 '10 at 14:50
2

Could you explain why continuations do actually make the function tail-recursive? Or do they just move the process of capturing the stack-frame from the (potentially overflowing) stack to the generated closure? – Dario Mar 27 '10 at 16:09
Hmmm, I guess this should work, but I don't really understand what the k function do. Could you please explain it a little? Thanks a lot! I've tested it though, but it doesn't solve the overflow problem... Any idea why? – Clément Mar 27 '10 at 16:56
I didn't test the code, so I may have made a mistake. :) The best explanation of the continuation strategy I have is here: http://lorgonblog.spaces.live.com/blog/cns!701679AD17B6D310!170.entry – Brian Mar 27 '10 at 18:16
It's a bug with the tail-rec optimization in Caml I guess. Anyway the doc. is excellent, thanks a lot! – Clément Apr 01 '10 at 16:56

Tail-recursive merge sort in OCaml

2 Answers2

Linked