4

I am solving a programming assignment for Harvard's CS 51 programming course in OCaml. The problem is to define a function that compresses a list of chars into a list of pairs, where each pair contains the number of consecutive occurrences of a character in the list and the character itself. For example, applying this function to the list ['a';'a';'a';'a';'a';'b';'b';'b';'c';'d';'d';'d';'d'] should give the list [(5,'a');(3,'b');(1,'c');(4,'d')]. I came up with a function that uses an auxiliary function go to solve this problem:

let to_run_length (lst : char list) : (int*char) list =
  let rec go i s lst1 =
    match lst1 with
      | [] -> [(i,s)]
      | (x::xs) when s <> x ->  (i,s) :: go 0 x lst1
      | (x::xs) -> go (i + 1) s xs
        in match lst with
          | x :: xs -> go 0 x lst
          | [] -> []

My question is: is it possible to define a recursive function to_run_length with nested pattern matching, without defining an auxiliary function go? In that case, how can we store the state of the counter of elements already seen?

Cœur

2 Answers

6

The way you have implemented to_run_length is correct, readable and efficient. It is a good solution. (only nitpick: the indentation after in is wrong)

If you want to avoid the intermediary function, you must instead use the information present in the return value of the recursive call. This can be described in a slightly more abstract way:

  • the run length encoding of the empty list is the empty list
  • the run length encoding of the list x::xs is,
    • if the run length encoding of xs starts with a run of x, then ...
    • if it doesn't, then (1,x) :: run length encoding of xs

(I intentionally do not provide source code to let you work the detail out, but unfortunately there is not much to hide with such relatively simple functions.)

Food for thought: you usually encounter this kind of technique when considering tail-recursive and non-tail-recursive functions (what I've done resembles turning a tail-recursive function into non-tail-recursive form). In this particular case, your original function was not tail-recursive. A function is tail-recursive when the flow of arguments/results only goes "down" the recursive calls (you return the results of the calls directly, rather than reusing them to build a larger result). In my function, the flow of arguments/results only goes "up" the recursive calls (the calls receive the least information possible, and all the code logic is done by inspecting the results). In your implementation, the flow goes both "down" (the integer counter) and "up" (the encoded result).
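To make the "down only" flow concrete, here is a minimal sketch of a tail-recursive variant. It is not code from this thread: the name to_run_length_tr and the accumulator acc are made up for illustration, and it does use a helper function, so it is not an answer to the question as asked.

```ocaml
(* Hypothetical tail-recursive sketch; names are illustrative only. *)
let to_run_length_tr (lst : char list) : (int * char) list =
  (* acc holds the runs seen so far in reverse order,
     with the run currently being counted at its head. *)
  let rec go acc = function
    | [] -> List.rev acc
    | x :: xs ->
      (match acc with
       | (n, y) :: rest when x = y -> go ((n + 1, y) :: rest) xs
       | _ -> go ((1, x) :: acc) xs)
  in
  go [] lst
```

Every call to go is in tail position, so all information travels "down" through acc; the price is the final List.rev to restore the order.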

Edit: upon request of the original poster, here is my solution:

let rec run_length = function
  | [] -> []
  | x::xs ->
    match run_length xs with
      | (n,y)::ys when x = y -> (n+1,x)::ys
      | res -> (1,x)::res
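As a quick sanity check, the function can be run on the example list from the question. The definition is repeated (with explicit parentheses around the inner match) only so that the snippet runs on its own; the input binding is just a local name for the example.

```ocaml
let rec run_length = function
  | [] -> []
  | x :: xs ->
    (* Encode the tail first, then either extend its first run or start a new one. *)
    (match run_length xs with
     | (n, y) :: ys when x = y -> (n + 1, x) :: ys
     | res -> (1, x) :: res)

let () =
  let input = ['a';'a';'a';'a';'a';'b';'b';'b';'c';'d';'d';'d';'d'] in
  assert (run_length input = [(5,'a');(3,'b');(1,'c');(4,'d')])
```

The counter is never passed as an argument: it lives in the head pair of the result returned by the recursive call.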
gasche
  • Thank you for your input. Your answer gave me food for thought: without changing the initial signature of the function, I can't store any of the necessary information. So the option I probably do have (as Kakadu mentioned in his/her answer), without changing the initial signature of the function and without auxiliary functions, is to create a global mutable variable that can be used as a counter. This is not the type of solution I was seeking. However, I have implemented a completely tail-recursive version of this function and I think I am done with it now. Thanks. – user1767869 Oct 23 '12 at 12:36
  • @user1767869: my intuition is that you don't need to change the signature of the function, because the information you want is already present in the return value. Haven't tried it, though. – gasche Oct 23 '12 at 12:53
  • @user1767869: I just tried, and get it without any change to the interface, in 5 lines of code. I can give you my code if you want and think it's not against the guidelines for homework problems (I don't know). – gasche Oct 23 '12 at 13:18
  • Well, you are probably right. But still, you can return information but cannot pass it to the function. It is possible to create a solution to the problem if there is a constraint on the maximum number of consecutive characters in the input. There is probably a solution that satisfies my constraints using code generation, producing explicit pattern-matching code up to the maximum number of consecutive elements in an arbitrary input; otherwise I can't come up with a solution yet. – user1767869 Oct 23 '12 at 13:19
  • Well, it is definitely not against any guidelines, because it's self-education: I just found a reasonable course with assignments on the internet and am trying to take as much as possible from it. I would like to see the solution. – user1767869 Oct 23 '12 at 13:21
  • Thank you very much! Making a recursive call in the matched expression is indeed a new concept for me, so I am glad I asked this question and got an answer. One more question: is this coding style considered good? – user1767869 Oct 23 '12 at 19:12
  • @user1767869: yes, as it's also short and readable. You may argue for tail-recursive solutions for performance reasons, but then neither your code nor mine is tail-recursive, and tail recursion also has costs. – gasche Oct 24 '12 at 15:51
0

I don't think it is a good idea to write this function that way. The current solution is OK.

But if you still want to do it, you can use one of two approaches.

1) Without changing the arguments of your function: you can define some toplevel mutable values to hold the accumulators that your auxiliary function currently carries.

2) You can add an argument to your function to store some data. You can find some examples by searching for continuation-passing style.
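Approach 2 could look roughly like the sketch below, written in continuation-passing style. The name run_length_cps and the continuation parameter k are made up for illustration, and this is only one possible reading of "add an argument": the extra argument is a function that receives the encoding of the rest of the list.

```ocaml
(* Hypothetical CPS sketch: k receives the encoding of the remaining list. *)
let rec run_length_cps (lst : char list) (k : (int * char) list -> 'a) : 'a =
  match lst with
  | [] -> k []
  | x :: xs ->
    (* Encode the tail first; the new continuation merges x into that result. *)
    run_length_cps xs (fun res ->
      match res with
      | (n, y) :: ys when x = y -> k ((n + 1, x) :: ys)
      | _ -> k ((1, x) :: res))

(* Usage: pass the identity function as the initial continuation. *)
let encoded = run_length_cps ['a';'a';'b'] (fun r -> r)
```

Note that this changes the signature of the function, so callers have to supply the initial continuation (or go through a thin wrapper).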

Happy hacking!

P.S. I still want to underline that your current solution is OK and you don't need to improve it!

Kakadu