6

Some time ago, I decided to solve a simple task on HackerRank but using OCaml and Core, in order to learn them. In one of the tasks, I'm supposed to read data from standard input:

The first line contains an integer, denoting the number of entries in the phone book. Each of the subsequent lines describes an entry in the form of space-separated values on a single line. The first value is a friend's name, and the second value is an -digit phone number.

After the lines of phone book entries, there are an unknown number of lines of queries. Each line (query) contains a to look up, and you must continue reading lines until there is no more input.

The main issues:

  • I don't know how many lines there will be
  • Last line don't ends by newline, so I can't just read scanf "%s\n" until End_of_file

And my code became messy:

open Core.Std
open Printf
open Scanf


let read_numbers n =
    let phone_book = String.Table.create () ~size:n in

    for i = 0 to (n - 1) do
        match In_channel.input_line stdin with
        | Some line -> (
            match (String.split line ~on:' ') with
            | key :: data :: _ -> Hashtbl.set phone_book ~key ~data
            | _ -> failwith "This shouldn't happen"
        )
        | None -> failwith "This shouldn't happen"
    done;

    phone_book


let () =
    let rec loop phone_book =
        match In_channel.input_line stdin with
        | Some line -> (
            let s = match Hashtbl.find phone_book line with
                | Some number -> sprintf "%s=%s" line number
                | None -> "Not found"
            in
            printf "%s\n%!" s;
            loop phone_book
        )
        | None -> ()
    in

    match In_channel.input_line stdin with
    | Some n -> (
        let phone_book = read_numbers (int_of_string n) in
        loop phone_book
    )
    | None -> failwith "This shouldn't happen"

If I solve this task in Python, then code looks like this:

n = int(input())
book = dict([tuple(input().split(' ')) for _ in range(n)])

while True:
    try:
        name = input()
    except EOFError:
        break
    else:
        if name in book:
            print('{}={}'.format(name, book[name]))
        else:
            print('Not found')

This is shorter and clearer than the OCaml code. Any advice on how to improve my OCaml code? And there two important things: I don't want to abandon OCaml, I just want to learn it; second - I want to use Core because of the same reason.

Matthias Braun
  • 32,039
  • 22
  • 142
  • 171

2 Answers2

9

The direct implementation of the Python code in OCaml would look like this:

let exec name =
  In_channel.(with_file name ~f:input_lines) |> function
  | [] -> invalid_arg "Got empty file"
  | x :: xs ->
    let es,qs = List.split_n xs (Int.of_string x) in
    let es = List.map es ~f:(fun entry -> match String.split ~on:' ' entry with
        | [name; phone] -> name,phone
        | _ -> invalid_arg "bad entry format") in
    List.iter qs ~f:(fun name ->
        match List.Assoc.find es name with
        | None -> printf "Not found\n"
        | Some phone -> printf "%s=%s\n" name phone)

However, OCaml is not a script-language for writing small scripts and one shot prototypes. It is the language for writing real software, that must be readable, supportable, testable, and maintainable. That's why we have types, modules, and all the stuff. So, if I were writing a production quality program, that is responsible for working with such input, then it will look very differently.

The general style that I personally employ, when I'm writing a program in a functional language is to follow these two simple rules:

  1. When in doubt use more types.
  2. Have fun (lots of fun).

I.e., allocate a type for each concept in the program domain, and use lots of small function.

The following code is twice as big, but is more readable, maintainable, and robust.

So, first of all, let's type: the entry is simply a record. I used a string type to represent a phone for simplicity.

type entry = {
  name : string;
  phone : string;
}

The query is not specified in the task, so let's just stub it with a string:

type query = Q of string

Now our parser state. We have three possible states: the Start state, a state Entry n, where we're parsing entries with n entries left so far, and Query state, when we're parsing queries.

type state =
  | Start
  | Entry of int
  | Query

Now we need to write a function for each state, but first of all, let's define an error handling policy. For a simple program, I would suggest just to fail on a parser error. We will call a function named expect when our expectations fail:

let expect what got =
  failwithf "Parser error: expected %s got %s\n" what got ()

Now the three parsing functions:

let parse_query s = Q s

let parse_entry s line = match String.split ~on:' ' line with
  | [name;phone] ->  {name;phone}
  | _ -> expect "<name> <phone>" line

let parse_expected s =
  try int_of_string s with exn ->
    expect "<number-of-entries>" s

Now let's write the parser:

let parse (es,qs,state) input = match state with
  | Start -> es,qs,Entry (parse_expected input)
  | Entry 0 -> es,qs,Query
  | Entry n -> parse_entry input :: es,qs,Entry (n-1)
  | Query -> es, parse_query input :: qs,Query

And finally, let's read data from file:

let of_file name =
  let es,qs,state =
    In_channel.with_file name ~f:(fun ch ->
      In_channel.fold_lines ch ~init:([],[],Start) ~f:parse) in
  match state with
  | Entry 0 | Query -> ()
  | Start -> expect "<number-of-entries><br>..." "<empty>"
  | Entry n -> expect (sprintf "%d entries" n) "fewer"

We also check that our state machine reached a proper finish state, that is it is either in Query or Entry 0 state.

ivg
  • 34,431
  • 2
  • 35
  • 63
6

As in Python, the key to a concise implementation is to let the standard library do most of the work; the following code uses Sequence.fold in lieu of Python's list comprehension. Also, using Pervasives.input_line rather than In_channel.input_line allows you to cut down on extraneous pattern matching (it will report an end of file condition as an exception rather than a None result).

open Core.Std

module Dict = Map.Make(String)

let n = int_of_string (input_line stdin)
let d = Sequence.fold
  (Sequence.range 0 n)
  ~init:Dict.empty
  ~f:(fun d _ -> let line = input_line stdin in
      Scanf.sscanf line "%s %s" (fun k v -> Dict.add d ~key:k ~data:v))

let () =
  try while true do
    let name = input_line stdin in
    match Dict.find d name with
    | Some number -> Printf.printf "%s=%s\n" name number
    | None -> Printf.printf "Not found.\n"
  done with End_of_file -> ()
Reimer Behrends
  • 8,600
  • 15
  • 19
  • Nit, but I think `Sequence` is not in the standard library, but in Core. – antron Nov 24 '16 at 17:24
  • The OP specifically stated that he wanted to keep using Core, which is why I based my code on that (I normally don't use Core much). – Reimer Behrends Nov 24 '16 at 20:36
  • Yes, understood, no objection to that. All I meant is, while reading the first sentence, I felt a vague suggestion that `Sequence` is in stdlib, which might confuse some other readers later on. I realize it doesn't directly claim that, it's just somehow slightly confusing. – antron Nov 25 '16 at 15:02
  • It's probably just a terminology thing. Core, Batteries, ExtLib, and Containers are all commonly described as alternative standard libraries, so that's how I tend to use the term for OCaml. I realize now that this may be confusing if you think that "standard library" refers to *the* OCaml standard library. – Reimer Behrends Nov 25 '16 at 15:15