OCaml parsing function

Question

This is a function from a parser module. I'm having trouble understanding one line of it.

    let rec e1 tokens =
      match tokens with
      | Tokenizer.IfTok :: tokens1 ->
          let (testAST, tokens2) = e1 tokens1 in
          (match tokens2 with
           | Tokenizer.ThenTok :: tokens3 ->
               let (thenAST, tokens4) = e1 tokens3 in
               (match tokens4 with
                | Tokenizer.ElseTok :: tokens5 ->
                    let (elseAST, tokens6) = e1 tokens5 in
                    (If (testAST, thenAST, elseAST), tokens6)
                | _ -> raise (Syntax "e1: missing else."))
           | _ -> raise (Syntax "e1: missing then."))
      | _ -> e2 tokens

    and e2 tokens = ........

I have no idea how this line works:

let (testAST, tokens2) = e1 tokens1 in

I know it declares a local variable which is a tuple, but where does the value (testAST, tokens2) come from? It doesn't seem to have anything to do with tokens or tokens1. Also, does this line only declare a tuple, or does it also call the function? Thanks!

Ick, recursive descent parsing in this style (not using parser combinators) is yucky! The basic idea is that most of the time a recursive descent parser will return something like an `(AST,rest)` pair, where the `AST` says what was matched, and the `rest` is the "rest" of the token stream available. — Kristopher Micinski, Oct 29 '12 at 23:17
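A minimal sketch of that `(AST, rest)` convention, assuming a hypothetical two-token language (`NumTok`, `PlusTok`) that is not part of the question: each call consumes what it recognises and hands back the tokens it did not use, so the caller carries on from `rest`.

```ocaml
(* Hypothetical token and AST types, for illustration only. *)
type token = NumTok of int | PlusTok
type ast = Num of int | Add of ast * ast

exception Syntax of string

(* Grammar sketch: expr ::= NUM | NUM "+" expr
   Every call returns (what it parsed, the tokens it did not consume). *)
let rec expr tokens =
  match tokens with
  | NumTok n :: rest1 ->
      (match rest1 with
       | PlusTok :: rest2 ->
           (* Recursive call: parse the right operand, get back the leftover tokens. *)
           let (right, rest3) = expr rest2 in
           (Add (Num n, right), rest3)
       | _ -> (Num n, rest1))
  | _ -> raise (Syntax "expr: expected a number")
```

For example, `expr [NumTok 1; PlusTok; NumTok 2]` returns `(Add (Num 1, Num 2), [])`, while `expr [NumTok 1; NumTok 2]` stops after the first number and returns `(Num 1, [NumTok 2])`, leaving the second token for whoever called it.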

Dmytro Sirenko · Accepted Answer · answered Oct 29 '12 at 23:47

Yes, this line does declare two variables and does call the function e1, binding the variables to the result of the function call.

This way of binding variables is called pattern matching. It relies on the type of e1's result: the compiler knows that e1 returns a pair, so that pair can be decomposed into its parts, and those parts are bound to two new variables, testAST and tokens2. It is one of the most powerful features of functional programming, and it lets you write much more readable, flexible and concise code.
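As a small OCaml illustration, here is the same construct with a hypothetical `div_mod` function (not from the question): the single `let` line both calls the function and takes the returned pair apart, just as `let (testAST, tokens2) = e1 tokens1 in` does.

```ocaml
(* Hypothetical example function: returns a pair (quotient, remainder). *)
let div_mod a b = (a / b, a mod b)

(* Calls div_mod once and binds the two components of its result:
   q gets the first component, r gets the second. *)
let (q, r) = div_mod 17 5

(* Equivalent without a tuple pattern: call once, then project with fst/snd. *)
let pair = div_mod 17 5
let q' = fst pair
let r' = snd pair
```

Here `q` is 3 and `r` is 2; `q'` and `r'` hold the same values. The pattern version simply does the call and the projections in one step.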

Matching works on any value whose structure is known to the compiler (e.g. case classes in Scala, tuples and lists in Haskell, records in Erlang). Pattern matching can also ignore the parts of a structure that are irrelevant: in Haskell, to select the second item of a three-tuple, just write `selectSecond (_, a, _) = a`, where `_` is a wildcard that matches a value and discards it.
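For comparison, an OCaml version of the Haskell `selectSecond` example (the name `select_second` is only for illustration), plus a list match of the same `head :: rest` shape that `e1` uses on its token list:

```ocaml
(* _ is a wildcard: it matches a component without binding it to a name. *)
let select_second (_, a, _) = a

(* A list is either empty or an element followed by the rest of the list;
   e1 matches its tokens with the same x :: rest shape. *)
let describe = function
  | [] -> "empty"
  | x :: _ -> "starts with " ^ string_of_int x
```

`select_second (1, "two", 3.0)` evaluates to `"two"`, and `describe [4; 5]` evaluates to `"starts with 4"`.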

I understand what the line does now. But `let (testAST, tokens2) = e1 tokens1 in` still seems arbitrary to me. It looks as if it just declares a tuple and OCaml somehow knows that it's an AST and the rest of the tokens. – otchkcom, Oct 29 '12 at 23:40
@otchkcom Yes, you're right: the OCaml compiler analyzes the types of values and infers them using the Hindley-Milner algorithm (that's how the "somehow" works). If the compiler knows that the return type of `e1` is a tuple of types Type1 and Type2, it can declare two variables, the first of type Type1 and the second of type Type2, and bind them to the parts of the tuple. – Dmytro Sirenko, Oct 29 '12 at 23:44
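A small sketch of that inference at work, using a hypothetical `split_first` function: no types are written anywhere, yet the compiler derives from the body that the function returns a pair, which is why a pair pattern is accepted on its result.

```ocaml
(* No annotations: from the first branch, the compiler infers
   split_first : 'a list -> 'a * 'a list *)
let split_first tokens =
  match tokens with
  | x :: rest -> (x, rest)
  | [] -> failwith "split_first: empty list"

(* The result is known to be a pair, so a pair pattern type-checks here;
   a pattern of another shape (e.g. a list pattern) would be a type error. *)
let (first, rest) = split_first [10; 20; 30]

(* An explicit constraint is optional; the compiler checks it against
   the inferred type (here instantiated at int). *)
let _ = (split_first : int list -> int * int list)
```

Entered into the OCaml toplevel, the first definition is reported as `val split_first : 'a list -> 'a * 'a list = <fun>`. The same reasoning, applied to `e1`'s `(If (...), tokens6)` branch, tells the compiler what `(testAST, tokens2)` holds.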

score 0 · Answer 2 · answered Oct 29 '12 at 23:00

It's calling a function named e1. In fact, this is the very function in which it appears; i.e., it's a recursive call to e1. The function returns a pair (a 2-tuple).

This looks like pretty standard recursive descent parsing.

Jeffrey Scofield

Thank you for your quick response, but what's in the tuple (testAST, tokens2)? Say I have the expression `if true then 7 else 8`. It seems to make sense if I assume (testAST, tokens2) = (true, then 7 else 8), but I just don't understand how the function makes it so. – otchkcom Oct 29 '12 at 23:09
The function returns a pair consisting of an AST and remaining tokens. You can see this in the line that starts with `(If (testAST, ...`. Again, this is a standard way to do parsing functionally (tokens are passed along through the parser, not obtained globally). – Jeffrey Scofield Oct 29 '12 at 23:13
I see. But how does OCaml figure out that it's going to return an AST and the remaining tokens? Sorry, I'm new to functional programming; everything just seems different. – otchkcom Oct 29 '12 at 23:31
Type inference seems easy and difficult at the same time, very similar to recursion in fact. OCaml can deduce the types from the code itself. For example, the line I quoted above *has* to be returning an AST in the first part of the pair, because that's what `If` is. It *has* to be returning tokens in the second part because that's the type of `Tokenizer.ThenTok` and the other tokens. – Jeffrey Scofield Oct 29 '12 at 23:53
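To see concretely what ends up in `(testAST, tokens2)` for `if true then 7 else 8`, here is a self-contained sketch of the question's `e1`. The `Tokenizer` module, the token constructors `TrueTok`, `FalseTok` and `IntTok`, the `ast` type and the body of `e2` are all hypothetical stand-ins, since the question does not show them; this `e2` only parses literals.

```ocaml
(* Hypothetical stand-in for the question's Tokenizer module. *)
module Tokenizer = struct
  type token = IfTok | ThenTok | ElseTok | TrueTok | FalseTok | IntTok of int
end

(* Hypothetical AST type; only If appears in the original code. *)
type ast =
  | Bool of bool
  | Num of int
  | If of ast * ast * ast

exception Syntax of string

let rec e1 tokens =
  match tokens with
  | Tokenizer.IfTok :: tokens1 ->
      (* Parse the condition; tokens2 is whatever the condition did not use. *)
      let (testAST, tokens2) = e1 tokens1 in
      (match tokens2 with
       | Tokenizer.ThenTok :: tokens3 ->
           let (thenAST, tokens4) = e1 tokens3 in
           (match tokens4 with
            | Tokenizer.ElseTok :: tokens5 ->
                let (elseAST, tokens6) = e1 tokens5 in
                (If (testAST, thenAST, elseAST), tokens6)
            | _ -> raise (Syntax "e1: missing else."))
       | _ -> raise (Syntax "e1: missing then."))
  | _ -> e2 tokens

(* Hypothetical e2 (elided in the question): parses a single literal. *)
and e2 tokens =
  match tokens with
  | Tokenizer.TrueTok :: rest -> (Bool true, rest)
  | Tokenizer.FalseTok :: rest -> (Bool false, rest)
  | Tokenizer.IntTok n :: rest -> (Num n, rest)
  | _ -> raise (Syntax "e2: expected a literal.")

(* "if true then 7 else 8" as a token list. *)
let tokens =
  let open Tokenizer in
  [IfTok; TrueTok; ThenTok; IntTok 7; ElseTok; IntTok 8]
```

With these definitions, `e1 tokens` returns `(If (Bool true, Num 7, Num 8), [])`. Inside that call, the first recursive call `e1 tokens1` receives `[TrueTok; ThenTok; IntTok 7; ElseTok; IntTok 8]` and returns `(Bool true, [ThenTok; IntTok 7; ElseTok; IntTok 8])`, so `testAST` is `Bool true` and `tokens2` is the rest of the input, which matches the guess in the comment above.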