External definitions for ocamllex regular expressions

Question

I have implemented the usual combination of lexer/parser/pretty-printer for reading in and printing a type in my code. I find there is redundancy between the lexer and the pretty-printer when it comes to plain-string regular expressions, usually employed for symbols, punctuation or separators.

For example I now have

rule token = parse
  | "|-" { TURNSTILE }

in my lexer.mll file, and a function like:

let pp fmt (l,r) = 
  Format.fprintf fmt "@[%a |-@ %a@]" Form.pp l Form.pp r

for pretty-printing. If I decide to change the string for TURNSTILE, I have to edit two places in the code, which I find less than ideal.

Apparently, the OCaml lexer supports a certain ability to define regular expressions and then refer to them within the mll file. So lexer.mll could be written as

let symb_turnstile = "|-"

rule token = parse
  | symb_turnstile { TURNSTILE }

But this will not let me externally access symb_turnstile, say from my pretty-printing functions. In fact, after running ocamllex, there are no occurrences of symb_turnstile in lexer.ml. I cannot even refer to these identifiers in the OCaml epilogue of lexer.mll.

Is there any way of achieving this?

Nikos · Accepted Answer · 2012-08-05T11:15:48.780

In the end, I went for the following style which I stole from the sources of ocamllex itself (so I am guessing it's standard practice). A map from strings to tokens (here an association list) is defined in the preamble of lexer.mll

let symbols =
  [ 
    ...
    (Symb.turnstile, TURNSTILE); 
    ...
  ]

where Symb is a module defining turnstile as a string. Then, the lexing part of lexer.mll is purposely overly general:

rule token = parse
  ...
  | punctuation
    {
      try 
        List.assoc (Lexing.lexeme lexbuf) symbols
      with Not_found -> lex_error lexbuf  
    }
  ...

where punctuation is a regular expression matching a sequence of symbols.

The pretty-printer can now be written like this.

let pp fmt (l,r) = 
  Format.fprintf fmt "@[%a %s@ %a@]" Form.pp l Symb.turnstile Form.pp r
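
For anyone who wants to try the idea outside of ocamllex, here is a minimal self-contained sketch of the shared-table approach. The names Symb.comma, COMMA, token_of_lexeme, string_of_token and pp_sequent are made up for illustration and are not part of my actual code; formulas are stood in for by plain strings so the snippet compiles on its own. It assumes OCaml 4.05 or later for List.assoc_opt.

```ocaml
(* Hypothetical stand-in for the Symb module: one place for concrete syntax. *)
module Symb = struct
  let turnstile = "|-"
  let comma = ","
end

(* Illustrative token type; a real one would come from the parser. *)
type token = TURNSTILE | COMMA

(* The single table consulted by the lexer action. *)
let symbols = [ (Symb.turnstile, TURNSTILE); (Symb.comma, COMMA) ]

(* What the lexer action does with the matched lexeme
   (None here plays the role of lex_error). *)
let token_of_lexeme s = List.assoc_opt s symbols

(* Reverse lookup: recover the concrete syntax of a token.
   Raises Not_found if the token is missing from the table. *)
let string_of_token t = fst (List.find (fun (_, t') -> t' = t) symbols)

(* Pretty-printer using the same string; formulas are plain strings here. *)
let pp_sequent fmt (l, r) =
  Format.fprintf fmt "@[%s %s@ %s@]" l Symb.turnstile r
```

Changing Symb.turnstile then changes both what the lexer accepts and what the printer emits, provided the punctuation regular expression still matches the new string.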

score 1 · Answer 2 · answered Aug 03 '12 at 15:31

Although the two tokens both look like strings notationally, they're really very different. I don't think there's a convenient type under which they could be shared for use by ocamllex and Printf.printf. This is possibly the reason that ocamllex doesn't support such external definitions. You could probably get the effect you want with a macro facility (textual inclusion).
