A table of 2 directions (string <-> token) in parsing

Question

I have defined a hash table keyword_table to store all the keywords of my language. Here is part of the code:

(* parser.mly *)
%token CALL CASE CLOSE CONST
...
reserved_identifier:
| CALL { "Call" }
| CASE { "Case" }
| CLOSE { "Close" }
| CONST { "Const" }
...

(* lexer.mll *)
{
(* Build a case-insensitive keyword table from (spelling, token) pairs. *)
let hash_table list =
  let tbl = Hashtbl.create (List.length list) in
  List.iter (fun (s, t) -> Hashtbl.add tbl (String.lowercase_ascii s) t) list;
  tbl

let keyword_table = hash_table [
  "Call", CALL; "Case", CASE; "Close", CLOSE; "Const", CONST;
  ... ]
}

rule token = parse
  | lex_identifier as li
     { try Hashtbl.find keyword_table (String.lowercase_ascii li)
       with Not_found -> IDENTIFIER li }

As there are a lot of keywords, I would like to avoid repeating code as much as possible.

In parser.mly, it seems that %token CALL CASE ... cannot be simplified, because each token must be declared explicitly. However, for the reserved_identifier part, is it possible to call a function that returns a string from a token, instead of hard-coding each string?

That suggests a single hash table is probably not suitable for this purpose. Which data structure is the best choice for lookup in both directions (assuming the keys on each side are unique)? The goal is for find_0 table "Call" to return the token CALL (used in lexer.mll) and for find_1 table CALL to return "Call" (used in parser.mly).

Also, if this table can be defined, where should I put it so that parser.mly can use it?

score 2 · Accepted Answer · answered Oct 16 '13 at 12:47

The string is available while you are in the lexer, at the exact point where the token is matched (e.g. via Lexing.lexeme). By the time you are in the parser, it is too late to recover it. The token stream does not keep every string in memory, because that would increase memory consumption considerably; in practice most tokens never need their string representation.

Why not build a reverse table from token values to strings at the same time as you build the first mapping? hash_table could be renamed to hash_tables and return both maps, one for each direction.
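A minimal sketch of that idea, under some assumptions: the token type below is only a stand-in for the one your parser generator emits, the names by_string, by_token and keyword_name are made up for illustration, and find_0 / find_1 close over the tables instead of taking a table argument as in the question.

```ocaml
(* Stand-in for the token type generated from parser.mly;
   the constructors mirror the %token declarations in the question. *)
type token = CALL | CASE | CLOSE | CONST | IDENTIFIER of string

(* Build both directions from a single association list.
   Lookup by string is case-insensitive; lookup by token returns the
   spelling exactly as it was written in the list. *)
let hash_tables list =
  let n = List.length list in
  let by_string = Hashtbl.create n in
  let by_token = Hashtbl.create n in
  List.iter
    (fun (s, t) ->
      Hashtbl.add by_string (String.lowercase_ascii s) t;
      Hashtbl.add by_token t s)
    list;
  (by_string, by_token)

let keyword_table, keyword_name =
  hash_tables [ "Call", CALL; "Case", CASE; "Close", CLOSE; "Const", CONST ]

(* string -> token, falling back to IDENTIFIER as in the lexer rule. *)
let find_0 s =
  try Hashtbl.find keyword_table (String.lowercase_ascii s)
  with Not_found -> IDENTIFIER s

(* token -> string option; None for tokens that are not keywords. *)
let find_1 t = Hashtbl.find_opt keyword_name t
```

With this, each keyword spelling appears only once in the list, and both lookups are derived from it. Whether the parser can actually reach keyword_name depends on where the definition lives in your build.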

It probably needs to be defined in the parser if you want it visible from both the parser and the lexer.
