I am creating a compiler and am trying to extract line information from the parser. I wish to attach this to the AST node as metadata so that any error at a later point can be reported easily. I was successfully able to extract the line information in the Lexer by using this:

exception LexErr of string
exception ParseErr of string

let error msg start finish  = 
    Printf.sprintf "(line %d: char %d..%d): %s" start.pos_lnum 
      (start.pos_cnum -start.pos_bol) (finish.pos_cnum - finish.pos_bol) msg

let lex_error lexbuf = 
    raise ( LexErr (error (lexeme lexbuf) (lexeme_start_p lexbuf) (lexeme_end_p lexbuf)))

This reports the line and character numbers correctly in the lexer when it is used like this:

rule read = parse
(* Lexing tokens *)
| _ { lex_error lexbuf }

For parser, I am using this method:

exception LexErr of string
exception ParseErr of string

let error msg start finish = 
    Printf.sprintf "(line %d: char %d..%d): %s" start.pos_lnum 
      (start.pos_cnum -start.pos_bol) (finish.pos_cnum - finish.pos_bol) msg

let parse_error msg nterm =
    raise (ParseErr (error msg (rhs_start_pos nterm) (rhs_end_pos nterm)))

My parser looks like this:

%start <Ast.stmt> program

%%

program:
  | s = stmt; EOF { s }
  ;

stmt:
  | TINT; e = expr { Decl(e) }
  | e1 = expr; EQUALS; e2 = expr { Assign(e1,e2) }
  | error             { parse_error "wsorword" 1 }
  ;

expr:
  | i = INT; { Const i }
  | x = ID { Var x }
  | e1 = expr; b = binop; e2 = expr; { Binop(e1,b,e2) }
  ;

binop:
  | SUM { Sum }
  | SUB { Sub }
  | MUL { Mul }
  | DIV { Div }
  ;

When I run this and a parser error is detected, it throws the exception invalid_argument "Index out of bounds". The exception is raised on the line raise (ParseErr (error msg (rhs_start_pos nterm) (rhs_end_pos nterm))). Ultimately I would like to create an AST node that carries this parser line information as its metadata, but I can't get past this exception. I am not sure whether my approach is wrong or whether I'm making some other mistake. I would love some help with this.

pleasehalp
  • I just did some quick googling, but it seems to me that your error rule has only a 0th symbol. There is no symbol number 1. Possibly the problem is that you're passing 1 to `rhs_start_pos` and `rhs_end_pos`. – Jeffrey Scofield May 08 '18 at 07:48
  • I was referring to ocamlyacc manual. Over there, it says that Parsing.rhs_start_pos n where n is 1 for the leftmost item and the first char in a file is at offset 0. Besides, I tried passing it 0 as an argument, it doesn't work. Unless you meant something else? @JeffreyScofield – pleasehalp May 08 '18 at 08:34

1 Answer

The function rhs_start_pos n (from the Parsing module) cannot be used with menhir parsers; with menhir, you should use $symbolstartpos or $startpos instead.

Similarly, e = expr is not valid with ocamlyacc.

Thus, I am not sure which parser generator you are trying to use.
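
If the generator is menhir, one way to restructure the helper is to pass the positions in as arguments rather than calling Parsing.rhs_start_pos. The following is only a minimal sketch: the names located and format_error are made up for illustration, and the semantic actions in the trailing comment assume the grammar from the question.

```ocaml
(* Sketch only: [located] and [format_error] are illustrative names,
   not part of menhir or of the question's code. *)
exception ParseErr of string

(* An AST payload paired with the source span it was parsed from. *)
type 'a located = {
  value : 'a;
  start_pos : Lexing.position;
  end_pos : Lexing.position;
}

(* Same message layout as the question's [error] function. *)
let format_error msg start finish =
  Printf.sprintf "(line %d: char %d..%d): %s"
    start.Lexing.pos_lnum
    (start.Lexing.pos_cnum - start.Lexing.pos_bol)
    (finish.Lexing.pos_cnum - finish.Lexing.pos_bol)
    msg

(* Takes the positions as arguments instead of asking the Parsing module. *)
let parse_error msg start finish =
  raise (ParseErr (format_error msg start finish))

(* In the .mly file, the semantic actions would then pass menhir's
   position keywords, for example:
     | error   { parse_error "wsorword" $startpos $endpos }
     | i = INT { { value = Const i; start_pos = $startpos; end_pos = $endpos } }
*)
```

Because both positions are ordinary Lexing.position values, the same record can be attached to any AST node and reused by later compiler passes when they report errors.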

octachron
  • I am using menhir. I assumed that the Locations module would work with it. Could you point me towards some resources or a sample library I can refer to in order to understand it? I went through the menhir documentation, but I need a few more examples to work my way through. – pleasehalp May 09 '18 at 17:06
  • The Location module works; it is the Parsing module that is specific to ocamlyacc. A good but lengthy example might be the reason parser, https://github.com/facebook/reason/blob/master/src/reason-parser/reason_parser.mly . Notably, it uses both $symbolstartpos and friends and a parametric rule (https://github.com/facebook/reason/blob/master/src/reason-parser/reason_parser.mly#L4637) to insert the location at the right place. – octachron May 09 '18 at 17:37
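
As a rough illustration of the parametric-rule idea mentioned in the last comment, a menhir grammar can wrap any symbol in a rule that records its span. This is a hypothetical sketch, not the rule used in the reason parser: located and raw_stmt are assumed names, and the span is stored as a plain tuple to keep the fragment independent of any AST definition.

```ocaml
(* Menhir grammar fragment (goes in the rules section of the .mly file).
   [located] and [raw_stmt] are illustrative names. *)

(* Wraps the value produced by X with its start and end positions. *)
located(X):
  | x = X { (x, $startpos, $endpos) }
  ;

(* Every statement then carries its positions without repeating
   $startpos and $endpos in each production. *)
stmt:
  | s = located(raw_stmt) { s }
  ;
```

With this layout the individual productions of raw_stmt keep their original actions, and the position bookkeeping lives in one place.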