
I am writing a static analyser including a frontend for programs in a specific language.

The front end successfully generates an AST from a program, and the analyser works well on it: it either proves the program correct (for some specific properties) or raises an error for a statement or an expression.

In case of an error, I would like to make the error message explicit, so I want to add the exact localisation in the source code of the statement or expression where the error is raised. Showing the line number would already be good; showing the column number would be even better...

Could anyone tell me how to modify the frontend to do this? Or is there any document I could study?

(I guess I first need to modify the types in the AST, but do I have to add a loc to everything?)

Ira Baxter
SoftTimur
  • Localisation or source code locations? The former means translating into other languages and generally tailoring the program for the user's geographic location. Your description sounds like the latter. –  Apr 16 '14 at 15:57
  • I mean source code location... – SoftTimur Apr 16 '14 at 23:07

1 Answer


Conceptually (and this is also my preference for the implementation), add a source code location (line, column, file) to every AST node.

Storing this in the nodes shouldn't be technically hard. You can collect the information in the lexer, so that the lexemes processed by the parser carry it; getting the parser to copy it into the parse tree nodes should then be easy. We've done this for our program analysis tools and it works quite well.
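As a rough illustration of that pipeline, here is a minimal Python sketch. The toy grammar (`num (op num)*`), the names `Loc`, `Token`, `Num`, `BinOp`, `lex`, `parse` and `check`, and the division-by-zero check are all hypothetical, chosen only to show the lexer stamping each token with a location, the parser copying it into the nodes, and the analyser using it in its error message. It is not how any particular tool does it.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Loc:
    file: str
    line: int    # 1-based
    column: int  # 1-based


@dataclass(frozen=True)
class Token:
    kind: str
    text: str
    loc: Loc


@dataclass(frozen=True)
class Num:
    value: int
    loc: Loc


@dataclass(frozen=True)
class BinOp:
    op: str
    left: object
    right: object
    loc: Loc


TOKEN_RE = re.compile(r"(?P<num>\d+)|(?P<op>[+/])|(?P<nl>\n)|(?P<ws>[ \t]+)")


def lex(source, file):
    """Split source into tokens, stamping each one with its location."""
    line, line_start, pos = 1, 0, 0
    tokens = []
    while pos < len(source):
        m = TOKEN_RE.match(source, pos)
        if m is None:
            col = pos - line_start + 1
            raise SyntaxError(f"{file}:{line}:{col}: unexpected {source[pos]!r}")
        if m.lastgroup == "nl":
            line += 1
            line_start = m.end()
        elif m.lastgroup != "ws":
            loc = Loc(file, line, pos - line_start + 1)
            tokens.append(Token(m.lastgroup, m.group(), loc))
        pos = m.end()
    return tokens


def parse(tokens):
    """Parse `num (op num)*` left to right, copying token locations into nodes."""
    def operand(i):
        if i >= len(tokens) or tokens[i].kind != "num":
            raise SyntaxError("expected a number")
        return Num(int(tokens[i].text), tokens[i].loc)

    node, i = operand(0), 1
    while i < len(tokens):
        op = tokens[i]
        if op.kind != "op":
            raise SyntaxError(f"{op.loc.file}:{op.loc.line}:{op.loc.column}: expected an operator")
        node = BinOp(op.text, node, operand(i + 1), op.loc)
        i += 2
    return node


def check(node):
    """Toy analysis: report division by a literal zero, with its location."""
    errors = []
    if isinstance(node, BinOp):
        errors += check(node.left) + check(node.right)
        if node.op == "/" and isinstance(node.right, Num) and node.right.value == 0:
            loc = node.loc
            errors.append(f"{loc.file}:{loc.line}:{loc.column}: division by zero")
    return errors


if __name__ == "__main__":
    for message in check(parse(lex("1 +\n 4 / 0", "demo.src"))):
        print(message)
```

Here a binary node takes the location of its operator token; recording a start and end position per node is a common alternative if you want to underline whole expressions.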

You'll find it annoying to get the line numbers right, because people use inconsistent end-of-line conventions (0x0D, 0x0A, Unicode NEL, Unicode line break, ...). You may also find that your line numbering convention has to follow that of tools you didn't write (GCC has its own ideas about what increments the line number).
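One way to keep that decision in a single place is to compute a table of line-start offsets once and map character offsets to (line, column) afterwards. The Python sketch below assumes a particular set of line terminators (CR LF, CR, LF, NEL U+0085, U+2028, U+2029); that set and the names `LINE_BREAK`, `line_starts` and `line_and_column` are assumptions for illustration, and you would adjust the pattern to match whatever tool you need to agree with.

```python
import bisect
import re

# Assumed terminator set. "\r\n" comes first so that CR LF counts as one
# line break rather than two.
LINE_BREAK = re.compile("\r\n|\r|\n|\u0085|\u2028|\u2029")


def line_starts(source):
    """Return the offset at which each line begins (index 0 is line 1)."""
    return [0] + [m.end() for m in LINE_BREAK.finditer(source)]


def line_and_column(starts, offset):
    """Map a character offset to a 1-based (line, column) pair."""
    line = bisect.bisect_right(starts, offset)
    return line, offset - starts[line - 1] + 1


if __name__ == "__main__":
    text = "a\r\nb\rc\nd\u0085e"
    starts = line_starts(text)
    print(line_and_column(starts, text.index("e")))
```

With this layout the lexer only needs to record character offsets, and changing the line-break convention means editing one pattern.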

Ira Baxter