
Say I have the following, in a toy DSL:

int foo(int bar = 0);

With a tool such as rust-peg, I could define some simple parsing expression grammar (PEG) rules to match it (assume appropriate structs FnProto and Arg):

function -> FnProto
  = t:type " " n:name "(" v:arglist ");"
  { FnProto { return_type:t, name:n, args:v } }

arglist -> Vec<Arg>
  = arg ** ","

arg -> Arg
  = t:type " " n:name " = " z:integer { Arg { typename:t, name:n, value:z } }

type -> String
  = "int" { match_str.to_string() }

name -> String
  = [a-zA-Z_][a-zA-Z0-9_]* { match_str.to_string() }

integer -> i64
  = "-"? [0-9]+ { match_str.parse().unwrap() }

In practice such simple rules are insufficient, but they will serve to illustrate my point.

Now consider the following situation, where the default value of bar is a constant defined previously in the same file:

int BAZ = 0xDEADBEEF;

int foo(int bar = BAZ);

Now the rule for parsing functions needs to accept not only integer literals as default argument values, but also any previously declared constants.

I could do one pass to parse constants and substitute the appropriate values in a second pass, but do I really have to resort to two passes? Is there some way I can refer to previously parsed data from within a rule?

Iskar Jarak
  • While you could do this in one pass, you won't be able to use constants declared later than a function that uses them (if you wanted that), and probably if you make this any more complicated (custom types, whatever) you're going to have even more passes. I'd just use multiple passes. – U2EF1 Mar 15 '15 at 05:25
  • @U2EF1 I'm not concerned with constants declared after their first use. You make a good point about custom types and such, but could you demonstrate how you would do this in one pass? – Iskar Jarak Mar 15 '15 at 06:14

1 Answer


You are confusing "parsing" (the recognition of a valid program, perhaps including capture of a representation of it, e.g., as an AST) with semantic analysis and/or execution.

Your parser should define what is legal to say, syntactically, in the language. Nothing less, and nothing more. You might be able to write some programs that are semantic nonsense that the parser will not complain about.

Having parsed the text, you now need "other passes" over the parsed data (not the source text) to build classic compiler structures such as symbol tables, and to check that all uses of symbols are valid. To do those other passes, you could arguably reparse the text, but by assumption you have already parsed it once. The standard solution here is to have the first parse build an abstract syntax tree (AST) representing the essential details of the program. Those "other passes" then operate by walking the AST rather than parsing the source text again.
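As a rough illustration of such a pass over the toy DSL from the question, here is a minimal sketch. It assumes the parser records each default value as either a literal or an unresolved name. The DefaultValue and Item types and the resolve function are hypothetical names introduced for this example; they are not part of rust-peg or of the question's code.

```rust
use std::collections::HashMap;

/// Default value as the parser records it: a literal, or a name that
/// still has to be looked up. (Hypothetical type for this sketch.)
#[derive(Debug, Clone, PartialEq)]
enum DefaultValue {
    Literal(i64),
    Named(String),
}

#[derive(Debug, Clone, PartialEq)]
struct Arg {
    typename: String,
    name: String,
    value: DefaultValue,
}

/// Top-level AST nodes: constant declarations and function prototypes.
#[derive(Debug, Clone, PartialEq)]
enum Item {
    Const { name: String, value: i64 },
    FnProto { return_type: String, name: String, args: Vec<Arg> },
}

/// Pass over the AST in source order: record each constant in a symbol
/// table and replace every named default with the value declared earlier.
/// A name that has not been declared yet is reported as an error.
fn resolve(items: &mut [Item]) -> Result<(), String> {
    let mut constants: HashMap<String, i64> = HashMap::new();
    for item in items.iter_mut() {
        match item {
            Item::Const { name, value } => {
                constants.insert(name.clone(), *value);
            }
            Item::FnProto { name: fn_name, args, .. } => {
                for arg in args.iter_mut() {
                    if let DefaultValue::Named(ident) = &arg.value {
                        let v = *constants.get(ident).ok_or_else(|| {
                            format!("{}: unknown constant `{}`", fn_name, ident)
                        })?;
                        arg.value = DefaultValue::Literal(v);
                    }
                }
            }
        }
    }
    Ok(())
}
```

The parser itself stays purely syntactic: it accepts any name as a default value, and only this later walk decides whether the name refers to something that exists.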

This is all classic and taught in standard compiler classes and books. If you are serious about building a programming language, you will need this background.

Ira Baxter
  • Very well, let me rephrase - _"how would I handle previously declared constants in a [one-pass compiler](http://en.wikipedia.org/wiki/One-pass_compiler) when parsing using a parsing expression grammar"_? – Iskar Jarak Mar 15 '15 at 07:26
  • Depends on what you mean by "one pass", and what you want to do. The way I have defined it, there is only "one pass" of parsing. If you insist on *compiling* in one pass, you will have to build symbol tables as you parse, and you will have to interpret each symbol the parser encounters as it encounters it, to check that its usage is valid, and then you will have to generate code. It is pretty hard to do this well, this way (see http://stackoverflow.com/a/28970385/120163). It is especially hard to handle forward references; imagine your example invoked foo first, and declared bar later? – Ira Baxter Mar 15 '15 at 07:31
  • _"If you insist on compiling in one pass, you will have to build symbol tables as you parse, and you will have to interpret each symbol the parser encounters as it encounters it, to check that its usage is valid"_ That is exactly what I want to do, only I cannot see a way to do that while still parsing using a PEG implementation, or at least not using rust-peg. I do not intend to implement forward references. – Iskar Jarak Mar 15 '15 at 07:47
  • I don't know a lot about PEG parsers (except that they backtrack), let alone rust-peg. In general, decent parsers provide points in the parsing process where you can attach arbitrary actions ("procedural attachment"); you'd have to implement the semantic actions (symbol table capture, checking, code generation) using such procedural attachments. Here is where backtracking is NOT your friend: imagine you call a semantic action when some grammar rule triggers... then later, the parser backtracks across the semantic action. How are you going to undo it? For me this means "stay away from PEG". – Ira Baxter Mar 15 '15 at 07:51
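
The one-pass approach discussed in these comments (build the symbol table while parsing, and look names up at the point of use) can be sketched roughly as follows. This is not rust-peg code: to keep the bookkeeping visible, it uses a deliberately naive hand-written splitter that only understands the two statement forms from the question. The names parse_program, declaration and value are hypothetical helpers for this sketch. A constant is added to the table only after its whole statement has been recognized, which is one way to avoid having to undo a semantic action.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
struct Arg { typename: String, name: String, value: i64 }

#[derive(Debug, PartialEq)]
struct FnProto { return_type: String, name: String, args: Vec<Arg> }

/// Evaluate a default value: a hex or decimal literal, or the name of a
/// constant that must already be in the symbol table.
fn value(text: &str, constants: &HashMap<String, i64>) -> Result<i64, String> {
    let text = text.trim();
    if let Some(hex) = text.strip_prefix("0x") {
        return i64::from_str_radix(hex, 16).map_err(|e| e.to_string());
    }
    if let Ok(n) = text.parse::<i64>() {
        return Ok(n);
    }
    constants
        .get(text)
        .copied()
        .ok_or_else(|| format!("unknown constant `{}`", text))
}

/// Split `type name = value` into its three parts.
fn declaration(text: &str) -> Result<(&str, &str, &str), String> {
    let (lhs, rhs) = text
        .split_once('=')
        .ok_or_else(|| format!("expected `=` in `{}`", text))?;
    let mut words = lhs.split_whitespace();
    match (words.next(), words.next(), words.next()) {
        (Some(ty), Some(name), None) => Ok((ty, name, rhs.trim())),
        _ => Err(format!("expected `type name` in `{}`", lhs)),
    }
}

/// Single pass over the source: constants go into the symbol table as they
/// are declared, and prototypes resolve their defaults against it at once.
fn parse_program(src: &str) -> Result<Vec<FnProto>, String> {
    let mut constants = HashMap::new();
    let mut protos = Vec::new();
    for stmt in src.split(';').map(str::trim).filter(|s| !s.is_empty()) {
        if let Some((head, rest)) = stmt.split_once('(') {
            // Function prototype: `type name(arg, arg, ...)`.
            let arglist = rest
                .strip_suffix(')')
                .ok_or_else(|| format!("expected `)` in `{}`", stmt))?;
            let mut words = head.split_whitespace();
            let (return_type, name) = match (words.next(), words.next(), words.next()) {
                (Some(t), Some(n), None) => (t, n),
                _ => return Err(format!("expected `type name` in `{}`", head)),
            };
            let mut args = Vec::new();
            for arg in arglist.split(',').map(str::trim).filter(|a| !a.is_empty()) {
                let (ty, arg_name, default) = declaration(arg)?;
                args.push(Arg {
                    typename: ty.to_string(),
                    name: arg_name.to_string(),
                    value: value(default, &constants)?,
                });
            }
            protos.push(FnProto {
                return_type: return_type.to_string(),
                name: name.to_string(),
                args,
            });
        } else {
            // Constant declaration: commit to the symbol table only after
            // the whole statement has been recognized and evaluated.
            let (_ty, name, literal) = declaration(stmt)?;
            let v = value(literal, &constants)?;
            constants.insert(name.to_string(), v);
        }
    }
    Ok(protos)
}
```

Calling parse_program on the two-line example from the question should yield a single FnProto for foo whose bar argument carries the value of BAZ, while a prototype that names an undeclared constant is rejected. Because lookups happen at the point of use, forward references fail by construction, which matches the asker's stated requirements.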