
My understanding is that in Lex/Bison, lexical analysis is done by Lex, syntactic analysis by the Bison grammar rules, and semantic analysis by the Bison actions.

Is it then possible to go back from the semantic analysis (i.e. the actions) to the syntactic analysis?

One example: suppose I want to detect pseudo-C such as "i++", "i=i+1" and "i=i+2", but I want "i=i+1" to be reduced as "i++", while "i=i+2" is handled by a second rule. Is it then possible to do something like this:

```yacc
identifier_plusplus: IDENTIFIER '+' '+'
                   ;
add:                 IDENTIFIER '=' IDENTIFIER '+' NUMBER
                     { /* REDUCE_IN is hypothetical: it is the mechanism being asked about */
                       if ($1 == $3 && $5 == 1) REDUCE_IN(identifier_plusplus); }
                   ;
```

Here it is not very useful, but if I use identifier_plusplus in another rule, it could be very powerful.

EDIT: An example where it could be useful: suppose I have another rule that matches for loops which increment by one. I would like to write something like:

```yacc
for_one:    FOR '(' IDENTIFIER '=' '0' ';' IDENTIFIER '<' CONST ';' IDENTIFIER PLUSPLUS ')' exprs
```

without caring whether I wrote i++ or i=i+1.

Is it clearer now? (Please excuse my English.)

Thank you in advance.

Ezriel_S
    What's the larger context here? Do you have another rule where `identifier_plusplus` is allowed and `add` is not, but you want to allow `add` there as well if and only if it has the form `var = var + 1`? Or do you just want to generate the same AST for `var++` and `var = var + 1`? In the latter case, you don't have to mess with the parser's state. The AST is built in the actions anyway. – sepp2k Dec 01 '19 at 14:27
  • I added an edit to show what would be the context and the usage of that. – Ezriel_S Dec 02 '19 at 10:16

1 Answer

In general terms, you should avoid confusing syntax with semantics, both in the implementation of your compiler and in the design of your language.

The syntactic analysis of an input should suffice to build a parse tree from correct inputs. Semantic analysis such as the transformation suggested in your question can easily be performed by successive walks over the parse tree.
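A minimal C sketch of such a walk, assuming a hypothetical AST (the `Node` type, the node kinds and the helpers `mk`, `normalize` and `is_unit_increment` are illustrative, not from the question). The parser accepts `i++`, `i = i + 1` and `i = i + 2` with ordinary rules and simply builds nodes; a later post-order pass rewrites `x = x + 1` into `x++`, so the check the `for_one` rule was meant to express becomes a plain function over the tree.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical AST node kinds; a real compiler would have many more. */
typedef enum { N_IDENT, N_NUMBER, N_ADD, N_ASSIGN, N_POSTINC } Kind;

typedef struct Node {
    Kind kind;
    const char *name;       /* used by N_IDENT */
    long value;             /* used by N_NUMBER */
    struct Node *lhs, *rhs; /* children; N_POSTINC uses lhs only */
} Node;

/* Allocate a node (never freed in this sketch, for brevity). */
static Node *mk(Kind kind, const char *name, long value, Node *lhs, Node *rhs) {
    Node *n = malloc(sizeof *n);
    if (n == NULL) { perror("malloc"); exit(EXIT_FAILURE); }
    n->kind = kind; n->name = name; n->value = value;
    n->lhs = lhs; n->rhs = rhs;
    return n;
}

static int same_ident(const Node *a, const Node *b) {
    return a->kind == N_IDENT && b->kind == N_IDENT && strcmp(a->name, b->name) == 0;
}

/* Post-order walk: normalize the children first, then rewrite this node
   if it has the shape `x = x + 1`. Everything else is returned unchanged. */
static Node *normalize(Node *n) {
    if (n == NULL) return NULL;
    n->lhs = normalize(n->lhs);
    n->rhs = normalize(n->rhs);
    if (n->kind == N_ASSIGN && n->rhs->kind == N_ADD &&
        same_ident(n->lhs, n->rhs->lhs) &&
        n->rhs->rhs->kind == N_NUMBER && n->rhs->rhs->value == 1)
        return mk(N_POSTINC, NULL, 0, n->lhs, NULL);
    return n;
}

/* The check the for_one rule wanted to express syntactically:
   does this step expression increment `var` by exactly one? */
static int is_unit_increment(Node *step, const char *var) {
    assert(step != NULL && var != NULL);
    step = normalize(step);
    return step->kind == N_POSTINC && step->lhs->kind == N_IDENT &&
           strcmp(step->lhs->name, var) == 0;
}
```

With this split, the step position of the for rule can accept any expression, and `is_unit_increment` is called on it after parsing, where a precise error message can be produced if the check fails.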

Separating syntax and semantics in this way produces code which is clearer and more maintainable, and makes it easier to introduce additional features such as new optimisations or static analyses. Also, it is notoriously difficult to generate useful error messages during a parse; it is much easier to accurately report errors detected during semantic analysis, such as type mismatches.

Having said all that, bison can generate a GLR parser, which allows a more flexible range of possibilities for resolving parsing conflicts. Since a grammar with two possible reductions for the same right-hand side necessarily has an ambiguity, you can use custom merge functions (declared with `%merge`) to select a reduction based on the semantic values of its components. Bison's GLR generator also includes an extension which allows productions to be guarded with a semantic predicate (written `%?{ ... }`).
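A C sketch of such a merge function for the question's example, assuming a hypothetical `%glr-parser` grammar like `stmt: increment %merge <pick_increment> | add %merge <pick_increment> ;` in which both `increment` and `add` match `IDENTIFIER '=' IDENTIFIER '+' NUMBER`. When both alternatives reduce the same input to `stmt`, Bison calls the merge function with the two semantic values and keeps the one it returns. The Bison manual's section on merging GLR parses shows the `YYSTYPE fn (YYSTYPE x0, YYSTYPE x1)` form; check it against your Bison version. The `Node` type, its fields and `pick_increment` are illustrative, and `YYSTYPE` is defined locally only so the sketch compiles on its own.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical semantic value for the two competing parses of `x = y + k`. */
typedef enum { NODE_INCREMENT, NODE_ADD_ASSIGN } NodeKind;

typedef struct {
    NodeKind kind;
    int same_var; /* nonzero if x and y were the same identifier */
    long step;    /* the constant k */
} Node;

/* Stand-in for the YYSTYPE bison would generate from a %union with a
   `Node *node` member; defined here only so the sketch compiles alone. */
typedef union { Node *node; } YYSTYPE;

/* Merge function, to be named with `%merge <pick_increment>` on both
   alternatives. It receives the semantic values of the two parses of the
   same input; the value it returns is the one the parser keeps. */
static YYSTYPE pick_increment(YYSTYPE x0, YYSTYPE x1) {
    assert(x0.node != NULL && x1.node != NULL);
    Node *inc = x0.node->kind == NODE_INCREMENT ? x0.node : x1.node;
    Node *add = x0.node->kind == NODE_INCREMENT ? x1.node : x0.node;
    YYSTYPE result;
    /* Keep the increment reading only when the source really was x = x + 1. */
    result.node = (inc->same_var && inc->step == 1) ? inc : add;
    return result;
}
```

The merge function only runs when both parses survive over the same span; if the surrounding context rules one of them out, the other is used without any merge.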

These advanced features can be used to advantage in languages whose syntax is unduly conditioned by semantic analysis, including languages which are typically parsed using some kind of "lexical feedback". (Ideally, languages would have been defined in ways which don't rely on such hacks, but language design is not always ideal.) In some cases, the GLR solution can simplify such parsers, but it's important to carefully consider the limitations, particularly the impact of deferred execution of parser actions.

rici