Currently I'm trying to build a parser for VHDL which has some of the problems C++-Parsers have to face. The context-free grammar of VHDL produces a parse forest rather than a single parse tree because of it's ambiguity regarding function calls and array subscriptions
foo := fun(3) + arr(5);
This assignment can only be parsed unambiguous if the parser would carry around a hirachically, type-aware symbol table which it'd use to resolve the ambiguities somewhat on-the-fly.
I don't want to do this, because for statements like the aforementioned, the parse forest would not grow exponentially, but rather linear depending on the amount of function calls and array subscriptions.
(Except, of course, one would torture the parser with statements like)
foo := fun(fun(fun(fun(fun(4)))));
Since bison forces the user to just create one single parse-tree, I used %merge attributes to collect all subtrees recursively and added those subtrees under so called AMBIG nodes in the singleton AST.
The result looks like this.
In order to produce the above, I parsed the token stream "I=I(N);". The substance of the grammar I used inside the parse.y file, is collected below. It tries to resemble the ambiguous parts of VHDL:
start: prog
;
/* I cut out every semantic action to make this
text more readable */
prog: assignment ';'
| prog assignment ';'
;
assignment: 'I' '=' expression
;
expression: function_call %merge <stmtmerge2>
| array_indexing %merge <stmtmerge2>
| 'N'
;
function_call: 'I' '(' expression ')'
| 'I'
;
array_indexing: function_call '(' expression ')' %merge <stmtmerge>
| 'I' '(' expression ')' %merge <stmtmerge>
;
The whole sourcecode can be read at this github repository.
And now, let's get down to the actual Problem. As you can see in the generated parse tree above, the nodes FCALL1 and ARRIDX1 refer to the same single node EXPR1 which in turn refers to N1 twice.
This, by all means, should not have happened and I don't know why. Instead there should be the paths
FCALL1 -> EXPR2 -> N2
ARRIDX1 -> EXPR1 -> N1
Do you have any idea why bison reuses the aforementioned nodes?
I also wrote a bugreport on the official gnu mailing list for bison, without a reply to this point though. Unfortunately, due to the restictions for new stackoverflow users, I can't provide no link to this bug report...