15

I'm using GNU Bison 2.4.2 to write a grammar for a new language I'm working on and I have a question. When I specify a rule, let's say:

statement : T_CLASS T_IDENT  '{' T_CLASS_MEMBERS '}' {
           // create a node for the statement ...
}

If I have a variation on the rule, for instance

statement : T_CLASS T_IDENT T_EXTENDS T_IDENT_LIST  '{' T_CLASS_MEMBERS '}' {
           // create a node for the statement ...
}

Where (from flex scanner rules) :

"class"                     return T_CLASS;
"extends"                   return T_EXTENDS;
[a-zA-Z\_][a-zA-Z0-9\_]*    return T_IDENT;

(and T_IDENT_LIST is a rule for comma separated identifiers).

Is there any way to express all of this in a single rule, by somehow marking "T_EXTENDS T_IDENT_LIST" as optional? I've already tried:

 T_CLASS T_IDENT (T_EXTENDS T_IDENT_LIST)? '{' T_CLASS_MEMBERS '}' {
     // create a node for the statement ...
 } 

But Bison gave me an error.

Thanks

Benjamin Loison
Simone Margaritelli

3 Answers

15

To make a long story short, no. Bison only deals with LALR(1) grammars, which means it only uses one symbol of lookahead. What you need is something like this:

statement: T_CLASS T_IDENT extension_list '{' ...

extension_list: /* empty */
              | T_EXTENDS T_IDENT_LIST
              ;

There are other parser generators that work with more general grammars though. If memory serves, some of them support optional elements relatively directly like you're asking for.
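
For reference, here is a fuller sketch of the optional-clause approach as a complete grammar file. It is only an illustration under some assumptions: `extension_list` carries a plain int flag (the default semantic type), the `printf` stands in for the real node-building code, `T_CLASS_MEMBERS` is reduced to an empty placeholder, and the comma-separated form of `T_IDENT_LIST` is guessed from the description in the question. `yylex` is assumed to come from the flex scanner.

```yacc
%{
#include <stdio.h>
int yylex(void);   /* assumed to be provided by the flex scanner */
void yyerror(const char *msg) { fprintf(stderr, "parse error: %s\n", msg); }
%}

%token T_CLASS T_EXTENDS T_IDENT

%%

statement
    : T_CLASS T_IDENT extension_list '{' T_CLASS_MEMBERS '}'
        { printf("class %s a parent list\n", $3 ? "with" : "without"); }
    ;

/* Optional clause: either nothing at all, or "extends a, b, c". */
extension_list
    : /* empty */               { $$ = 0; }
    | T_EXTENDS T_IDENT_LIST    { $$ = 1; }
    ;

/* Comma-separated identifiers (a rule, despite the token-style name). */
T_IDENT_LIST
    : T_IDENT
    | T_IDENT_LIST ',' T_IDENT
    ;

/* Placeholder standing in for the real class-member rules. */
T_CLASS_MEMBERS
    : /* empty */
    ;

%%
```

The empty alternative is what makes the clause optional: once `T_IDENT` has been shifted, seeing `'{'` as the next token reduces `extension_list` to nothing, while seeing `T_EXTENDS` shifts into the second alternative. In Bison 3.0 and later the `/* empty */` comment can be replaced by the `%empty` directive; with 2.4.2, as in the question, the comment is the usual convention.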

Jerry Coffin
  • That was the solution to write only one rule without the | :) Thanks! – Simone Margaritelli Apr 19 '10 at 17:47
  • 1
    It has nothing to do with it being LALR(1), since both are LALR(1). It's because the input syntax is BNF, not EBNF. – Chris Dodd May 06 '13 at 14:21
  • 2
    @ChrisDodd: Sorry, but wrong. The problem here is that as he wrote it, his parser would have to look ahead three symbols, across T_CLASS and T_IDENT to see whether the next symbol was a `{` or T_EXTENDS to see which `statement` variation to use. That's violating LALR(1). EBNF looks like a complete red-herring to me -- I see nothing that even resembles EBNF anywhere in the question. – Jerry Coffin May 07 '13 at 03:22
1

Why don't you just split them using the choice (|) operator?

statement:
  T_CLASS T_IDENT T_EXTENDS T_IDENT_LIST  '{' T_CLASS_MEMBERS '}'
  | T_CLASS T_IDENT  '{' T_CLASS_MEMBERS '}'

I don't think you can do it, because this is an LALR(1) bottom-up parser; you would need something different, such as an LL(k) parser generator (ANTLR?), to do what you want.

Jack
0

I think the most you can do is

statement : T_CLASS T_IDENT  '{' T_CLASS_MEMBERS '}' {
           // create a node for the plain class ...
}
    | T_CLASS T_IDENT T_EXTENDS T_IDENT_LIST  '{' T_CLASS_MEMBERS '}' {
           // create a node for the derived class ...
}

Michael Krelin - hacker