
I am writing a parser for a C-like grammar, but I am running into a shift/reduce conflict.

Basically, the grammar accepts an optional list of global variable declarations followed by the function definitions.

I have the following rules:

program: global_list function_list;

type_name : TKINT /* int */
          | TKFLOAT /* float */
          | TKCHAR /* char */
          ;

global_list : global_list var_decl ';'
            |
            ;

var_decl : type_name NAME;

function_list : function_list function_def
              |
              ;

function_def : type_name NAME '(' param_list ')' '{' func_body '}' ;

I understand that the problem is that the parser can't decide whether the next type_name NAME belongs to global_list or to function_list, and by default it expects a global_list.

For example, given this input:

int var1;

int foo(){}

error: unexpected '(', expecting ';'
Jorgel
  • How important is it to you that all variables be declared before the functions? If you just had a single list of declarations, there would be no problem. – rici Feb 23 '15 at 02:14
  • @rici the global variable declarations must necessarily come before the functions – Jorgel Feb 23 '15 at 02:17

1 Answer


The problem is that a function_def can only occur after a function_list, which means that the parser needs to reduce an empty function_list (using the production function_list → ε) before it can recognize a function_def. Furthermore, it has to make that decision by looking only at the token that follows the empty production. Since that token (the first token of a type_name) could start either a var_decl or a function_def, the parser has no way to decide.

Even deferring the decision by one more token won't help; the correct decision cannot be made until the third token. So your grammar is not ambiguous, but it is LR(3) rather than LR(1).
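To make the lookahead argument concrete, here is a small Python sketch that compares the token streams for the two example declarations as they appear right after the point where the parser must decide whether to reduce an empty function_list. The token spellings are assumptions modeled on the grammar's terminals, and first_difference is a hypothetical helper, not part of any parser generator.

```python
def first_difference(a, b):
    """Return the 1-based position of the first token where a and b differ,
    or None if one stream is a prefix of the other."""
    for pos, (x, y) in enumerate(zip(a, b), start=1):
        if x != y:
            return pos
    return None


# Assumed token streams for "int var1;" and "int foo(){}" (empty params/body).
VAR_DECL = ["TKINT", "NAME", ";"]
FUNC_DEF = ["TKINT", "NAME", "(", ")", "{", "}"]

# The streams agree on TKINT and NAME, so one (or two) tokens of lookahead
# cannot tell the parser whether to reduce function_list -> epsilon.
print(first_difference(VAR_DECL, FUNC_DEF))  # prints 3
```

A parser with a single token of lookahead, such as the LALR(1) parsers bison generates by default, only sees position 1, where both streams are identical.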

Sequences of possibly empty lists of different types create this problem whenever items of both lists can begin with the same token. By contrast, sequences of non-empty lists do not, so a first approach to solving the problem is to eliminate the ε-productions.

First, we expand the top-level definition to make it clear that both lists are optional:

program: global_list function_list
       | global_list
       | function_list
       |
       ;

Then we make both list types non-empty:

global_list
       : var_decl ';'
       | global_list var_decl ';'
       ;

function_list
       : function_def
       | function_list function_def
       ;

The rest of the grammar is unchanged.

type_name : TKINT /* int */
          | TKFLOAT /* float */
          | TKCHAR /* char */
          ;

var_decl : type_name NAME;

function_def : type_name NAME '(' param_list ')' '{' func_body '}' ;
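The language accepted by this rewritten grammar (an optional run of global declarations followed by an optional run of function definitions) can be illustrated with a small hand-written recognizer. This is a Python sketch, not how bison's LR automaton works internally. parse_program is a hypothetical name, the token names follow the grammar's terminals, and it assumes empty param_list and func_body, so that a function definition is exactly type_name NAME '(' ')' '{' '}'.

```python
TYPES = {"TKINT", "TKFLOAT", "TKCHAR"}


def parse_program(tokens):
    """Recognize var_decl* function_def* and return (globals, functions).

    Assumption: param_list and func_body are empty, so a function_def is
    exactly the six tokens type_name NAME '(' ')' '{' '}'.
    """
    n_globals = n_functions = 0
    i = 0
    while i < len(tokens):
        head = tokens[i:i + 3]
        # Both kinds of declaration start with type_name NAME.
        if len(head) < 3 or head[0] not in TYPES or head[1] != "NAME":
            raise SyntaxError(f"bad declaration at token {i}")
        if head[2] == ";":
            # A global is only legal while no function has been seen yet.
            if n_functions:
                raise SyntaxError(f"global declaration after a function at token {i}")
            n_globals += 1
            i += 3
        elif tokens[i + 2:i + 6] == ["(", ")", "{", "}"]:
            n_functions += 1
            i += 6
        else:
            raise SyntaxError(f"unexpected {head[2]!r} at token {i + 2}")
    return n_globals, n_functions


# "int var1; int foo(){}" is now accepted.
print(parse_program(["TKINT", "NAME", ";", "TKINT", "NAME", "(", ")", "{", "}"]))  # (1, 1)
```

Like the LR parser for the non-empty-list grammar, the recognizer classifies each declaration only after it has seen the token following NAME, so it never has to guess before the two productions diverge.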

It's worth noting that the problem would never have arisen if declarations could be interspersed. Is it really necessary that all global variables be defined before any function? If not, you could just use a single list type, which would also be conflict-free:

program: decl_list ;

decl_list:
         | decl_list var_decl ';'
         | decl_list function_def
         ;
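The following Python sketch shows what the single-list grammar accepts: each top-level declaration is classified once its third token is known, and declarations may appear in any order. It is a hand-written illustration, not bison's algorithm. decl_kinds is a hypothetical name, and it assumes empty param_list and func_body.

```python
TYPES = {"TKINT", "TKFLOAT", "TKCHAR"}


def decl_kinds(tokens):
    """Classify each declaration in a single mixed decl_list, in source order.

    Assumption: a var_decl is type_name NAME ';' and a function_def is
    type_name NAME '(' ')' '{' '}' (empty param_list and func_body).
    """
    kinds, i = [], 0
    while i < len(tokens):
        head = tokens[i:i + 3]
        if len(head) < 3 or head[0] not in TYPES or head[1] != "NAME":
            raise SyntaxError(f"bad declaration at token {i}")
        if head[2] == ";":
            kinds.append("var_decl")
            i += 3
        elif tokens[i + 2:i + 6] == ["(", ")", "{", "}"]:
            kinds.append("function_def")
            i += 6
        else:
            raise SyntaxError(f"unexpected {head[2]!r} at token {i + 2}")
    return kinds


# Interleaved order: int a; float f(){} char c;
print(decl_kinds(["TKINT", "NAME", ";",
                  "TKFLOAT", "NAME", "(", ")", "{", "}",
                  "TKCHAR", "NAME", ";"]))
# ['var_decl', 'function_def', 'var_decl']
```

Because there is no ordering constraint, this accepts a superset of the globals-first language.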

Both these solutions work because a bottom-up parser can wait until the end of the production being reduced in order to decide which is the correct reduction; it does not matter that var_decl and function_def look identical until the third token.

The problem really is that it's hard to figure out the type of nothing.

rici
  • Thank you for the explanation. The changes solved the problem – Jorgel Feb 23 '15 at 02:37
  • The last solution (free-order declarations) is in fact also a solution to the problem: it can parse the intended C-like language, and it has some local extensions :) – JJoao Feb 24 '15 at 08:43
  • @rici and JOrge, what would be the behavior of the original version (with the 3 shift/reduce conflicts)? – JJoao Feb 24 '15 at 08:46