Nested Shift/reduce conflict in bison?

Question

I am new to this and I am trying to understand what is going on here, where I get two shift reduce conflicts. I have grammar (If I am missing something I can add the needed rules):

%start translation_unit

%token EOF      0               "end of file"

%token                      LBRA        "["
%token                      RBRA        "]"
%token                      LPAR        "("
%token                      RPAR        ")"
%token                      DOT         "."
%token                      ARROW       "->"
%token                      INCDEC      "++ or --"
%token                      COMMA       ","
%token                      AMP         "&"
%token                      STAR        "*"
%token                      ADDOP       "+ or -"
%token                      EMPH        "!"
%token                      DIVOP       "/ or %"
%token                      CMPO        "<, >, <=, or >="
%token                      CMPE        "== or !="
%token                      DAMP        "&&"
%token                      DVERT       "||"
%token                      ASGN        "="
%token                      CASS        "*=, /=, %=, +=, or -="
%token                      SEMIC       ";"
%token                      LCUR        "{"
%token                      RCUR        "}"
%token                      COLON       ":"

%token                      TYPEDEF     "typedef"
%token                      VOID        "void"
%token                      ETYPE       "_Bool, char, or int"
%token                      STRUCT      "struct"
%token                      ENUM        "enum"
%token                      CONST       "const"
%token                      IF          "if"
%token                      ELSE        "else"
%token                      DO          "do"
%token                      WHILE       "while"
%token                      FOR         "for"
%token                      GOTO        "goto"
%token                      CONTINUE    "continue"
%token                      BREAK       "break"
%token                      RETURN      "return"
%token                      SIZEOF      "sizeof"

%token<string>              IDF         "identifier"
%token<string>              TYPEIDF     "type identifier"
%token<int>                 INTLIT      "integer literal"
%token<string>              STRLIT      "string literal"

/////////////////////////////////

%%

/////////////////////////////////

primary_expression:
    IDF
    | INTLIT
    | STRLIT
    | LPAR expression RPAR
    ;

postfix_expression:
    primary_expression
    | postfix_expression LBRA expression RBRA
    | postfix_expression LPAR argument_expression_list_opt RPAR
    | postfix_expression DOT IDF
    | postfix_expression ARROW IDF
    | postfix_expression INCDEC
    ;

argument_expression_list_opt:
    %empty
    | argument_expression_list
    ;

argument_expression_list:
    assignment_expression
    | argument_expression_list COMMA assignment_expression
    ;

unary_expression:
    postfix_expression
    | INCDEC unary_expression
    | unary_operator cast_expression
    | SIZEOF LPAR type_name RPAR
    ;

unary_operator:
    AMP
    | STAR
    | ADDOP
    | EMPH
    ;

cast_expression:
    unary_expression
    ;

multiplicative_expression:
    cast_expression
    | multiplicative_expression STAR cast_expression
    | multiplicative_expression DIVOP cast_expression
    ;

additive_expression:
    multiplicative_expression
    | additive_expression ADDOP multiplicative_expression
    ;

relational_expression:
    additive_expression
    | relational_expression CMPO additive_expression
    ;

equality_expression:
    relational_expression
    | equality_expression CMPE relational_expression
    ;

logical_AND_expression:
    equality_expression
    | logical_AND_expression DAMP equality_expression
    ;

logical_OR_expression:
    logical_AND_expression
    | logical_OR_expression DVERT logical_AND_expression
    ;

assignment_expression:
    logical_OR_expression
    | unary_expression assignment_operator assignment_expression
    ;

assignment_operator:
    ASGN 
    | CASS
    ;

expression:
    assignment_expression
    ;

constant_expression:
    logical_OR_expression
    ;

declaration:
    no_leading_attribute_declaration
    ;

no_leading_attribute_declaration:
    declaration_specifiers init_declarator_list_opt SEMIC
    ;

init_declarator_list_opt:
    %empty
    | init_declarator_list
    ;

declaration_specifiers:
    declaration_specifier
    | declaration_specifier declaration_specifiers
    ;

declaration_specifier:
    storage_class_specifier
    | type_specifier_qualifier
    ;

init_declarator_list:
    init_declarator
    | init_declarator_list COMMA init_declarator
    ;

init_declarator:
    declarator
    ;

storage_class_specifier:
    TYPEDEF
    ;

type_specifier:
    VOID
    | ETYPE
    | struct_or_union_specifier
    | enum_specifier
    | typedef_name
    ;

struct_or_union_specifier:
    struct_or_union IDF LCUR member_declaration_list RCUR
    | struct_or_union IDF
    ;

struct_or_union:
    STRUCT
    ;

member_declaration_list:
    member_declaration
    | member_declaration_list member_declaration
    ;

member_declaration:
    specifier_qualifier_list member_declarator_list_opt SEMIC
    ;

member_declarator_list_opt:
    %empty
    | member_declarator_list
    ;

specifier_qualifier_list:
    type_specifier_qualifier
    | type_specifier_qualifier specifier_qualifier_list
    ;

type_specifier_qualifier:
    type_specifier
    | type_qualifier
    ;

member_declarator_list:
    member_declarator
    | member_declarator_list COMMA member_declarator
    ;

member_declarator:
    declarator
    ;

enum_specifier:
    ENUM IDF LCUR enumerator_list RCUR
    | ENUM IDF LCUR enumerator_list COMMA RCUR
    | ENUM IDF
    ;

enumerator_list:
    enumerator
    | enumerator_list COMMA enumerator
    ;

enumerator:
    ENUM
    | ENUM ASGN constant_expression
    ;

type_qualifier:
    CONST
    ;

pointer:
    STAR type_qualifier_list_opt
    | STAR type_qualifier_list_opt pointer
    ;

pointer_opt:
    %empty
    | pointer
    ;

declarator:
    pointer_opt direct_declarator
    ;

direct_declarator:
    IDF
    | LPAR declarator RPAR
    | array_declarator
    | function_declarator
    ;

abstract_declarator:
    pointer
    | pointer_opt direct_abstract_declarator
    ;

direct_abstract_declarator:
    LPAR abstract_declarator RPAR
    | array_abstract_declarator
    | function_abstract_declarator
    ;

direct_abstract_declarator_opt:
    %empty
    | direct_abstract_declarator
    ;

array_abstract_declarator:
    direct_abstract_declarator_opt LBRA assignment_expression RBRA
    ;

function_abstract_declarator:
    direct_abstract_declarator_opt LPAR parameter_type_list RPAR
    ;

array_declarator:
    direct_declarator LBRA assignment_expression RBRA
    ;

function_declarator:
    direct_declarator LPAR parameter_type_list RPAR
    ;

type_qualifier_list_opt:
    %empty
    | type_qualifier_list
    ;

type_qualifier_list:
    type_qualifier
    | type_qualifier_list type_qualifier
    ;

parameter_type_list:
    parameter_list
    ;

parameter_list:
    parameter_declaration
    | parameter_list COMMA parameter_declaration
    ;

parameter_declaration:
    declaration_specifiers declarator
    | declaration_specifiers abstract_declarator_opt
    ;
abstract_declarator_opt:
    %empty
    | abstract_declarator
    ;

type_name:
    specifier_qualifier_list abstract_declarator_opt
    ;

typedef_name:
    TYPEIDF
    ;

statement:
    matched
    | unmatched
    | other
    ;

other:
    expression_statement
    | compound_statement
    | jump_statement
    ;

matched:
    selection_statement_c
    | iteration_statement_c
    ;

unmatched:
    selection_statement_o
    | iteration_statement_o
    ;

compound_statement:
    LCUR block_item_list_opt RCUR
    ;

block_item_list_opt:
    %empty
    | block_item_list
    ;

block_item_list:
    block_item
    | block_item_list block_item
    ;

block_item:
    declaration
    | statement
    ;

expression_statement:
    expression_opt SEMIC
    ;

expression_opt:
    %empty
    | expression
    ;

selection_statement_o:
    IF LPAR expression RPAR statement
    | IF LPAR expression RPAR matched ELSE unmatched
    ;

selection_statement_c:
    IF LPAR expression RPAR matched ELSE matched
    ;

iteration_statement_o:
    WHILE LPAR expression RPAR unmatched
    | FOR LPAR expression_opt SEMIC expression_opt SEMIC expression_opt RPAR unmatched
    ;

iteration_statement_c:
    WHILE LPAR expression RPAR matched
    | DO statement WHILE LPAR expression RPAR SEMIC
    | FOR LPAR expression_opt SEMIC expression_opt SEMIC expression_opt RPAR matched
    ;

jump_statement:
    RETURN expression_opt SEMIC
    ;

translation_unit:
    external_declaration
    | translation_unit external_declaration
    ;

external_declaration:
    function_definition
    | declaration
    ;

function_definition:
    declaration_specifiers declarator compound_statement
    ;

/////////////////////////////////

%%

And I am getting this shift/reduce conflicts:

State 156

   85 declarator: pointer_opt . direct_declarator
   91 abstract_declarator: pointer_opt . direct_abstract_declarator

    "("           shift, and go to state 187
    "identifier"  shift, and go to state 44

    "("       [reduce using rule 111 (direct_abstract_declarator_opt)]
    ^^conflict with previous "(" ^^
    $default  reduce using rule 111 (direct_abstract_declarator_opt)

...

State 197

   91 abstract_declarator: pointer_opt . direct_abstract_declarator

    "("  shift, and go to state 213
    "("       [reduce using rule 111 (direct_abstract_declarator_opt)]
    ^^conflict^^

    $default  reduce using rule 111 (direct_abstract_declarator_opt)

So I am understanding this as : When I get "(" I can do two thigs. First, from direct_declarator I get LPAR declarator RPAR where LPAR is "("
... or ...
I can reduce direct_abstract_declarator->array_abstract_declarator->direct_abstract_declarator_opt->direct_abstract_declarator->LPAR abstract_declarator RPAR where LPAR is "(" again. So I can't decide which way to go right ?
How should I tackle this conflict ? Or am I seeing it completely wrong ?

An LR parser makes it's decision at the *end* of a production, not the beginning. That's why it is more powerful than a top down parser. A shift-reduce conflict happens when the parser doesn't know if it is at the end of a production, because some other active production could continue. — rici, Nov 12 '21 at 16:38
So the problem is that both "direct_declarator" and "direct_abstract_declarator" can make "(" ? @rici — Riomare, Nov 12 '21 at 16:52
You don't include a grammar which can be passed through bison. You don't provide the complete text of the state diagrams. What makes you think that anyone can provide a reasonable answer to your question? Please read the help for creating a [mre]; if your response is "my program is too big to include here", then *read it again* because that is not what it is requesting. — rici, Nov 12 '21 at 17:50
Sorry, I thought I included all necessary rules. I updated the code. (there are still some non-terminals which are not described but I think they are not important, as well as the state diagrams) If state diagrams are important here I am totally lost and I am very sorry for taking your time. @rici — Riomare, Nov 12 '21 at 18:04
If you include a reproducible grammar, I can run bison on it and produce the entire state table. The key here is *reproducible*. Reproducible means that I can just feed it to bison without editing it. That means that you don't leave out stuff just because you think its unimportant. Bison needs all tokens to be declared (although you can make things a lot simpler by using quoted tokens, which don't need to be declared). Sorry to be grumpy, but I've recently realised that although I *always* ask for a [mre] and the SO help clearly states what that is, **no-one every complies**. No-one. — rici, Nov 12 '21 at 18:10
And I usually end up guessing what was left out and adding it back in order to be able to run bison, which is truly a waste of time. Please focus on two key concepts in [mre]: *minimal* and *complete*. It's really not hard to do. For example, if a non-terminal isn't contributing to the problem, just make it a token (unless it's nullable, in which case you need to add a optional rule). — rici, Nov 12 '21 at 18:11
Not only will that help us, it will help you. In the process of trying to isolate the problem, you may well end up solving it yourself. So that's a great way of learning how things work. — rici, Nov 12 '21 at 18:14
But it uses some other files and those files use other files... it is complicated. — Riomare, Nov 12 '21 at 18:19
We're just talking about the grammar here. The grammar doesn't depend on anything. At least for a parsing conflict, I don't need to run the parser. I just need to be able to pass the grammar through bison. And that's the same for you. You could solve the conflict without running the parser. Running the parser doesn't help. Building the parser is all that is needed. *But I need to be able to do that*. And I can't do that unless you give a complete self-contained grammar file. — rici, Nov 12 '21 at 18:23
When I'm writing a grammar for a new parser, I start exactly in this way, with just a grammar. No semantic actions, no declarations, nothing. And then I add a minimal lexer. When I start writing the semantics, I already know that the grammar parses the language I want to parse. — rici, Nov 12 '21 at 18:25
By the way, the point of a declaration like: `%token ENUM "enum"` is to be able to write productions like `enum_specifier: "enum" IDF "{" enumerator_list "}"`. That's a lot easier to read (at least for me), and it has the additional advantage that it will work without a `%token` declarations at all, except for `IDF`. (I personally only use quoted tokens for keywords, precisely as a clue to the reader.) So if you're trying for minimal, that lets you get rid of a lot of `%token` declarations and also make your grammar easier to read. — rici, Nov 12 '21 at 19:10

rici · Accepted Answer · 2021-11-12T19:13:43.993

One of the easiest ways to create a shift-reduce conflict is to introduce a new non-terminal representing an optional instance:

something_opt
    : something
    | %empty

Sometimes that will work, but more commonly it will turn out that there is some context in which something might or might not be optional. In other words, you have two different productions which use something:

a: something something_else "END_A"
b: something_opt something_else "END_B"

That's not ambiguous, but it cannot be parsed with one token lookahead, because after something is recognised, the parser has to decide whether to reduce it to something_opt, which would be necessary if the following text turns out to be a b. something_opt is basically pointless semantically; there is a something or there is the absence of a something. And if it were not for that detail, the parser would have no problem; it could continue to parse something_else and then, depending on whether it encounters END_A or END_B, it can decide whether to reduce an a or a b.

The fix is simple: instead of defining something_opt, simply duplicate the production so that there is one production with something and the other without.

In your case, the offending optional non-terminal is direct_abstract_declarator_opt:

direct_abstract_declarator_opt:
    %empty
    | direct_abstract_declarator
 
array_abstract_declarator:
    direct_abstract_declarator_opt LBRA assignment_expression RBRA

function_abstract_declarator:
    direct_abstract_declarator_opt LPAR parameter_type_list RPAR

Those are the only places where it is used, so we can just get rid of it by duplicating the productions for the other two non-terminals:

array_abstract_declarator:
    direct_abstract_declarator LBRA assignment_expression RBRA
    | LBRA assignment_expression RBRA

function_abstract_declarator:
    direct_abstract_declarator LPAR parameter_type_list RPAR
    | LPAR parameter_type_list RPAR

If at all possible, you should use this model to write grammars with optional non-terminals. It does require duplicating semantic actions, but hopefully the semantic action is just a simple call to an AST node builder. And it will save you a lot of unnecessary grief.

I am extremely grateful for your time and will to help me. Thanks for your explanation and answers. And I am so happy that after I successfuly corrected and finished my grammar, I come here and see that it is same as your solution. :) Thank you, again, very much for your time and advice. — Riomare, Nov 12 '21 at 22:26

Nested Shift/reduce conflict in bison?

1 Answers1