0

I'm working on the clike language compiler using Jison package. I went really well until I've introduced classes, thus Type can be a LITERAL now. Here is a simplified grammar:

%lex

%%
\s+                       /* skip whitespace */

int                       return 'INTEGER'
string                    return 'STRING'
boolean                   return 'BOOLEAN'
void                      return 'VOID'
[0-9]+                    return 'NUMBER'
[a-zA-Z_][0-9a-zA-Z_]*    return 'LITERAL'
"--"                      return 'DECR'
<<EOF>>                   return 'EOF'
"="                       return '='
";"                       return ';'


/lex

%%

Program
  : EOF
  | Stmt EOF
  ;

Stmt
  : Type Ident ';'
  | Ident '=' NUMBER ';'
  ;

Type
  : INTEGER
  | STRING
  | BOOLEAN
  | LITERAL
  | VOID
  ;

Ident
  : LITERAL
  ;

And the jison conflict:

Conflict in grammar: multiple actions possible when lookahead token is LITERAL in state 10
- reduce by rule: Ident -> LITERAL
- reduce by rule: Type -> LITERAL
Conflict in grammar: multiple actions possible when lookahead token is = in state 10
- reduce by rule: Ident -> LITERAL
- reduce by rule: Type -> LITERAL

States with conflicts:
State 10
  Type -> LITERAL . #lookaheads= LITERAL =
  Ident -> LITERAL . #lookaheads= LITERAL =

I've found quite a similar question that has no been answered, does any one have any clue how to solve this?

Cœur
  • 37,241
  • 25
  • 195
  • 267
Vees
  • 703
  • 4
  • 9

1 Answers1

1

That's evidently a bug in jison, since the grammar is certainly LALR(1), and is handled without problems by bison. Apparently, jison is incorrectly computing the lookahead for the state in which the conflict occurs. (Update: It seems to be bug 205, reported in January 2014.)

If you ask jison to produce an LR(1) parser instead of an LALR(1) grammar, then it correctly computes the lookaheads and the grammar passes without warnings. However, I don't think that is a sustainable solution.

Here's another work-around. The Decl and Assign productions are not necessary; the "fix" was to remove LITERAL from Type and add a separate production for it.

Program
  : EOF
  | Stmt EOF
  ;

Decl
  : Type Ident ';'
  | LITERAL Ident ';'
  ;

Assign
  : Ident '=' NUMBER ';'
  ;

Stmt
  : Decl
  | Assign
  ;

Type
  : INTEGER
  | STRING
  | BOOLEAN
  | VOID
  ;

Ident
  : LITERAL
  ;

You might want to consider recognizing more than one statement:

Program
  : EOF
  | Stmts EOF
  ;

Stmts
  : Stmt
  | Stmts Stmt
  ;
rici
  • 234,347
  • 28
  • 237
  • 341
  • Thanks for the solution, I'm pretty glad that it turned out to be a bug, because I started to doubt if I know what LALR(1) is. However at the end i resolved this issue in the even simpler way - just by removing built-in types from grammar and adding them during to semantic analysis. – Vees Dec 19 '15 at 14:17