
I'm writing a parser for a project and got stuck on an issue. Here's a self-contained example of the problem:

%{
#include <stdio.h>

/* Prototypes for the functions defined after the second %%;
   the generated parser calls them before their definitions. */
int yylex(void);
void yyerror(const char *str);
%}

%error-verbose

%token ID
%token VAR
%token END_VAR
%token CONSTANT
%token AT
%token VALUE
%%

unit: regular_var_decl
    | direct_var_decl;

regular_var_decl: VAR constant_opt ID ':' VALUE ';' END_VAR;

constant_opt: /* empty */ | CONSTANT;

direct_var_decl: VAR ID AT VALUE ':' VALUE ';' END_VAR;

%%

/* Hard-coded token stream standing in for a real lexer. */
int yylex(void) {
  static int i = 0;

  static const int tokens[] = {
    VAR,
      ID, ':', VALUE, ';',
    END_VAR,
    0
  };

  return tokens[i++];
}

void yyerror(const char *str) {
  printf("fail: %s\n", str);
}

int main(void) {
  yyparse();
  return 0;
}

You can build and run it with `bison test.y && cc test.tab.c && ./a.out`.

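Bison reports a shift/reduce conflict and warns that the rule `constant_opt: /* empty */` is useless in the parser due to conflicts. After VAR, with ID as the lookahead, the parser has to choose between reducing the empty constant_opt (for regular_var_decl) and shifting ID (for direct_var_decl), and it always shifts.

This isn't really an ambiguity: after ID the next token is either ':' or AT, so two tokens of lookahead (LALR(2)) would be enough to decide. How can I solve this in Bison?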

paulotorrens

2 Answers


A simple solution is to just not abbreviate the optional CONSTANT:

regular_var_decl:  VAR ID ':' VALUE ';' END_VAR;
constant_var_decl: VAR CONSTANT ID ':' VALUE ';' END_VAR;
direct_var_decl:   VAR ID AT VALUE ':' VALUE ';' END_VAR;

That allows the reduction decision to be deferred until enough information is known. (The unit rule then needs constant_var_decl as a third alternative. You could also factor ':' VALUE ';' END_VAR into a non-terminal if that were useful.)

Another possibility is to leave the grammar as it was and ask bison to produce a GLR parser (%glr-parser). The GLR parser will effectively retain two (or more) parallel parses until the ambiguity can be resolved, which should certainly fix the constant_opt problem. (Note that bison still reports the shift/reduce conflicts; it cannot tell whether the language is actually ambiguous until an ambiguous sentence is discovered at runtime.) Much of the time no additional change to the grammar is needed, but it does slow the parse down a little.

A final possibility, probably less useful here, is to accept a superset of the language and then issue an error message in an action:

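var_decl: VAR const_opt ID at_value_opt ':' VALUE ';' END_VAR {
   if ($2 && $4) { yyerror("CONSTANT cannot be combined with AT"); YYERROR; }
};
/* Assumed encoding: 1 if the optional part is present, 0 if empty. */
const_opt:    /* empty */ { $$ = 0; } | CONSTANT { $$ = 1; };
at_value_opt: /* empty */ { $$ = 0; } | AT VALUE { $$ = 1; };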

That depends on the two _opt non-terminals returning a semantic value that can be checked for emptiness.

rici
  • Though this might work, it would give me a lot of trouble. I've written the syntax from a pretty big EBNF spec, and I have **a lot** of `blabla_opt` in it. I've spent the last five days "translating" it, and I believe it would be easier to do it again with PEGs than to fix these issues this way. :( – paulotorrens Jun 04 '15 at 13:28
  • @PauloTorrens: OK, added the GLR-parser option. Make sure you have a recent bison version. – rici Jun 04 '15 at 13:45
  • I'll take the GLR approach; it looks like the one that will reduce my trouble the most... though I still have about 1200 lines to fix. Thank you. :D ("Reduce", got it?) – paulotorrens Jun 04 '15 at 13:46
  • @PauloTorrens: The main annoyance with GLR parsers is that bison continues to report all the conflicts, and you have no idea which ones matter. It will produce a runtime error if it finds an ambiguous input. Good luck. – rici Jun 04 '15 at 13:48

Another solution is to factor it further:

var_decl: VAR constant_opt ID direct_opt ':' VALUE ';' END_VAR;

constant_opt: /* empty */ | CONSTANT;

direct_opt: /* empty */ | AT VALUE;

Then in your action for var_decl, you decide if it's regular, constant, or direct, or issue an error if it has both CONSTANT and AT VALUE. This has the advantage that you can give a custom, clear error message for the latter case, rather than just a generic "syntax error" message.

Chris Dodd
  • Sadly, as I said above, I got the syntax (around 1200 lines) from an EBNF standard. This approach would require too much work with a language I'm not totally familiar with yet. – paulotorrens Jun 04 '15 at 14:48
  • I agree. That's the third suggestion in my answer :) – rici Jun 04 '15 at 15:36