I am working on a project to add support for Cypher clauses in Postgres psql. I am trying to improve parser performance by resolving conflicts between the grammar rules. I have created a minimal example, included after the description, to illustrate an issue that comes up frequently in the implementation.
A clause consists of commands mixed with options. The options are commands that may or may not be in the clause. In the example below, when executing the program, we can trigger the rule COMMAND id_opt A str_opt by running the clause COMMAND country A "Canada". Similarly, we can trigger the rule COMMAND num_opt ab_opt str_opt by running the clause COMMAND 1 A "Canada" or COMMAND 1 B "Canada". The first clause returns a syntax error because of the conflict.
The problem is that, since id_opt, str_opt, and num_opt are options and can be empty, the clause COMMAND A can trigger both rules, resulting in a conflict and the following warning when compiling the project:
gram.y: warning: 1 reduce/reduce conflict [-Wconflicts-rr]
Creating a single rule with all options, as in the following example, removes the warning. However, I don't want id_opt in the same clause as num_opt. In this fictitious language, the clause COMMAND 1 name A "Canada" does not exist. Also, id_opt only goes with A. Given this scenario, should I merge all options and handle invalid combinations later, or should I keep the conflict and avoid invalid option combinations in the grammar?
command:
    COMMAND num_opt id_opt ab_opt str_opt { printf("Clause parsed successfully.\n"); }
    ;
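To make the "merge all options and handle invalid options later" alternative concrete, here is a minimal sketch of what the later check could look like. It assumes the action of the merged rule records which optional parts were present; the `clause_opts` struct and the `check_clause` function are hypothetical names for illustration and are not part of cypher.y or the example files.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record of what the merged rule
 * COMMAND num_opt id_opt ab_opt str_opt actually matched. */
typedef struct {
    bool has_num;  /* num_opt matched a NUMBER */
    bool has_id;   /* id_opt matched an IDENTIFIER */
    char ab;       /* 'A' or 'B', taken from ab_opt */
    bool has_str;  /* str_opt matched a STRING (unrestricted here) */
} clause_opts;

/* Returns NULL when the combination is allowed, otherwise a message
 * describing which restriction from the question was violated. */
const char *check_clause(const clause_opts *o)
{
    /* id_opt must not appear in the same clause as num_opt. */
    if (o->has_num && o->has_id)
        return "num_opt and id_opt cannot be combined";

    /* id_opt only goes with A. */
    if (o->has_id && o->ab != 'A')
        return "id_opt is only allowed with A";

    return NULL;
}
```

With this approach the grammar accepts COMMAND 1 name A "Canada", and the rule's action would then reject it, for example by passing the message returned by check_clause to yyerror.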
For a more specific example, I am working on the cypher.y file from the AgeSQL repository. This problem occurs in the return_clause rule. I am showing a minimal example because the rules in cypher.y span almost a thousand lines. The minimal example follows:
gram.l file:
%{
#include "gram.tab.h"
%}
%%
[ \t\n] /* ignore whitespace */
"COMMAND" { return COMMAND; }
"A" { return A; }
"B" { return B; }
[0-9]+ { return NUMBER; }
[a-zA-Z][a-zA-Z0-9_.*]* { return IDENTIFIER; }
("\"")[^"]*("\"")|("\'")[^']*("\'") { return STRING; }
%%
int yywrap(void) {
    return 1;
}
gram.y file:
%{
#include <stdio.h>
#include <stdlib.h>
int yylex(void);
void yyerror(const char*);
char u;
%}
%token COMMAND A B IDENTIFIER STRING NUMBER
%%
command:
    COMMAND id_opt A str_opt { printf("Clause A parsed successfully.\n"); }
    | COMMAND num_opt ab_opt str_opt { printf("Clause B parsed successfully.\n"); }
    ;
id_opt:
    /* empty */
    | IDENTIFIER
    ;
str_opt:
    /* empty */
    | STRING
    ;
num_opt:
    /* empty */
    | NUMBER
    ;
ab_opt:
    A
    | B
    ;
%%
void yyerror(const char *s) {
    fprintf(stderr, "Parse error: %s\n", s);
    exit(1);
}
int main(void) {
    yyparse();
    printf("Parsed variable: %c\n", u);
    return 0;
}
Makefile:
gram: gram.tab.c lex.yy.c
	gcc -o gram gram.tab.c lex.yy.c
gram.tab.c: gram.y
	bison -d gram.y
lex.yy.c: gram.l
	flex gram.l