yacc stop doing shift&& reduce once cannot get any more symbol from yylex()

Question

here is my code:

%{
#include<string.h> 
#include "y.tab.h"
#define DEBUG 0
void yyerror(char* s);
void debug(char* string) {
    if (DEBUG) {
        printf(string);
    }
}
%}  
selector   "selector"[0-9]+
positive   "+"
negtive    "-"
contains   "."
before     "->"
or         "||"
and        "&&"
delim      [ /n/t]  
ws         {delim}*
%%  
{selector} { debug("Lex:SELECTOR\n"); yylval.string = yytext; return     SELECTOR; }  
{positive} { debug("Lex:POSITIVE\n"); return POSITIVE; }  
{negtive}  { debug("Lex:NEGTIVE\n"); return NEGTIVE; }  
{contains} { debug("Lex:BELONGS\n"); return CONTAINS; } 
{before}   { debug("Lex:BEFORE\n"); return BEFORE; }  
{or}       { debug("Lex:OR\n"); return OR; }  
{and}      { debug("Lex:AND\n"); return AND; } 
{ws}       ;
.          { debug("Lex Parser Error"); exit(1); }    
%%

.y:
%{
#include <stdio.h>
#define YYDEBUG 0
int yyparse(void);
void yyerror(char* s);
%}

%union {
    char *string;
}

%token <string> SELECTOR 
%token POSITIVE
%token NEGTIVE
%left CONTAINS
%left BEFORE
%token OR
%token AND

%%
logical_expr : assertion { printf("[reduce] L->A\n"); } 
    | logical_expr AND assertion { printf("[reduce] L->L && A\n");}   
    | logical_expr OR assertion { printf("[reduce] L->L || A\n"); }        
;
assertion : POSITIVE prefix_relation { printf("[reduce] A->+P\n"); }
    | NEGTIVE prefix_relation { printf("[reduce] A->-P\n"); }
;
prefix_relation : prefix_relation BEFORE contain_relation { printf("[reduce] P->P->C\n"); }
    | contain_relation { printf("[reduce] P->C\n");;}
;
contain_relation : contain_relation CONTAINS SELECTOR { printf("[reduce] C->C.S[%s]\n", $3); }
    | SELECTOR { printf("[reduce] C->S[%s]\n", $1); }
;
%%
int main()
{
    return yyparse();
}
void yyerror(char* s)
{
    fprintf(stderr,"%s",s);
}
int yywrap()
{
    return 1;
}

my input string is: +selector1.selector2||-selector4->selector4

the parse tree of this input is expected to be: the expected parse tree

my program generated by yacc gives output as below:

[reduce] C->S[selector1]     // stack: +C
[reduce] C->C.S[selector2]   // stack: +C
[reduce] P->C                // stack: +P
[reduce] A->+P               // stack: A
[reduce] L->A                // stack: L
[reduce] C->S[selector3]     // stack: L||-C
[reduce] P->C                // stack: L||-P
[reduce] C->S[selector4]     // stack: L||-P->C

it seems that the program stop doing shift&& reduce once cannot get any more symbol from yylex(), but i expect it to reduce the remained symbols in stack, thus L||-P->C, so that i can generate the whole parse tree in my code.

my expected output is:

[reduce] C->S[selector1]     // stack: +C
[reduce] C->C.S[selector2]   // stack: +C
[reduce] P->C                // stack: +P
[reduce] A->+P               // stack: A
[reduce] L->A                // stack: L
[reduce] C->S[selector3]     // stack: L||-C
[reduce] P->C                // stack: L||-P
[reduce] C->S[selector4]     // stack: L||-P->C
[reduce] P->P->C             // stack: L||-P
[reduce] A->-P               // stack: L||A
[reduce] L->L||A             // stack: L

You haven't shown your entire lex file. In fact, you haven't shown any rule at all; only the definitions. Undoubtedly the problem has to do with your EOF handling (and most likely that you've chosen to handle it in some way rather than let flex handle it automatically) or some issue with newline handling (such as your doing something other than ignoring them) but without more information it's impossible to say. — rici, Jan 17 '15 at 17:25
Your `delim` rule has forward slashes where it should have backslashes. — user207421, Jan 18 '15 at 10:19

rici · Accepted Answer · 2015-01-18T20:40:32.753

There are a number of issues with your scanner (flex) definition.

Your default flex rule just calls exit without any error message, (unless DEBUG is defined and non-zero) so any lexical error will cause the program to silently halt. It would be much better to call yyerror in that case, and produce a visible error message.
As EJP points out in a comment, your delim definition uses /n and /t instead of \n and \t, so it will match neither a newline nor a tab. A newline won't be matched by your default rule either, so it will fall through to the flex-generated default rule, which simply prints the unmatched character to stdout. (It is better to include %option nodefault, which will cause flex to produce an error message if some input matches none of your rules.)
Your selector rule sets yylval.string = yytext. You cannot do that because yytext points into the scanner's internal storage and the string it points to will be modified the next time yylex is called. If you want to pass the matched token from the scanner to the parser, you must make a copy of the token, and then you need to ensure that you free the storage allocated for it when you no longer require the string, in order to avoid leaking memory.
You need to be aware that parsers generated by bison or yacc generally need to read a lookahead token before performing reductions. Consequently, the last series of reductions you expect will not be performed until the scanner returns the END token, which it will only do when it reads an end-of-file. So if you are testing your parser interactively, you will not see the final reductions until you signal an end-of-file by typing ctrl D (on Unix).

As a final note, both flex and bison are capable of generating debugging output which indicates which rules are matched (flex) and the series of shifts, reduces and other parser actions (bison). It's simpler and more reliable to use these features rather than trying to implement your own debugging output.

i've changed `ws` to `ws {delim}+`, but the program still stop reducing after output `[reduce] C->S[selector4]`, and if i input `.selector5` now, it does output `[reduce] C->C.S[selector5]` — yeren1989, Jan 18 '15 at 07:41
point 1 is incorrect -- lex/flex will NEVER match an empty string, as doing so would cause an infinite loop (as you note). The 'longest match' rule ensures that something else will ALWAYS match instead -- either the single token default rule or the default EOF rule if you don't define something else yourself that matches. So the patterns `{ws}+` and `{ws}*` are exactly equivalent in effect. — Chris Dodd, Jan 18 '15 at 20:05
Note that there are still ways you can get an infinite loop -- by using the `/` trailing context operator, or `yyless` -- but in these cases flex is matching a longer (non-empty) token, and then unreading characters to be reread. — Chris Dodd, Jan 18 '15 at 20:08
@ChrisDodd: Ok, I tried it and you're right. I'm sure I've hit that problem but I can't remember the exact circumstances. — rici, Jan 18 '15 at 20:39
I was not aware of yacc's `lookahead` behavior, now the problem is solved, thank you. — yeren1989, Jan 19 '15 at 03:20

yacc stop doing shift&& reduce once cannot get any more symbol from yylex()

1 Answers1