
We have no idea how to track errors in our yacc parser. We are trying to use yylineno in our lex file and have added %option yylineno, but it still isn't working: we cannot access the variable from yacc.

All we want is to print the syntax error from yacc's error handler together with the line number.

Here's our .l file:

%{
#include <stdio.h>
#include <stdlib.h>
#include "y.tab.h"
int yylineno=1;

%}

%option yylineno

identifier  [a-zA-Z_][a-zA-Z0-9_]*
int_constant    [0-9]+
delimiter       ;

%%

"int"       {return INT;}
{int_constant}  return INT_CONST;
{identifier}    return IDENT;
\=      {return ASOP;}
\+      {return PLUS;}
\-      {return MINUS;}
\*      {return MULT;}
\/      {return DIV;}
\,      {return COMMA;}
\(      {return OP;} /*OP CP = Opening Closing Parenthesis*/
\)      {return CP;}
\[      {return OB;} /*OB CB = Opening Closing Brace*/
\]      {return CB;}
\{      {return OCB;} /*OCB CCB = Opening Closing Curly Brace*/
\}      {return CCB;}
{delimiter} return DEL;
[ \t]
[\n]        {yylineno++;}


%%

Now here's our .y file:

%{
#include <stdio.h>
#include <string.h>
#include "y.tab.h"

extern FILE *yyin;

%}

%token INT INT_CONST IDENT ASOP PLUS MINUS MULT DIV DEL COMMA CP CB CCB
%left OP OB OCB


%%

program:        program_unit;
program_unit:   program_unit component | component
component:  var_decl DEL | func_decl DEL | func_defn ;
var_decl:       dt list;
dt:     INT;
list:       list COMMA var | var 
        | error {printf("before ';' token\n"); yyerrok;}
        | error INT_CONST {printf("before numeric constant\n"); yyerrok;};
var:        IDENT
        |IDENT init;
init:       ASOP IDENT init | ASOP expr | ASOP IDENT ;
expr:       IDENT op expr | const op expr | const | OP expr CP;
const:      INT_CONST;
op:     PLUS | MINUS | MULT | DIV;
func_decl:  dt mult_func;
mult_func:  mult_func COMMA mfunc | sfunc;
mfunc:      IDENT OP CP;
sfunc:      IDENT OP CP OCB func_body CCB;
func_body:  program_unit;

func_defn:  dt IDENT OP CP OCB func_body CCB
        | IDENT OP CP OCB func_body CCB; 

%%

int yyerror(char *s){
    extern int yylineno;
    fprintf(stderr,"At line %d %s ",s,yylineno);  
}

int yywrap(){
    return 1;
}

int main(int argc, char *argv[]){
    yyin=fopen("test.c","r");
    yyparse();
    fclose(yyin);
    return 0;
}
rici
Aron
  • Please show us what you have tried. Otherwise we can't tell you what you have done wrong. – Stephen C Oct 06 '14 at 07:41
  • @Stephen C. Mister, our `.l` and `.y` files are now included in the post. You can also give us examples, even the simplest one. Thank you very much. – Aron Oct 06 '14 at 08:09

1 Answer

There are a number of problems with those files, but none of them will prevent yylineno from being available to your bison-generated parser.

Your definition of yyerror will produce at least one compile-time warning, and possibly several.

First, the correct signature is:

void yyerror(const char *msg);

It's ok to return an int but the value is never used; however, your definition of the function just falls off the end, so the compiler will complain about the fact that no value is returned. Also, yyerror is normally called with a literal string argument, which is immutable; standard C allows literal strings to be passed to a function whose parameter type is non-const, but the usage is not recommended and a compiler might warn. More importantly,

fprintf(stderr,"At line %d %s ",s,yylineno);

applies the %d (integer) format to s (a string) and the %s (string) format to yylineno (an integer); again, this should produce a compile-time warning, and if you ignore the warning, your program will probably segfault.
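For illustration, here is a minimal sketch of a corrected error function with the arguments in the order the format string expects. The helper format_error and the exact message text are my own choices, not something bison requires, and the definition of yylineno here is only a stand-in so the snippet compiles by itself.

```c
#include <stdio.h>

/* Stand-in so this sketch compiles on its own. In the real .y file,
   write `extern int yylineno;` instead and let flex define it. */
int yylineno = 1;

/* Hypothetical helper: formats the message into buf, which makes the
   argument order easy to check separately from the output stream. */
int format_error(char *buf, size_t size, int line, const char *msg) {
    /* %d is paired with the int, %s with the string. */
    return snprintf(buf, size, "At line %d: %s\n", line, msg);
}

/* Returns void and takes a const string, as recommended above. */
void yyerror(const char *msg) {
    char buf[256];
    format_error(buf, sizeof buf, yylineno, msg);
    fputs(buf, stderr);
}
```

With yylineno equal to 3, calling yyerror("syntax error") would write "At line 3: syntax error" to stderr.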

Finally (relevant to yylineno), if you specify %option yylineno in your flex input (which is a good idea if you want to count line numbers) then the flex-generated scanner will define and initialize yylineno and do the counting for you. So your definition of yylineno in your .l file will trigger a compile-time error (redefinition of yylineno). Also, when you increment yylineno explicitly ([\n] {yylineno++;}), you end up double-counting lines; yylineno will be incremented by the scanner and then incremented again by your action. My advice: specify %option yylineno and then let flex do everything for you. You only need to declare it as extern in your bison file (as you do). And you can just add \n to the list of ignored whitespace characters.
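To make the double counting concrete, here is a small C simulation. It is not flex output; count_lines and line_no are hypothetical names that only mimic a scanner that counts newlines itself, optionally combined with a user action that counts them again.

```c
/* Stand-in for the counter that flex maintains under %option yylineno. */
static int line_no = 1;

/* Walks over the text the way a line-counting scanner would.
   If extra_increment is nonzero, it also mimics a user action such as
   {yylineno++;} attached to the [\n] rule. Returns the final count. */
int count_lines(const char *text, int extra_increment) {
    const char *p;
    line_no = 1;
    for (p = text; *p != '\0'; ++p) {
        if (*p == '\n') {
            ++line_no;              /* done by the scanner itself */
            if (extra_increment)
                ++line_no;          /* done again by the action */
        }
    }
    return line_no;
}
```

For an input with three newlines, the counter ends at 4 when only the scanner counts, and at 7 when the action counts as well, so every reported line number after the first line is too large.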

One caveat: using yylineno directly in bison means that you will not have an exact location for syntax errors, because the bison-generated parser has usually read one lookahead token, and yylineno will already have been updated to the line number at the end of this token by the time bison notices the syntax error. Sometimes this is misleading, especially in the case of syntax errors caused by a missing token.
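The effect of the lookahead token can be sketched with a hand-written checker. This is only a simulation under simplifying assumptions, not how the bison-generated parser is implemented: the token codes, struct names, and check_decls are all hypothetical, and the "grammar" is just a sequence of INT IDENT ';' declarations.

```c
#include <stddef.h>

/* Hypothetical token codes for this sketch only. */
enum { T_END = 0, T_INT = 1, T_IDENT = 2, T_SEMI = 3 };

struct token { int type; int line; };

struct error_report {
    int found;          /* 1 if a syntax error was detected */
    int lookahead_line; /* line of the token being examined when the
                           error is noticed (what yylineno reflects) */
    int previous_line;  /* line of the last token that was accepted */
};

/* Checks a T_END-terminated sequence of declarations: INT IDENT SEMI. */
struct error_report check_decls(const struct token *toks) {
    static const int expected[3] = { T_INT, T_IDENT, T_SEMI };
    struct error_report r = { 0, 0, 0 };
    int state = 0;
    int last_line = 1;
    size_t i;

    for (i = 0; toks[i].type != T_END; ++i) {
        if (toks[i].type != expected[state]) {
            r.found = 1;
            r.lookahead_line = toks[i].line;
            r.previous_line = last_line;
            return r;
        }
        last_line = toks[i].line;
        state = (state + 1) % 3;
    }
    if (state != 0) {               /* input ended mid-declaration */
        r.found = 1;
        r.lookahead_line = toks[i].line;
        r.previous_line = last_line;
    }
    return r;
}
```

For the input "int x" on line 1 followed by "int y;" on line 2, the missing semicolon belongs to line 1, but the error is only noticed when the int on line 2 has been read, so the lookahead line is 2 while the last accepted token was on line 1.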

Some other problems:

  • It's much better style (IMHO) to use literal character tokens rather than define token names in bison and coordinate them with your flex file. If you just use literal characters, then the two files are much easier to keep in synch with each other; the grammar is more readable; and you don't need comments like

    /*OP CP = Opening Closing Parenthesis*/
    

    Instead, just use ')' in the grammar, and in the lexer you can do something like this:

    [][=+*/,(){}-]  { return yytext[0]; }
    

    Or you could even just use a default rule at the end:

    .  { return yytext[0]; }
    
  • Related to the above, and the reason why I usually choose the second option (the default rule), your lexer does not have a rule for all possible characters, and consequently the flex-provided default rule will be used. The flex-provided default rule is to just echo the invalid character to yyout. That's never what you want in an actual compiler, and the result is that input errors (or scanner bugs) are silently hidden. It's better to use a default rule like the one I suggest above and to protect yourself by using %option nodefault to avoid the flex-generated default rule. With %option nodefault, flex will give you a warning if there is any possibility that the input won't match; please don't ignore this warning.
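As a rough picture of what the two suggestions above amount to, here is a hand-written C tokenizer that returns single characters as their own token codes, much as a `. { return yytext[0]; }` rule would. It is a sketch, not flex output: next_token is a hypothetical function, the numeric values given to INT_CONST and IDENT are assumptions (bison assigns its own values to named tokens, above the character range), and keywords such as int are not distinguished from identifiers here.

```c
#include <ctype.h>

/* Assumed codes for named tokens; chosen above the character range so
   they cannot collide with literal character tokens. */
enum { INT_CONST = 258, IDENT = 259 };

/* Returns the next token code and advances *cursor past it.
   Returns 0 at the end of the input. */
int next_token(const char **cursor) {
    const char *p = *cursor;
    int tok;

    while (*p == ' ' || *p == '\t' || *p == '\n')
        ++p;                                /* ignored whitespace */

    if (*p == '\0') {
        tok = 0;
    } else if (isdigit((unsigned char)*p)) {
        while (isdigit((unsigned char)*p))
            ++p;
        tok = INT_CONST;
    } else if (isalpha((unsigned char)*p) || *p == '_') {
        while (isalnum((unsigned char)*p) || *p == '_')
            ++p;
        tok = IDENT;
    } else {
        /* Fallback for every other character: the character is the
           token, so the parser sees '(' or ';' directly, and an
           unexpected character such as '@' reaches the parser as a
           syntax error instead of being echoed silently. */
        tok = (unsigned char)*p;
        ++p;
    }
    *cursor = p;
    return tok;
}
```

Because the fallback branch handles every character that no other branch claims, nothing is left for a silent echo, which is the same guarantee %option nodefault asks you to provide in a flex file.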

rici