I need to have proper error messages for syntax errors for a grammar I'm writing. I've figured out that I can define a rule (? not sure about the terminology) for newlines in the flex file that increments a line-number counter, and I can use that in yyerror(const char*). However, I also need to know the exact position where the error occurred to have better error messages. This is what I would want the error messages to look like:

Syntax error on line X:
SOME ERRONEOUS TEXT ON LINE X
_______________^
Expected other text.

How could I get the column information as well as the text on the erroneous line?

Thank you in advance.

greetings

1 Answer

Output Unexpected and Expected Tokens

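Defining the following macro in the prologue of the .y file is enough to get more detailed messages. Note that it is a Bison extension, so it only works if your yacc is really Bison; newer Bison releases may ignore the macro, and the directive %define parse.error verbose is the documented replacement.

#define YYERROR_VERBOSE 1

With it, yyerror already receives a message like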

syntax error, unexpected '+', expecting NUM or '('  

Print Line Number

To print the current line number you can make use of yylineno. You need to declare it with

extern int yylineno;

in the .y file.

In the .l flex file you need to add:

%option yylineno

Print Column

To get column information, you must track the columns yourself in the lexer file. After each token is read, add its length to a running counter (e.g. by using strlen(yytext); flex also provides the same value as yyleng). For error reporting you are interested in the column where the token starts, so you need a second variable that remembers the column position from before the token was read.

You could use a simple macro for it:

#define HANDLE_COLUMN column = next_column; next_column += strlen(yytext)
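The bookkeeping behind this macro can be tried outside of flex. The following is a minimal sketch, assuming a hypothetical struct col_tracker in place of the lexer's two variables and an explicit token string in place of yytext; none of these names come from flex itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the two lexer variables used by HANDLE_COLUMN. */
struct col_tracker {
    int column;       /* 1-based column where the most recent token starts */
    int next_column;  /* 1-based column where the next token will start */
};

/* Start at the first column of a line. */
static void col_init(struct col_tracker *t)
{
    assert(t != NULL);
    t->column = 1;
    t->next_column = 1;
}

/* Same two steps as HANDLE_COLUMN: remember the start, then skip the token. */
static void col_advance(struct col_tracker *t, const char *token_text)
{
    assert(t != NULL && token_text != NULL);
    t->column = t->next_column;
    t->next_column += (int)strlen(token_text);
}

/* Call after a newline token so the next line starts at column 1 again. */
static void col_newline(struct col_tracker *t)
{
    assert(t != NULL);
    t->next_column = 1;
}
```

Feeding the tokens 5, +, 2, +, + through col_advance leaves column at 5 for the last token, which is the position an error message for 5+2++1 would point to.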

Print Current Input Line

To print the current input line, you must keep a copy of it yourself. One way is to read lines from yyin in your own code and hand them to the lexer by redefining the macro YY_INPUT accordingly. The answer at https://stackoverflow.com/a/43303098 explains how this works.

The author also shows an example of how the current column can be determined using the macro YY_USER_ACTION.
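The line-at-a-time reading described above can be sketched as a plain C function, independent of flex. This is an illustration under assumptions: struct line_reader and line_reader_fill are hypothetical names, and the function only mimics what a YY_INPUT replacement has to do (copy at most max_size bytes per call while keeping the whole line available). It relies on POSIX getline.

```c
#define _POSIX_C_SOURCE 200809L  /* for getline() and ssize_t */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical reader state; a lexer would typically keep these as globals. */
struct line_reader {
    char  *line;       /* most recent full line, kept for error messages */
    size_t cap;        /* buffer capacity managed by getline() */
    size_t consumed;   /* bytes of the line already handed out */
    size_t available;  /* bytes of the line not yet handed out */
};

/* Copy up to max_size bytes of the current line into buf. A new line is
 * fetched only when the previous one is used up. Returns the number of
 * bytes copied, or 0 at end of input or on a read error. */
static size_t line_reader_fill(struct line_reader *r, FILE *in,
                               char *buf, size_t max_size)
{
    assert(r != NULL && in != NULL && buf != NULL);
    if (r->available == 0) {
        ssize_t len = getline(&r->line, &r->cap, in);
        if (len < 0)
            return 0;  /* getline reports EOF and errors as -1 */
        r->consumed = 0;
        r->available = (size_t)len;
    }
    size_t n = r->available < max_size ? r->available : max_size;
    memcpy(buf, r->line + r->consumed, n);
    r->consumed += n;
    r->available -= n;
    return n;
}
```

A reader is started with struct line_reader r = {0}; and its line member is released with free once the input is exhausted. Keeping the result of getline in a signed variable before storing it matters, because the -1 return value would otherwise turn into a very large unsigned count.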

Simple Example

A simple, self-contained example of a calculator that can handle addition and subtraction is listed below as calc.l and calc.y.

For the input 5+3+2+1 it prints:

5+3+2+1
=11

An erroneous input such as 5+2++1 produces:

error: syntax error, unexpected '+', expecting NUM or '(' in line 3, column 5
5+2++1
____^
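The three-line report shown here can also be produced by a small helper that does not depend on the parser. The sketch below assumes a hypothetical function print_error_context that receives the message, line number, 1-based column and line text as arguments; unlike a bare loop over the column, it also adds a newline when the stored line lacks one, which can happen on the last line of a file.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write the message, the offending line, and a caret
 * under the given 1-based column to the stream out. */
static void print_error_context(FILE *out, const char *msg, int line_no,
                                int column, const char *line_text)
{
    assert(out != NULL && msg != NULL && line_text != NULL && column >= 1);
    fprintf(out, "error: %s in line %d, column %d\n", msg, line_no, column);

    /* Echo the line, making sure it ends with exactly one newline. */
    fputs(line_text, out);
    size_t len = strlen(line_text);
    if (len == 0 || line_text[len - 1] != '\n')
        fputc('\n', out);

    /* column - 1 underscores, then the caret below the offending column. */
    for (int i = 1; i < column; i++)
        fputc('_', out);
    fputs("^\n", out);
}
```

Calling print_error_context(stderr, "syntax error", 3, 5, "5+2++1\n") writes a report in the same layout as the output above.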

calc.l

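%{
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/types.h>        /* ssize_t */
    #include "y.tab.h"
    extern int yylval;
    static int next_column = 1;   /* column where the next token will start */
    int column = 1;               /* column where the current token starts */

    #define HANDLE_COLUMN column = next_column; next_column += strlen(yytext)

    char *lineptr = NULL;         /* current input line, also used by yyerror */
    size_t n = 0;                 /* buffer capacity managed by getline */
    size_t consumed = 0;          /* bytes of the line already passed to flex */
    ssize_t available = 0;        /* bytes still to pass on; signed because getline returns -1 at EOF */

    size_t min(size_t a, size_t b);
    /* Feed flex one line at a time so the complete line stays in lineptr. */
    #define YY_INPUT(buf,result,max_size) {\
        if(available <= 0) {\
            consumed = 0;\
            available = getline(&lineptr, &n, yyin);\
            if (available < 0) {\
                if (ferror(yyin)) { perror("read error"); }\
                available = 0;\
            }\
        }\
        result = min(available, max_size);\
        strncpy(buf, lineptr + consumed, result);\
        consumed += result;\
        available -= result;\
    }
%}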

%option noyywrap noinput nounput yylineno

%%

[\t ]+   { HANDLE_COLUMN; }
[0-9]+   { HANDLE_COLUMN; yylval = atoi(yytext);  return NUM; }
\n       { HANDLE_COLUMN; next_column = 1; return '\n'; }
.        { HANDLE_COLUMN; return yytext[0]; }

%%

size_t min(size_t a, size_t b) {
    return b < a ? b : a;
}

calc.y

%{
    #include <stdio.h>
    #include <stdlib.h>   /* free */
    int yylex(void);
    void yyerror(const char *s);
    extern int yylineno;
    extern int column;
    extern char *lineptr;
    #define YYERROR_VERBOSE 1
%}

%token NUM
%left '-' '+'
%left '(' ')'

%%
LINE:                   { $$ = 0; }
       | LINE EXPR '\n' { printf("%s=%d\n", lineptr, $2); }
       | LINE '\n'
       ;


EXPR:    NUM            { $$ = $1; }
     |   EXPR '-' EXPR  { $$ = $1 - $3; }
     |   EXPR '+' EXPR  { $$ = $1 + $3; }
     |   '(' EXPR ')'   { $$ = $2; }
     ;


%%

void yyerror(const char *str)
{
    fprintf(stderr,"error: %s in line %d, column %d\n", str, yylineno, column);
    fprintf(stderr,"%s", lineptr);
    for(int i = 0; i < column - 1; i++)
        fprintf(stderr,"_");
    fprintf(stderr,"^\n");
}

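int main(void)
{
   int status = yyparse();   /* 0 on success, nonzero on failure */
   free(lineptr);            /* release the buffer allocated by getline */
   return status;
}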

Build Command

Depending on your system, a build command would look similar to the following (yacc must be Bison here for the verbose messages; bison -y -d calc.y is an equivalent invocation):

flex calc.l
yacc -d calc.y
cc -Wextra -Wall lex.yy.c y.tab.c
Stephan Schlecht
  • `YYERROR_VERBOSE` is a bison extension, so only works if your "yacc" is really bison. If you are using bison, there's builtin support for locations in the parser as well with `@` – Chris Dodd Jun 01 '20 at 07:57
  • @ChrisDodd Yep, one could indeed use %locations. On the lexer side then the %option bison-locations. With YY_USER_ACTION you could then fill yylloc. The result is a yyerror variant where the first argument is of type YYLTYPE. By default, this would give us the desired start and end of row and column of the token (see https://www.gnu.org/software/bison/manual/html_node/Location-Type.html). – Stephan Schlecht Jun 01 '20 at 11:40
  • But the question also asks for the output of the problematic line. For this, however, you would still have to manage the input lines in the lexer yourself. Of course it is possible that there is a bison/flex method for this, which I am just not aware of. Is there such a thing? – Stephan Schlecht Jun 01 '20 at 11:40