I'm writing a compiler with bison and flex, and I have a problem. I added %glr-parser, but the problem still exists.

Here is a simple example that reproduces my problem.

.y file:

%{ 
    #include <stdio.h> 
    #include <stdlib.h> 
    extern FILE *yyin; 
    extern int yylex(); 
    int line=1; 
    int error=0; 
    #define YYERROR_VERBOSE 

    void yyerror(const char *msg) 
    { 
        error = 1; 
        printf("ERROR in line %d : %s.\n", line, msg); 
    } 
%}

%start programme
%token SP
%token CRLF
%token LETTER
%%

programme : id CRLF;

id : LETTER;

%%

int main(int argc, char *argv[])
{
    if(argc == 2) yyin = fopen(argv[1], "r");
    else if(argc < 2){
        printf("No file found.\n");
        return 0;
    } else printf("Only one file is permitted.\n");

    yyparse();
    if(error == 0) printf("Finished at %d line.\nNo errors!\n",line); 
    return 0; 
} 

.l file

%{
    #include <stdio.h>
    #include <string.h>
    #include <stdlib.h>
    #include "myParser.h"
    extern int line;
%}

%%

"\n" {line++; return CRLF;}

" " {return SP;}

[a-zA-Z] {return LETTER;}

%%

.h file

enum yytokentype {
    SP = 259,
    LETTER = 260,
    CRLF = 261
};

My program reads a .txt file. file_correct.txt contains:

A

On my terminal, I write:

bison -d bison.y
flex myParser.l
gcc bison.tab.c lex.yy.c -lfl -o a
./a file_correct.txt

-> ERROR in line 1 : syntax error, unexpected $undefined, expecting LETTER.

The input A\n should be accepted, but instead I get this message. Can you help me?

  • You're not showing `rest_dec`, which seems relevant. (Did bison report parsing conflicts? Was that why you added `%glr-parser`? If there are no conflicts, a GLR parser won't change anything.) – rici May 27 '17 at 23:35
  • You also don't show your flex description, although you incorrectly tagged the question [tag:flex] (the correct tag for the flex scanner generator is [tag:flex-lexer]; I fixed it). Certainly, it is possible that the error is in your scanner. Did you verify that the correct tokens are scanned? – rici May 27 '17 at 23:37
  • Sorry, I edited my post :) There are 2 shift/reduce conflicts, but I also added %expect 2 with the %glr-parser and nothing happened. I believe that it reads the correct tokens... I drew a parse tree by hand for an example (such as int A;) and it seems correct. I don't know... It's the first time I'm doing this. – anna_jennifer May 28 '17 at 08:10
  • To me it seems that rest_dec should be allowed to be empty, as it would be in the case of int A; – jpmuc May 28 '17 at 08:18
  • I do allow it to be empty: rest_dec : (empty) | COMMA decl rest_dec; but when I try int A,B; it shows the same message – anna_jennifer May 28 '17 at 08:39
  • Please post a [mcve]. – melpomene May 28 '17 at 09:31
  • I did it just now :) – anna_jennifer May 28 '17 at 10:06
  • `bison.y:1.1-9: error: syntax error, unexpected identifier: programme : stmt;` – melpomene May 28 '17 at 10:10
  • That's new.. :/ Do you want me to add the tokens here and the .h file? – anna_jennifer May 28 '17 at 10:54
  • Read the description of a [mcve] again. The idea is not to show a small excerpt of your program. The idea is to provide a small complete program which compiles and exhibits the same problem. That might seem like a lot of work just to solve what might be a simple problem, but it is an essential debugging technique. At its root is the idea that you should start debugging as early as possible. Don't write an entire massive program before you try anything out. Build your program up in small pieces and test each piece as you write it. – rici May 28 '17 at 14:41
  • Ok, sorry! I think now I did the post correctly.. – anna_jennifer May 28 '17 at 16:06

1 Answer

You must use the header file generated by bison. You could #include it into your own header file (although that's of little practical value) but you cannot attempt to write it yourself and hope that it will always be correct.

In this case, myParser.h contains

enum yytokentype {
    SP = 259,
    LETTER = 260,
    CRLF = 261
};

and those are the token type numbers which yylex will return, since in your flex file, you #include "myParser.h".

However, the file bison.tab.h, generated by bison (and included textually in bison.tab.c) has different values:

  enum yytokentype
  {
    SP = 258,
    CRLF = 259,
    LETTER = 260
  };

As it happens, LETTER has the same code in both files, but the other two codes differ. In particular, the scanner will return 261 when it sees a newline, but the parser will be expecting a token with type number 259. When it receives 261 instead, it complains:

ERROR in line 2 : syntax error, unexpected $undefined, expecting CRLF.

(For bison, code 261 doesn't correspond to anything, so it reports it as $undefined.)

That's a different error message from the one you report in your question, which might be the result of different bison versions numbering tokens in different orders, or it might be simply a copy-and-paste issue.
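To make the mismatch concrete, here is a small C sketch that puts the two numberings side by side. It is only an illustration, not bison's actual lookup code: the HAND_ and GEN_ prefixes and the parser_view helper are invented here so that both sets of values can live in one file. The numeric values are the ones quoted above.

```c
/* Token numbers from the hand-written myParser.h: what the scanner returns.
   (HAND_ prefix is hypothetical, added only to avoid name clashes.) */
enum hand_written_token { HAND_SP = 259, HAND_LETTER = 260, HAND_CRLF = 261 };

/* Token numbers from the bison-generated bison.tab.h: what the parser expects.
   (GEN_ prefix is hypothetical as well.) */
enum generated_token { GEN_SP = 258, GEN_CRLF = 259, GEN_LETTER = 260 };

/* Hypothetical helper: a simplified view of how the parser side
   interprets a token code it receives from yylex. */
const char *parser_view(int code)
{
    switch (code) {
    case GEN_SP:     return "SP";
    case GEN_CRLF:   return "CRLF";
    case GEN_LETTER: return "LETTER";
    default:         return "$undefined";  /* no token has this number */
    }
}
```

With these values, parser_view(HAND_LETTER) is "LETTER" (the codes happen to agree), parser_view(HAND_CRLF) is "$undefined", and parser_view(HAND_SP) is "CRLF", so a space would be taken for a newline.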

The bottom line is that you should always put

#include "bison.tab.h"

into your .l file (changing the name as appropriate for your project, but always using the file generated by bison), rather than (or in addition to) your own header file. (If you also insert your own header file, it should not attempt to define token values, of course. The reason to include your own header file would be to declare prototypes for your own external functions which are being used by the scanner actions.)

On the whole, there are very few cases in which whitespace (spaces and newlines) should be passed on to the parser. (The exception would be a language where statements were definitely terminated by newlines, rather than by semi-colons, for example. Even then, you would not want to pass spaces onto the parser.) Dealing with whitespace in the parser creates a lot more work than is necessary; confuses the grammar; and can lead to unnecessary shift-reduce conflicts.

Consider the simplified grammar production:

decl: type SP id SEMICOLON

That will match int a;, as expected. But it will not match any of the following:

int  a;
int a ;
  int a;

All of the above are likely to show up in valid programs, so the pickiness about whitespace will be perceived as a problem by your users. (And making the grammar more flexible would be a real pain.)
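The usual alternative is to drop whitespace in the scanner, which in flex is just a whitespace pattern with an empty action. As a rough illustration of the effect, here is a hand-written C stand-in for such a scanner. The token codes and the function names next_token, tokenize and same_tokens are assumptions made up for this sketch, and it only skips spaces and tabs.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token codes for this sketch only (not bison's numbering). */
enum tok { TOK_END = 0, TOK_WORD = 1, TOK_SEMICOLON = 2, TOK_OTHER = 3 };

/* Return the next token from *p and advance *p.
   Spaces and tabs are skipped here and never reach the caller. */
enum tok next_token(const char **p)
{
    while (**p == ' ' || **p == '\t')
        (*p)++;
    if (**p == '\0')
        return TOK_END;
    if (isalpha((unsigned char)**p)) {
        while (isalpha((unsigned char)**p))
            (*p)++;
        return TOK_WORD;
    }
    if (**p == ';') {
        (*p)++;
        return TOK_SEMICOLON;
    }
    (*p)++;
    return TOK_OTHER;
}

/* Store up to max token codes from s in out; return how many were stored. */
size_t tokenize(const char *s, enum tok *out, size_t max)
{
    size_t n = 0;
    enum tok t;
    while (n < max && (t = next_token(&s)) != TOK_END)
        out[n++] = t;
    return n;
}

/* Return 1 if both inputs produce the same token sequence, else 0. */
int same_tokens(const char *a, const char *b)
{
    enum tok ta[32], tb[32];
    size_t na = tokenize(a, ta, 32);
    size_t nb = tokenize(b, tb, 32);
    if (na != nb)
        return 0;
    for (size_t i = 0; i < na; i++)
        if (ta[i] != tb[i])
            return 0;
    return 1;
}
```

Because the spaces never become tokens, all of the variants listed above produce the same word, word, semicolon sequence as int a;, and the grammar rule can simply be written without SP.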

Furthermore, you might think of putting that in a wider context:

program: %empty
       | program decl CRLF

But now your parser will reject blank lines, further annoying your users. And it will also reject

int a; int b;

which might make some people wonder why the semicolon is even required.

And watch out for the following error, adapted from the edit history:

prog: stmt
stmt: decl more
more: %empty | stmt CRLF more

This will probably never successfully parse a program, because all program text ends with a newline, but the grammar only allows newlines between statements. So the newline at the end of the file will cause a syntax error as the parser desperately tries to find another statement.

(The above snippet was presumably originally written under the misapprehension that eliminating left-recursion is a good idea. It is not, at least if you are using LR parsers such as bison. Bison loves left-recursion and finds right-recursion tedious at best. Many grammars are also more readable when written left recursively.)
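One reason for that preference is stack usage: with a left-recursive list rule an LR parser can reduce after every element, while with a right-recursive rule it has to shift every element before it can reduce anything. The following C sketch is a simplified simulation of that behaviour, counting only grammar symbols on the stack. It is not bison's real automaton, and the function names are invented for the illustration.

```c
#include <stddef.h>

/* Peak number of symbols on the stack for the left-recursive rule
       list: item | list item
   The parser reduces after every item, so the stack stays small. */
size_t peak_depth_left(size_t n_items)
{
    size_t depth = 0, peak = 0;
    for (size_t i = 0; i < n_items; i++) {
        depth++;                 /* shift the next item */
        if (depth > peak)
            peak = depth;
        if (i > 0)
            depth--;             /* reduce "list item" to list */
        /* for the first item, "item" reduces to list: depth unchanged */
    }
    return peak;
}

/* Peak number of symbols on the stack for the right-recursive rule
       list: item | item list
   No reduction is possible until the last item has been shifted. */
size_t peak_depth_right(size_t n_items)
{
    size_t depth = 0, peak = 0;
    for (size_t i = 0; i < n_items; i++) {
        depth++;                 /* shift the next item, nothing to reduce yet */
        if (depth > peak)
            peak = depth;
    }
    while (depth > 1)
        depth--;                 /* at the end, reduce "item list" repeatedly */
    return peak;
}
```

For a list of 1000 items, the left-recursive version peaks at 2 stack entries and the right-recursive version at 1000.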

rici