You must use the header file generated by bison. You could #include it into your own header file (although that's of little practical value), but you cannot attempt to write it yourself and hope that it will always be correct.
In this case, myparser.h contains
enum yytokentype {
    SP = 259,
    LETTER = 260,
    CRLF = 261
};
and those are the token type numbers which yylex will return, since in your flex file, you #include <myParser.h>.
However, the file bison.tab.h, generated by bison (and included textually in bison.tab.c), has different values:
enum yytokentype
{
    SP = 258,
    CRLF = 259,
    LETTER = 260
};
As it happens, LETTER has the same code in both files, but the other two codes differ. In particular, the scanner will return 261 when it sees a newline, but the parser will be expecting a token with type number 259. When it receives the 261, it complains:
ERROR in line 2 : syntax error, unexpected $undefined, expecting CRLF.
(For bison, code 261 doesn't correspond to anything, so it reports it as $undefined.)
That's a different error message from the one you report in your question, which might be the result of different bison versions numbering tokens in different orders, or it might be simply a copy-and-paste issue.
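The mismatch can be reproduced in isolation. The following is only a sketch: the HW_ and GEN_ prefixes and both function names are hypothetical, added so that the two numberings can coexist in one C file. The values themselves are the ones from the two headers above.

```c
#include <string.h>

/* Codes from the hand-written header (hypothetical HW_ prefix). */
enum handwritten_token { HW_SP = 259, HW_LETTER = 260, HW_CRLF = 261 };

/* Codes from the bison-generated header (hypothetical GEN_ prefix). */
enum generated_token { GEN_SP = 258, GEN_CRLF = 259, GEN_LETTER = 260 };

/* How the parser interprets a numeric code, using its own numbering. */
const char *parser_token_name(int code)
{
    switch (code) {
    case GEN_SP:     return "SP";
    case GEN_CRLF:   return "CRLF";
    case GEN_LETTER: return "LETTER";
    default:         return "$undefined";
    }
}

/* Returns 1 if the parser reads scanner_code as the token the scanner
   meant to send, 0 otherwise. */
int parser_agrees(int scanner_code, const char *intended)
{
    return strcmp(parser_token_name(scanner_code), intended) == 0;
}
```

With these definitions, parser_token_name(HW_CRLF) yields "$undefined" and parser_token_name(HW_SP) yields "CRLF"; only LETTER means the same thing on both sides.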
The bottom line is that you should always put
#include "bison.tab.h"
into your .l file (changing the name as appropriate for your project, but always using the file generated by bison), rather than (or in addition to) your own header file. (If you also insert your own header file, it should not attempt to define token values, of course. The reason to include your own header file would be to declare prototypes for your own external functions which are being used by the scanner actions.)
On the whole, there are very few cases in which whitespace (spaces and newlines) should be passed on to the parser. (The exception would be a language where statements were definitely terminated by newlines, rather than by semicolons, for example. Even then, you would not want to pass spaces on to the parser.) Dealing with whitespace in the parser creates a lot more work than is necessary, confuses the grammar, and can lead to unnecessary shift-reduce conflicts.
Consider the simplified grammar production:
decl: type SP id SEMICOLON
That will match the input int a; as expected. But it will not match any of the following:
int  a;
int a ;
int a;
All of the above are likely to show up in valid programs, so the pickiness about whitespace will be perceived as a problem by your users. (And making the grammar more flexible would be a real pain.)
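By contrast, if the scanner discards whitespace, all of those variants produce the same token sequence. In flex that is usually done with a whitespace rule whose action is empty; the hand-written C sketch below shows the same idea without flex. The token codes and function names are hypothetical and exist only for this illustration (in a real project the codes come from the bison-generated header), and "int" is treated as the only type name.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token codes for this sketch only. */
enum sketch_token { TOK_END = 0, TOK_TYPE, TOK_ID, TOK_SEMICOLON, TOK_ERROR };

/* Skip any whitespace, then return the next token and advance *pos. */
int next_token(const char *src, size_t *pos)
{
    while (isspace((unsigned char)src[*pos]))
        ++*pos;                         /* whitespace never reaches the parser */
    if (src[*pos] == '\0')
        return TOK_END;
    if (src[*pos] == ';') {
        ++*pos;
        return TOK_SEMICOLON;
    }
    if (isalpha((unsigned char)src[*pos])) {
        size_t start = *pos;
        while (isalnum((unsigned char)src[*pos]))
            ++*pos;
        if (*pos - start == 3 && strncmp(src + start, "int", 3) == 0)
            return TOK_TYPE;
        return TOK_ID;
    }
    ++*pos;                             /* consume the unrecognised character */
    return TOK_ERROR;
}

/* Returns 1 if src consists of exactly: type id SEMICOLON */
int is_decl(const char *src)
{
    size_t pos = 0;
    return next_token(src, &pos) == TOK_TYPE
        && next_token(src, &pos) == TOK_ID
        && next_token(src, &pos) == TOK_SEMICOLON
        && next_token(src, &pos) == TOK_END;
}
```

Here is_decl accepts a declaration regardless of how many spaces, tabs, or newlines surround its tokens, and the grammar rule shrinks to type id SEMICOLON with no SP in it.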
Furthermore, you might think of putting that in a wider context:
program: %empty
| program decl CRLF
But now your parser will reject blank lines, further annoying your users. And it will also reject
int a; int b;
which might make some people wonder why the semicolon is even required.
And watch out for the following error, adapted from the edit history:
prog: stmt
stmt: decl more
more: %empty | stmt CRLF more
This will probably never successfully parse a program, because every program text ends with a newline, but the grammar only allows newlines between statements. So the newline at the end of the file will cause a syntax error as the parser desperately tries to find another statement.
(The above snippet was presumably originally written under the misapprehension that eliminating left-recursion is a good idea. It is not, at least if you are using an LR parser generator such as bison. Bison loves left-recursion and finds right-recursion tedious at best. Many grammars are also more readable when written left-recursively.)