1

In the extremely simple example below, I want to recognize the language consisting of a single a and ensure that no characters come after it.

File: example.y

%{
#include <stdio.h>
#include <ctype.h>
int yylex(void);
int yyerror(char *s);
%}

%token A
%token END
%token JUNK

%% /* Grammar Rules */
accept: A END { printf("language accepted!\n"); }
;
%%

File: example.in

%{
#include "ex.tab.h"
#define YY_NO_INPUT
%}
%option nounput
%%
a printf("A found\n"); return A;
<<EOF>> { printf("EOF found\n"); return END; }
. { printf("JUNK found\n"); return JUNK; }
%%

Compiling and running this program with the following test input file:

a

produces the following output:

A found

EOF found
language accepted!
EOF found
Error: syntax error

I think the program rejects my input because EOF is read twice. My question is: why is EOF being read twice, and how do I stop it?

Also, doing the above without the EOF rule causes inputs such as

abbbb

to print the "accept" message but then immediately fail because of the excess input. All I want is a single pass or fail, which is why I'm trying to use EOF to make sure I get exactly one result.

Daniel

3 Answers

3

I was able to use the following solution to scan in an EOF with flex and pass it to Bison without getting caught up matching EOF a second time.

Make bison reduce to start symbol only if EOF is found

The solution uses a start condition to detect that EOF is next without actually finishing the scan. Once the "initial" EOF is triggered, END can be sent to Bison; the scanner then really reads EOF, and the flex/bison parse completes naturally. At least that's my understanding of it.

Flex

/* declare an exclusive start condition */
%x REALLYEND
%option noinput nounput
%%
"END"                   { return END; }
.                       { return TOK; }
<INITIAL><<EOF>>        { BEGIN(REALLYEND); return EOP; /* first EOF: trigger the start condition */ }
<REALLYEND><<EOF>>      { return 0; /* second EOF: trigger the real EOF */ }
%%

Bison

%%
prog : END EOP { printf ("ok\n"); }; /* EOP can be used just like END in my example */
%%
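The two-step EOF handling above can also be pictured without flex. The following is only a hand-written C sketch of the same idea, not generated scanner code: the token codes, `struct lexer`, and `next_token` are hypothetical names chosen for illustration, and the input is assumed to be a NUL-terminated string.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token codes; bison would normally generate these. */
enum { TOK_A = 258, TOK_JUNK = 259, TOK_EOP = 260 };

/* Mirrors the two flex start conditions used above. */
enum lex_state { LEX_INITIAL, LEX_REALLYEND };

struct lexer {
    const char *cursor;    /* next unread character */
    enum lex_state state;  /* current "start condition" */
};

/* Returns the next token, or 0 once the explicit end marker was delivered. */
int next_token(struct lexer *lx)
{
    assert(lx != NULL && lx->cursor != NULL);
    if (*lx->cursor == '\0') {
        if (lx->state == LEX_INITIAL) {
            lx->state = LEX_REALLYEND;  /* plays the role of BEGIN(REALLYEND) */
            return TOK_EOP;             /* explicit end-of-program token */
        }
        return 0;                       /* the real end of input */
    }
    return (*lx->cursor++ == 'a') ? TOK_A : TOK_JUNK;
}
```

The end-of-input condition is reported twice, but under two different token codes, so the parser sees the explicit end token exactly once before the real end of input.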
rici
Daniel
1

bison (and all yacc derivatives I know of except for lemon) will not reduce the start production unless it is followed by an EOF token. In effect, it modifies the grammar to something like this:

$accept: accept $end;
accept: A END {...}

Your END token is not the same as the built-in $end token. So bison will happily reduce the accept rule and therefore trigger your printf. (The message in your code seems to differ from the one in your output, which suggests that they come from different versions of your code.) But it will not reduce its own $accept rule, and consequently it will report a syntax error.

It's certainly the case that flex is prepared to match <<EOF>> more than once. I believe it will continue to do so as long as you ask for more tokens, but I could be wrong; certainly, it will match twice. But that's not your problem. Your problem is that you're trying to force bison to do what it would do anyway, except that you've made it impossible for it to do that.

In short, let flex return 0 for EOF, which is what it wants to do, and trust bison to only accept input which is terminated by an EOF. That will make your code much simpler.

(The tricky part is actually recognizing a sentence which does not go to the end of the input; for example, if you are embedding one language inside another -- javascript or CSS inside of HTML, for example. In that case, you do have to play some games, and I believe that is why lemon does not insert the usual augmented start rule.)
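To make the augmented rule concrete, here is a small hand-written C sketch. It is not bison output. It assumes the simplified grammar `accept: A;` with no END token, and `next_token`, `parse_string`, and the token codes are hypothetical names. It imitates two things described above: the lexer returns 0 at end of input, and the action for `accept` runs before the parser checks that the end of input follows.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token codes; 0 plays the role of bison's $end. */
enum { TOK_END_OF_INPUT = 0, TOK_A = 258, TOK_JUNK = 259 };

/* Stand-in for yylex: returns 0 at end of input, however often it is asked. */
int next_token(const char **cursor)
{
    if (**cursor == '\0')
        return TOK_END_OF_INPUT;
    return (*(*cursor)++ == 'a') ? TOK_A : TOK_JUNK;
}

/* Recognizes   $accept: accept $end;   accept: A;
 * Returns 0 on success and 1 on a syntax error, following the convention
 * of yyparse. *accept_reductions counts how often the `accept` action ran. */
int parse_string(const char *input, int *accept_reductions)
{
    assert(input != NULL && accept_reductions != NULL);
    const char *cursor = input;
    *accept_reductions = 0;

    if (next_token(&cursor) != TOK_A)
        return 1;                  /* syntax error: expected A */
    ++*accept_reductions;          /* the user action for `accept: A` runs here */

    /* Only now is the next token compared against $end. */
    return next_token(&cursor) == TOK_END_OF_INPUT ? 0 : 1;
}
```

In this sketch the input `abbbb` runs the `accept` action once and is still rejected, which is why a message printed inside that action cannot serve as the final verdict; the return value can.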

rici
  • Thanks for the great response! Letting bison handle the EOF naturally would be preferred as I feel like having an END token is kind of hacky but can you address how to mitigate the trailing character problem? Without some kind of end token, how else can I prevent printing both the accept message and an error message? Or is there a way to wait until Bison reads EOF to print some kind of accept message. – Daniel Sep 12 '13 at 05:26
  • Oh and as for the printf discrepancy, they were the same version however I changed just the text of one of them when I posted because I thought it made more sense semantically than the rather ambiguous previous message. I've updated them to match. – Daniel Sep 12 '13 at 05:28
  • I did some more searching and found a working solution. If you happen to check back I would appreciate a comment on whether the solution meets flex/bison "best practice". – Daniel Sep 12 '13 at 05:48
  • @daniel: there's a very easy way to wait until bison reads the eof and reduces the `$accept` token: wait until it returns. In your final accept production, you might want to stash the result somewhere that the bison caller can get at it, but that's about it. – rici Sep 12 '13 at 05:49
  • @Daniel: also, I'm the last person to criticize the use of start conditions to play games with flex, but I really don't think this is the place for it. It is generally the case with LALR parsing that a reduction might occur even though the lookahead token cannot be shifted; this is just one instance of that. – rici Sep 12 '13 at 05:55
  • I got so caught up in flex/bison I forgot about the .c file I had that was just making the call to yyparse! From what I could find, I can store an integer value in yylval to indicate my desired status and then access it from the .c file after the call to parse. Link: http://userpages.monmouth.com/~wstreett/lex-yacc/bison.html#SEC63 – Daniel Sep 12 '13 at 06:07
  • @Daniel: if all you care about is whether the sentence was recognized, yyparse returns that information without any work on your part. (0 is returned if the parse was successful, non-zero otherwise.) You can certainly use a global to return other information, but personally I prefer to use an extra argument to yyparse. See `%parse-param` in http://www.gnu.org/software/bison/manual/html_node/Parser-Function.html#Parser-Function – rici Sep 12 '13 at 06:13
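The caller-side pattern from the comments above could look roughly like the following C sketch. Everything here is a hypothetical stand-in: `parse_with_result` takes the place of a generated `yyparse` that receives an extra argument (in the spirit of `%parse-param`), `struct parse_result` is an invented record, and the "language rejected" message does not appear in the original code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record that the final production would fill in. */
struct parse_result {
    size_t consumed;  /* characters matched before the parser stopped */
};

/* Stand-in for a generated parser taking an extra argument.
 * Returns 0 if the whole input is the single token `a`, 1 otherwise. */
int parse_with_result(const char *input, struct parse_result *out)
{
    assert(input != NULL && out != NULL);
    out->consumed = 0;
    if (input[0] != 'a')
        return 1;                     /* syntax error: expected A */
    out->consumed = 1;                /* stashed by the `accept` action */
    return input[1] == '\0' ? 0 : 1;  /* must be followed by end of input */
}

/* The caller decides once, after the parser has returned. */
const char *verdict(const char *input)
{
    struct parse_result result;
    return parse_with_result(input, &result) == 0
               ? "language accepted!"
               : "language rejected";
}
```

Because the message is chosen only after the parser returns, each input yields exactly one verdict, for example `puts(verdict("abbbb"))` in a driver program.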
0

The answer is threefold:

1) In most cases you should let your bison parser handle the EOF. It will likely be the easiest way to go, but it is not always possible.

2) Handling the built-in <<EOF>> rule in flex has some special requirements (obligatory link to the manual).

3) A note on point 1: if you find that your grammar appears to require an end-of-file token of some kind, that is OK (some variants of C require this), but it is considered bad form (some editors have settings that add newlines at the end of files, which can cause conflicts).

rici
Nathan