Why is 'YY_DO_BEFORE_ACTION' undefined by Bison's %code requires prologue?

Question

I have a compiler project using Bison (3.8.2) and Flex (2.6.4) built using Cygwin on Windows 10. Without using the %code requires prologue, my project will build and run as expected. Once I add the %code requires prologue, I get compiler errors when calling g++ because YY_DO_BEFORE_ACTION has been undefined. The specific errors are:

lex.yy.c: In function 'int yylex()':
lex.yy.c:715:3: error: 'YY_DO_BEFORE_ACTION' was not declared in this scope; did you mean 'YY_USER_ACTION'?
lex.yy.c:858:42: error: 'YY_NEW_FILE' was not declared in this scope

I do not understand why adding %code requires causes YY_DO_BEFORE_ACTION to become undefined.

Here is a minimal reproducible example in the working state:

prologue.y

%code top{
#include <iostream>
#include <fstream>

void yyerror (char const *s);
}

%union {
    char str[100];
    int integer_value;
    double double_value;
}

%code {
#include <string>
#include "prologue.h"
#include "prologue.tab.h"
}

/* declare tokens */
%token WORD
%token EOL

%define parse.error detailed

%%

program: WORD eol
 ;

eol: EOL
| eol EOL
;

%%
int main(int argc, char **argv)
{
  yydebug = 1;
  std::cout << "Call yyparse." << std::endl;
  yyparse();
  std::cout << "Done yyparse." << std::endl;

  return 0;
}

void yyerror(char const *s)
{
  fprintf(stderr, "error: %s\n", s);
}

prologue.l

%{
#include <stdio.h>
#include "prologue.tab.h"
%}

%%
"word"   { return WORD; }
\r\n   {  return EOL; }
\n     {  return EOL; }
[ \t]  { /* ignore whitespace */ }
.      { printf("Mystery character %c\n", *yytext); }
%%

Makefile

all: prologue

prologue: prologue.l prologue.y
	bison -d -v --debug prologue.y -Wcounterexamples
	flex --header-file=prologue.h prologue.l
	g++ -o $@ prologue.tab.c lex.yy.c -lfl

clean:
	rm -f prologue.exe lex.yy.c prologue.tab.c prologue.tab.h prologue.h

In order to bring this minimal reproducible example to a non-working state, just change the plain %code block (the one following %union) into a %code requires block in prologue.y:

%code requires {
#include <string>
#include "prologue.h"
#include "prologue.tab.h"
}

@rici Thanks, I have made that change now but I am still getting the same error about YY_DO_BEFORE_ACTION. — afarley, Jun 15 '22 at 15:04
Sorry, I didn't notice that the other file you're including, `prologue.h`, is generated by the lexer. In general, that's not going to work either, and certainly not in a `%code requires` block. Why do you think you need it? — rici, Jun 15 '22 at 15:16
@rici The specific contents of the '%code requires' block have been cut down from my original project, where it included a C++ class for use by my parser and lexer. The contents of '%code requires' are basically just a placeholder in this example. — afarley, Jun 15 '22 at 15:21
@rici Ok it's building successfully now after moving out prologue.h from the %code requires block. I will have to go back to the textbook to see why I thought adding those headers was a good idea - maybe I just did it blindly. Thanks! — afarley, Jun 15 '22 at 15:23

score 0 · Answer 1 · answered Jun 16 '22 at 02:14

The basic problem here is that you #include the bison-generated header file (prologue.tab.h) and the flex-generated header file (prologue.h) in your grammar file, and more specifically in a %code requires section.

It's neither necessary nor advisable to include the bison-generated header file in the bison-generated C file. Bison doesn't require you to generate a header file, and consequently it needs to put everything which it might need in the generated source file. If you do also include the generated header file, you could end up with duplicated #defines and other related errors, because not all of Bison's generated #defines are guarded with #ifndefs. That's less of a concern than it was with earlier versions of Bison, but there are still some circumstances in which it causes errors.

It's particularly useless to put `#include "prologue.tab.h"` in a `%code requires` or `%code provides` block, because both of those blocks are copied into the generated header file. That would make the header file `#include` itself. Since Bison 3, the header file has an include guard, so that doesn't cause infinite include recursion. But still.

But that's not creating the errors you report. Those come from the fact that you're effectively including the flex-generated header file in the flex source, and that's much more problematic. Again, that comes from putting the #include statement in a %code requires block, from which it is copied into the bison-generated header file, which in turn is #included in the flex source.

Like Bison, Flex does not require you to generate a header file at all, so it has to put all necessary declarations and macro definitions into the generated source file. In fact, the generated Flex header file is simply an excerpt of the generated code file; the Flex skeleton is decorated with %ok-for-header and %not-for-header markings, and the Flex code generator uses those annotations to decide whether to write generated code only into the .c file or into both the .c and the .h file.
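A toy model may make the excerpting idea easier to picture. This is not Flex's code generator, and the skeleton lines used with it are invented; it only assumes that a `%not-for-header` marker stops copying into the header and `%ok-for-header` resumes it, while every line still goes into the .c file. The function `split_skeleton` is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Toy model of header excerpting (NOT flex's real generator).
   Assumption: "%not-for-header" stops copying lines into the .h output and
   "%ok-for-header" resumes it; every non-marker line goes to the .c output. */
struct split_result {
    int c_lines;   /* lines written to the .c output */
    int h_lines;   /* lines written to the .h output */
};

/* Hypothetical helper: pass NULL for either stream to just count lines. */
struct split_result split_skeleton(const char *const *skel, int n,
                                   FILE *c_out, FILE *h_out)
{
    struct split_result r = {0, 0};
    int for_header = 1;  /* toy assumption: lines start out header-eligible */

    for (int i = 0; i < n; ++i) {
        if (strcmp(skel[i], "%not-for-header") == 0) { for_header = 0; continue; }
        if (strcmp(skel[i], "%ok-for-header") == 0)  { for_header = 1; continue; }

        if (c_out) fprintf(c_out, "%s\n", skel[i]);  /* always in the .c file */
        r.c_lines++;
        if (for_header) {                             /* sometimes also in the .h */
            if (h_out) fprintf(h_out, "%s\n", skel[i]);
            r.h_lines++;
        }
    }
    return r;
}
```

If you feed it a made-up skeleton in which a `#define` sits inside a `%not-for-header` region but the matching `#undef` lines do not, the header ends up undefining a macro it never defined itself.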

So, again, unnecessary and inadvisable, and likely to lead at least to compiler warnings. But there is an additional problem, which is the one you're running into: Flex tries to clean up all of its macro definitions at the end of the generated code.

It does that because a lot of programmers are uncomfortable with multifile C projects, so they #include the generated Flex source directly into their application rather than trying to figure out how to integrate it into their build procedure. So at the end of the generated Flex source, there is a stream of #undef directives for every internal macro defined. That includes YY_DO_BEFORE_ACTION and YY_NEW_FILE, among many others. Some of these macros are also used in the header file. So the #undef directives are part of a skeleton section which is copied into both .c and .h files, at the end of each file.

So when you #include "prologue.h" in prologue.l, Flex's internal macros are undefined much too early: the #undef directives at the end of the header take effect before the body of yylex(), which still uses those macros. That's the error you're seeing.
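Stripped down to plain preprocessor steps, the ordering problem looks roughly like this. The macro body is a simplified stand-in (Flex's real YY_DO_BEFORE_ACTION manipulates yytext and the buffer pointers), and `do_before_action_is_defined` is a hypothetical helper that exists only to observe the effect.

```c
/* 1. Near the top of lex.yy.c, flex defines its internal macros.
      (Simplified stand-in body, not flex's real definition.) */
#define YY_DO_BEFORE_ACTION (void)0

/* 2. Next comes the %{ ... %} code from prologue.l: #include "prologue.tab.h".
      Its %code requires section does #include "prologue.h", and the tail
      of that flex-generated header contains: */
#undef YY_DO_BEFORE_ACTION

/* 3. Later, yylex() expands YY_DO_BEFORE_ACTION in every rule. Using it here
      would fail to compile ("'YY_DO_BEFORE_ACTION' was not declared in this
      scope" under g++), so this helper only checks whether it is defined. */
int do_before_action_is_defined(void)
{
#ifdef YY_DO_BEFORE_ACTION
    return 1;
#else
    return 0;
#endif
}
```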

Of course, you don't explicitly #include "prologue.h" in prologue.l, and you might well not have intended to do so at all. But it's there because the line is in a %code requires block and you do #include "prologue.tab.h" in prologue.l (as you must, in order for the lexical scanner to see the definitions for tokens). And that's why the error suddenly showed up when you changed the %code block into a %code requires block.

But you shouldn't include the generated flex header in the generated bison code, even if you don't put it into a %code requires or %code provides block.

Since the parser is a client of the scanner (in other words, yyparse calls yylex), it does seem logical to include the scanner's header file in the parser's source file. But it won't work in general, because there's a dependency inversion between the parser and the scanner. Even though the parser calls the scanner, the scanner depends on symbols and type declarations defined by the parser. These include not only the enum constants which define token identifiers, but also the data types used to implement the parser's semantic and location values. The scanner cannot be compiled without these things: it needs to know the type of yylval (and yylloc if it is producing location information), and it needs to know the enumeration values to return for each token type.
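To make the direction of that dependency concrete, here is a sketch with a hand-written scanner standing in for the Flex one. The declarations at the top stand in for what Bison writes into prologue.tab.h (the token numbers are illustrative, not necessarily what Bison assigns), and `scan_token` is a hypothetical function, not part of the Flex API. The point is that the scanner cannot even be written without the token enum and the YYSTYPE union.

```c
#include <string.h>

/* ---- Stand-in for the parts of prologue.tab.h the scanner needs ----
   Token numbers are illustrative; Bison assigns the real ones. */
enum yytokentype { YYEOF = 0, WORD = 258, EOL = 259 };

typedef union YYSTYPE {   /* mirrors the %union in prologue.y */
    char str[100];
    int integer_value;
    double double_value;
} YYSTYPE;

/* The generated header only declares "extern YYSTYPE yylval;" and the
   definition lives in prologue.tab.c; it is defined here to stay self-contained. */
YYSTYPE yylval;

/* ---- Hypothetical hand-written scanner standing in for lex.yy.c ----
   It needs the enum (to know what to return) and YYSTYPE (to fill yylval). */
int scan_token(const char **cursor)
{
    const char *p = *cursor;
    for (;;) {
        if (*p == '\0') { *cursor = p; return YYEOF; }
        if (p[0] == '\r' && p[1] == '\n') { *cursor = p + 2; return EOL; }
        if (*p == '\n') { *cursor = p + 1; return EOL; }
        if (strncmp(p, "word", 4) == 0) {
            memcpy(yylval.str, "word", sizeof "word");  /* semantic value */
            *cursor = p + 4;
            return WORD;
        }
        ++p;  /* ignore whitespace and any other character */
    }
}
```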

So if the parser also needed to know about the scanner, that would create a circular dependency. The usual way to solve circular dependencies is to abstract the circularly-required definitions into a common, self-contained header file, which itself contains enough forward declarations to resolve any circular internal dependencies. But since Flex and Bison are independent software products which do not depend on each other (you could use Flex with a different parser generator, or Bison with a different scanner generator), thinking that they will cooperatively create a shared header file is probably an unreasonable expectation. In any case, they don't, so any coordination has to be done by the programmer.
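Here is a minimal sketch of such a programmer-written common header, assuming a reentrant design: the scanner's state is hidden behind an opaque `void *` handle (the same trick commonly used with reentrant Flex scanners, whose `yyscan_t` is such a pointer), so the header needs nothing from either generated file. All of the function names (`scanner_init`, `scanner_next`, `scanner_destroy`, `parse_program`) are hypothetical stand-ins, not the Flex or Bison APIs. The three sections appear in one file only so that it compiles on its own.

```c
#include <stdlib.h>
#include <string.h>

/* ===== shared.h: hypothetical, hand-written, self-contained header =====
   Neither generated file is needed to compile it. */
typedef void *yyscan_t;                                    /* opaque scanner handle */
enum token { TOK_EOF = 0, TOK_WORD = 258, TOK_EOL = 259 }; /* illustrative numbers */
typedef union { char str[100]; int integer_value; double double_value; } value_t;

int  scanner_init(yyscan_t *scanner, const char *input);   /* 0 on success */
int  scanner_next(value_t *lval, yyscan_t scanner);        /* returns a token */
void scanner_destroy(yyscan_t scanner);
int  parse_program(yyscan_t scanner);                      /* 0 = accepted */

/* ===== scanner side: the only code that knows the state layout ===== */
struct scanner_state { const char *cursor; };

int scanner_init(yyscan_t *scanner, const char *input)
{
    struct scanner_state *s = malloc(sizeof *s);
    if (s == NULL) return 1;
    s->cursor = input;
    *scanner = s;
    return 0;
}

void scanner_destroy(yyscan_t scanner) { free(scanner); }

int scanner_next(value_t *lval, yyscan_t scanner)
{
    struct scanner_state *s = scanner;
    for (;;) {
        const char *p = s->cursor;
        if (*p == '\0') return TOK_EOF;
        if (*p == '\n') { s->cursor = p + 1; return TOK_EOL; }
        if (strncmp(p, "word", 4) == 0) {
            memcpy(lval->str, "word", sizeof "word");
            s->cursor = p + 4;
            return TOK_WORD;
        }
        s->cursor = p + 1;  /* skip whitespace and unrecognized characters */
    }
}

/* ===== parser side: program: WORD eol ;  eol: EOL | eol EOL ; ===== */
int parse_program(yyscan_t scanner)
{
    value_t lval;
    int tok;
    if (scanner_next(&lval, scanner) != TOK_WORD) return 1;
    if (scanner_next(&lval, scanner) != TOK_EOL) return 1;
    while ((tok = scanner_next(&lval, scanner)) == TOK_EOL)
        ;                                  /* any number of extra EOLs */
    return tok == TOK_EOF ? 0 : 1;
}
```

Both sides include only the shared header, and neither needs the other's internals; the circularity is broken by the forward-declared handle.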

In the simplest (and traditional) scenario, in which all shared state is in global variables, the only thing the parser needs to know about is the prototype for yylex, and that prototype is dead simple: yylex takes no arguments and produces an int. In the version of C prevalent at the time that lex and yacc were originally designed, that meant that yylex didn't have to be declared at all, since undeclared functions were assumed to take whatever arguments they were given, if any, and return an int. Of course, that hasn't been the case for some 30 years, and these days it is necessary to add a declaration for yylex (int yylex();) to your .y files. In many cases, that's the only useful thing you would find in the scanner's header file, making the header file unnecessary.
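In the traditional, global-variable setup, the parser's entire knowledge of the scanner is one prototype. In the question's working prologue.y that declaration arrives indirectly through prologue.h (the Flex header declares yylex); if that include is dropped, a declaration like the one below has to take its place. The parser and scanner here are hypothetical hand-written stand-ins so the sketch runs without Bison or Flex.

```c
/* Everything the parser needs to know about the scanner: one prototype.
   In C++ "int yylex();" means the same thing; in C, "(void)" makes it a
   real prototype rather than an old-style declaration. */
int yylex(void);

enum { END = 0, WORD = 258, EOL = 259 };   /* illustrative token numbers */

/* Stand-in parser for: program: WORD eol ;  eol: EOL | eol EOL ; */
int parse(void)
{
    int tok;
    if (yylex() != WORD) return 1;
    if (yylex() != EOL) return 1;
    while ((tok = yylex()) == EOL)
        ;                                  /* any number of extra EOLs */
    return tok == END ? 0 : 1;
}

/* Stand-in scanner, defined after the parser as if it lived in lex.yy.c.
   It replays a fixed token sequence through global state, the traditional way. */
static const int *next_token;
int yylex(void) { return *next_token ? *next_token++ : END; }
```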

A lot of other things have changed in those 30 years, one of which is that it is now much more common to expect libraries to use some mechanism other than global variables to maintain internal state. Both Bison and Flex can produce reentrant modules which are (mostly) free of globals. That's all to the good, but it brings into focus the problem of the circular dependency between the parser and the scanner. If you're planning to pursue that route, you might want to take a look at this annotated example, which goes into more detail. Alternatively, you could try using the C++ interfaces, or (my personal preference) avoid the circular dependency by inverting the call relationship, using a push parser.
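Here is a hand-written sketch of the push-parser idea for the question's grammar. It is not Bison's interface: Bison generates a push parser when you use `%define api.push-pull push`, with yypush_parse as the entry point, but the names and status codes below (`push_token`, `scan_and_push`, `PUSH_MORE`, and so on) are hypothetical. What it shows is the inverted call relationship: the scanner owns the loop and hands each token to the parser, so the parser no longer calls, or needs to know about, the scanner.

```c
#include <string.h>

enum { T_EOF = 0, T_WORD = 258, T_EOL = 259 };              /* illustrative */
enum { PUSH_ACCEPT = 0, PUSH_ERROR = 1, PUSH_MORE = 2 };    /* hypothetical */

/* Parser state for: program: WORD eol ;  eol: EOL | eol EOL ;
   0 = expecting WORD, 1 = expecting first EOL, 2 = after at least one EOL. */
struct push_state { int where; };

/* The parser only receives tokens; it never calls the scanner. */
int push_token(struct push_state *ps, int tok)
{
    switch (ps->where) {
    case 0:
        if (tok == T_WORD) { ps->where = 1; return PUSH_MORE; }
        return PUSH_ERROR;
    case 1:
        if (tok == T_EOL) { ps->where = 2; return PUSH_MORE; }
        return PUSH_ERROR;
    default:
        if (tok == T_EOL) return PUSH_MORE;
        return tok == T_EOF ? PUSH_ACCEPT : PUSH_ERROR;
    }
}

/* The scanner owns the loop and pushes each token into the parser. */
int scan_and_push(const char *p)
{
    struct push_state ps = { 0 };
    for (;;) {
        int tok, status;
        if (*p == '\0')                       tok = T_EOF;
        else if (*p == '\n')                { tok = T_EOL;  ++p; }
        else if (strncmp(p, "word", 4) == 0) { tok = T_WORD; p += 4; }
        else { ++p; continue; }             /* skip whitespace and other chars */
        status = push_token(&ps, tok);
        if (status != PUSH_MORE) return status;
    }
}
```

With this sketch, `scan_and_push("word\n")` accepts, while `scan_and_push("word")` reports an error because the grammar requires at least one end of line.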
