Problem Description

In my yacc parser grammar, I have the following rules and corresponding actions defined (see program.y below). Parsing int X; should give the derivations type => TOK_INT and variable_list => TOK_VARIABLE, which together match a declaration, followed by the ';' that ends the statement. However, the parser reads this as int X and ;, that is, as two separate statements. Can anyone see why?

program.y

program:
    function { exit(0); }
    ;

function:
    function line { printf("goal\n"); printtree_print($2); }
    | /* empty */
    ;

line:
    statement ';' { printf("line\n"); printtree_print($1); }
    ;

statement:
    declaration { printf("declaration\n"); printtree_print($1); }
    | assignment { printf("assignment\n"); printtree_print($1); }
    ;

declaration: 
       type variable_list { printf("varlist\n"); printtree_print($2); $$ = $2;  }
       ;

type:
    TOK_INT { typeMode = typeInt; }
    ;

variable_list:
    TOK_VARIABLE
    {
        $$ = node_mkVariable($1, typeMode);
        printtree_print($$);
    }
    ;

assignment:
    TOK_VARIABLE TOK_ASSIGN expr
    {
        printf("assignment %s = expr\n", $1);
        node_setInTable($1, $3);
        $$ = node_getFromTable($1);
    }
    ;

expr:
    TOK_INTEGER { $$ = node_mkConstant($1); }
    | TOK_VARIABLE { $$ = node_mkVariable($1, typeVariable); }
    ;
foobuzz

2 Answers

Since 'expr' and 'assignment' are probably not germane to the problem, I omitted them from my test rig. Since you didn't provide minimal compilable code that demonstrates the problem, I created it for you:

%{
#include <stdlib.h>
#include <stdio.h>
static void yyerror(const char *str);
static int yylex(void);
static void printtree_print(int);
static int node_mkVariable(int, int);
int typeMode;
enum { typeInt };
%}
%token TOK_INT
%token TOK_VARIABLE
%%
program:
    function
        { exit(0); }
    ;

function:
        /* Nothing */
    |   function line
        { printf("goal\n"); printtree_print($2); }
    ;

line:
    statement ';'
        { printf("line\n"); printtree_print($1); }
    ;

statement:
    declaration
        { printf("declaration\n"); printtree_print($1); }
    ;

declaration: 
    type variable_list
        { printf("varlist\n"); printtree_print($2); $$ = $2;  }
    ;

type:
    TOK_INT
         { typeMode = typeInt; }
    ;

variable_list: 
    TOK_VARIABLE
    {
        $$ = node_mkVariable($1, typeMode); 
        printtree_print($$);
    }
    ; 
%%
void printtree_print(int n)
{
    printf("PT_P: %d\n", n);
}
int yylex(void)
{
    static int counter = 0;
    static int tokens[] = { TOK_INT, TOK_VARIABLE, ';', 0 };
    enum { NUM_TOKENS = sizeof(tokens) / sizeof(tokens[0]) };
    if (counter < NUM_TOKENS)
    {
        printf("Token: %d\n", tokens[counter]);
        return(tokens[counter++]);
    }
    return 0;
}
int node_mkVariable(int var, int mode)
{
    return 23 + var + mode;
}
static void yyerror(const char *str)
{
    fprintf(stderr, "Error: %s\n", str);
    exit(1);
}
int main(void)
{
    while (yyparse() == 0)
        ;
    return 0;
}

When I compile and run it, I get this output:

Token: 258
Token: 259
PT_P: 23
varlist
PT_P: 23
declaration
PT_P: 23
Token: 59
line
PT_P: 23
goal
PT_P: 23
Token: 0

This looks correct given the infrastructure, and shows no sign of your observed behaviour. So, you need to show us just enough extra code to reproduce your problem - so as to demonstrate that it is not an artefact of the code that you didn't supply but is a feature of your grammar.

FWIW: this was compiled on Mac OS X 10.6.7 using the system-provided Yacc (actually, Bison 2.3); I got essentially the same output with two other variants of Yacc on my machine. The compiler was GCC 4.2.1 (Xcode 3).

Jonathan Leffler
  • Nice test rig. What you have pasted is fine; it illustrates my grammar working as I want it. The lexer on its own works as I want as well, so I saw no point in posting it. Someone on IRC (freenode's ##parsers channel) supposes the problem is that I am not making a deep copy of the string $1 in node_mkVariable($1), and that yacc's buffer is subsequently modified. They think I might be storing a pointer to the buffer, and when the buffer's contents change (I get the token ';'), the pointer stays the same but not the thing it's pointing to. I will look into this later when I return home. – foobuzz Apr 01 '11 at 21:48
  • @foobuzz: if you're not copying the token string, then there is a very good chance that is exactly your problem. – Jonathan Leffler Apr 02 '11 at 00:35
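The aliasing problem suspected in these comments can be sketched in plain C, without yacc. This is only an illustration under stated assumptions: token_buf stands in for the lexer's token buffer (yytext in lex/flex), and scan, struct node, mk_aliased and mk_copied are hypothetical names, since the real lexer and node_mkVariable were not posted.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for the lexer's token buffer (yytext in lex/flex).
   It is overwritten every time a new token is scanned. */
static char token_buf[64];

/* Hypothetical helper: simulates the lexer scanning the next token. */
static const char *scan(const char *text)
{
    strncpy(token_buf, text, sizeof token_buf - 1);
    token_buf[sizeof token_buf - 1] = '\0';
    return token_buf;
}

/* Hypothetical node type; the real one is not shown in the question. */
struct node { const char *name; };

/* Buggy variant: keeps a pointer into the lexer's buffer. */
static struct node mk_aliased(const char *name)
{
    struct node n;
    assert(name != NULL);
    n.name = name;
    return n;
}

/* Fixed variant: keeps a private copy that later tokens cannot overwrite. */
static struct node mk_copied(const char *name)
{
    struct node n;
    char *copy;
    assert(name != NULL);
    copy = malloc(strlen(name) + 1);
    if (copy == NULL) {
        perror("malloc");
        exit(1);
    }
    strcpy(copy, name);
    n.name = copy;
    return n;
}
```

If both nodes are built from scan("X") and the lexer then scans ";", the aliased node's name reads ";" while the copied node's name still reads "X". In a real lex/flex scanner the usual equivalent is to copy yytext in the rule's action before returning the token, for example with POSIX strdup, and to free the copy when the node is destroyed.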
You might actually have a problem with your lexer. One way to debug would be to remove all clauses other than ones directly involved, and then add clauses one by one to see which one introduces the error.
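One cheap way to check the lexer in isolation is to log each token's code together with its text before the parser sees it. The sketch below is an assumption-laden illustration: fake_lex replays the tokens for int X; in place of the real yylex, last_text stands in for yytext, and the token codes are hard-coded rather than taken from y.tab.h.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical token codes; a real build would take these from y.tab.h. */
enum { TOK_INT = 258, TOK_VARIABLE = 259 };

/* Stand-in for yytext: the text of the most recently scanned token. */
static const char *last_text = "";

/* Stand-in lexer: replays the tokens for "int X;" and then returns 0
   (end of input) on every later call. */
static int fake_lex(void)
{
    static const int codes[] = { TOK_INT, TOK_VARIABLE, ';', 0 };
    static const char *texts[] = { "int", "X", ";", "" };
    static size_t pos = 0;

    assert(sizeof codes / sizeof codes[0] == sizeof texts / sizeof texts[0]);
    if (pos >= sizeof codes / sizeof codes[0])
        return 0;
    last_text = texts[pos];
    return codes[pos++];
}

/* Wrapper that logs every token to stderr before handing it on. */
static int traced_lex(void)
{
    int tok = fake_lex();
    fprintf(stderr, "lex: code=%d text=\"%s\"\n", tok, last_text);
    return tok;
}
```

Having the parser call a wrapper like traced_lex makes it easy to see whether the text attached to a token is still what the lexer produced by the time an action uses it.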

sanjoyd