I'm developing a compiler for a C-like language using Bison and Flex. At the moment, the compiler recognizes a language with declaration, assignment and print statements, plus arithmetic and logic expressions (using only int variables). It generates three-address code (3AC), along with some instructions for managing memory. This is my Bison code:

%{

#include <stdio.h>
#include <ctype.h>
#include <string.h>
#include <stdlib.h>
#include "list.h"

int yylex();
void yyerror(char *s);

TList list = NULL;
int i=0;

char* tmp() {
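    /* 16 bytes hold "t", every digit of a non-negative int, and the NUL;
       sizeof(char*) only happened to be large enough for short names */
    char* t = malloc(16);
    snprintf(t, 16, "t%d", i);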
    i++;
    return t;
}

%}

%union {
    int number;
    char* identifier;
}

%token <number> NUM
%token <identifier> ID
%token PRINT INT ENDFILE 
%left '+' '-'
%left '*' '/'
%right UMINUS
%left OR
%left AND
%right NOT
%nonassoc EQ LT GT LE GE NE
%type <identifier> expr
%type <identifier> comp
%type <identifier> bexpr


%%
program :   lstmt ENDFILE               { return 0; }
            ;

lstmt       : lstmt stmt ';'
            | stmt ';'
            | lstmt openb lstmt closeb
            | openb lstmt closeb
            ;

openb       : '{'                       { printf("list = insertElement(list);\n"); }
            ;

closeb      :  '}'                      { printf("list = removeElement(list);\n"); }
            ;

stmt        : INT ID                    { printf("addVar(\"%s\", &list->table);\n", $2); }
            | INT ID '=' NUM            {
                                            printf("addVar(\"%s\", &list->table);\n", $2);
                                            printf("setVarList(\"%s\", %d, list);\n", $2, $4);
                                        }
            | ID '=' expr               { printf("setVarList(\"%s\", %s, list);\n", $1, $3); }
            | PRINT '(' ID ')'          { printf("printf(\"%%s: %%d\\n\", \"%s\", getVarList(\"%s\", list));\n", $3, $3); }
            | ID '=' bexpr              { printf("setVarList(\"%s\", %s, list);\n", $1, $3); }
            ;

bexpr       : bexpr OR bexpr            {
                                            $$ = tmp();
                                            printf("%s = %s || %s;\n", $$, $1, $3);
                                        }
            | bexpr AND bexpr           {
                                            $$ = tmp();
                                            printf("%s = %s && %s;\n", $$, $1, $3);
                                        }
            | expr OR bexpr             {
                                            $$ = tmp();
                                            printf("%s = %s || %s;\n", $$, $1, $3);
                                        }
            | expr AND bexpr            {
                                            $$ = tmp();
                                            printf("%s = %s && %s;\n", $$, $1, $3);
                                        }
            | bexpr OR expr             {
                                            $$ = tmp();
                                            printf("%s = %s || %s;\n", $$, $1, $3);
                                        }
            | bexpr AND expr            {
                                            $$ = tmp();
                                            printf("%s = %s && %s;\n", $$, $1, $3);
                                        }
            | NOT bexpr                 {
                                            $$ = tmp();
                                            printf("%s = !%s;\n", $$, $2);
                                        }
            | '(' bexpr ')'             { $$ = $2; }
            | comp                      { $$ = $1; }
            ;

comp        : expr LT expr              {
                                            $$ = tmp();
                                            printf("%s = %s < %s;\n", $$, $1, $3);
                                        }
            | expr LE expr              {
                                            $$ = tmp();
                                            printf("%s = %s <= %s;\n", $$, $1, $3);
                                        }
            | expr GT expr              {
                                            $$ = tmp();
                                            printf("%s = %s > %s;\n", $$, $1, $3);
                                        }
            | expr GE expr              {
                                            $$ = tmp();
                                            printf("%s = %s >= %s;\n", $$, $1, $3);
                                        }
            | expr EQ expr              {
                                            $$ = tmp();
                                            printf("%s = %s == %s;\n", $$, $1, $3);
                                        }
            | expr NE expr              {
                                            $$ = tmp();
                                            printf("%s = %s != %s;\n", $$, $1, $3);
                                        }
            | expr AND expr             {
                                            $$ = tmp();
                                            printf("%s = %s && %s;\n", $$, $1, $3);
                                        }
            | expr OR expr              {
                                            $$ = tmp();
                                            printf("%s = %s || %s;\n", $$, $1, $3);
                                        }
            | NOT expr                  {
                                            $$ = tmp();
                                            printf("%s = !%s;\n", $$, $2);
                                        }
            ;

expr        : expr '+' expr             {  
                                            $$ = tmp();
                                            printf("%s = %s + %s;\n", $$, $1, $3);
                                        }
            | expr '-' expr             { 
                                            $$ = tmp();
                                            printf("%s = %s - %s;\n", $$, $1, $3);
                                        }
            | expr '*' expr             {
                                            $$ = tmp();
                                            printf("%s = %s * %s;\n", $$, $1, $3);
                                        }
            | expr '/' expr             {
                                            $$ = tmp();
                                            printf("%s = %s / %s;\n", $$, $1, $3);
                                        }
            | '(' expr ')'              { $$ = $2; }
            | '-' expr %prec UMINUS     { 
                                            $$ = tmp();
                                            printf("%s = -%s;\n", $$, $2); 
                                        }
            | ID                        { 
                                            $$ = tmp();
                                            printf("%s = getVarList(\"%s\", list);\n", $$, $1);
                                        }
            | NUM                       {
                                            $$ = tmp();
                                            printf("%s = %d;\n", $$, $1);
                                        }
            ;

%%

int main () {
    list = insertElement(list);
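    if(yyparse() != 0)
        fprintf(stderr, "Abnormal exit\n");

    /* Open temp.h once and close it when done, rather than calling fopen per write */
    FILE *out = fopen("temp.h", "w");
    if(out == NULL)
        return 1;
    fprintf(out, "#include \"list.h\"\n\nTList list = NULL;\n");
    /* Declare the temporaries t0 .. t(i-1) on one line */
    for(int j=0; j<i; j++)
        fprintf(out, "%s%d", j == 0 ? "int t" : ", t", j);
    if(i > 0)
        fprintf(out, ";");
    fclose(out);
    return 0;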
}

void yyerror (char *s) {
    fprintf(stderr, "Error: %s\n", s);
}

As you can see, the grammar for logic expressions is a little complex, but the compiler does what it should. The behavior is C-like, so integer values can be used with AND/OR/NOT.

My idea for the grammar was this:

bexpr       : bexpr OR bexpr            
            | bexpr AND bexpr
            | NOT bexpr
            | '(' bexpr ')'
            | comp
            | expr
            ;

comp        : expr LT expr
            | expr LE expr
            | expr GT expr
            | expr GE expr
            | expr EQ expr
            | expr NE expr
            ;

But this way I get two conflicts: 1 shift/reduce and 1 reduce/reduce. Is there a way to simplify the grammar?

forzalupi1912
  • `comp` and `expr` start with the same nonterminal - `expr`, so the parser doesn't know how to reduce a `bexpr` that starts with `expr`: treat it as `expr` or `comp` – ForceBru May 20 '20 at 11:54
  • Yes, I know that, but is there a way to use a grammar like the second one? Before settling on the implemented grammar I made a couple of attempts (the one above and a similar one with `comp->expr` instead of `bexpr->expr`), but without success. I really don't like the grammar I used and I would like to write a less complex one. – forzalupi1912 May 20 '20 at 14:04

1 Answer

My advice is not to try to distinguish grammatically between `bexpr` and `expr`. It cannot really be done accurately because you allow variables to be used as boolean values. Your current grammar is a valiant effort, to be sure, but when you add conditional statements to your grammar, you will find that

if ((a)) ...

will be flagged as a syntax error (assuming the syntax of a conditional statement is C-like: `"if" '(' bexpr ')' stmt | "if" '(' bexpr ')' stmt "else" stmt`). And the attempt to ban the use of arithmetic expressions as arguments to boolean operators is easily circumvented, because `a AND (1 + -1)` is a valid `bexpr`. One might also ask why `(a < b) == (b < c)` should not be valid syntax. Granted, it's a bit obfuscated, but it's a convenient way to write "b is between a and c".
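
For illustration, here is a rough sketch of what a single expression hierarchy could look like. It is a fragment meant to replace the `bexpr`, `comp` and `expr` rules and the precedence declarations in the question's file, not a complete grammar. The tokens and `tmp()` come from the question; the precedence order (lowest first, following C) and the use of `%left` for the comparison operators are assumptions of this sketch. As in the original actions, the emitted code does not preserve C's short-circuit evaluation.

```yacc
/* Lowest precedence first; later lines bind tighter (assumed to mirror C). */
%left OR
%left AND
%left EQ NE
%left LT GT LE GE
%left '+' '-'
%left '*' '/'
%right NOT UMINUS

%type <identifier> expr

%%

expr : expr OR  expr  { $$ = tmp(); printf("%s = %s || %s;\n", $$, $1, $3); }
     | expr AND expr  { $$ = tmp(); printf("%s = %s && %s;\n", $$, $1, $3); }
     | expr EQ  expr  { $$ = tmp(); printf("%s = %s == %s;\n", $$, $1, $3); }
     | expr NE  expr  { $$ = tmp(); printf("%s = %s != %s;\n", $$, $1, $3); }
     | expr LT  expr  { $$ = tmp(); printf("%s = %s < %s;\n", $$, $1, $3); }
     | expr LE  expr  { $$ = tmp(); printf("%s = %s <= %s;\n", $$, $1, $3); }
     | expr GT  expr  { $$ = tmp(); printf("%s = %s > %s;\n", $$, $1, $3); }
     | expr GE  expr  { $$ = tmp(); printf("%s = %s >= %s;\n", $$, $1, $3); }
     | expr '+' expr  { $$ = tmp(); printf("%s = %s + %s;\n", $$, $1, $3); }
     | expr '-' expr  { $$ = tmp(); printf("%s = %s - %s;\n", $$, $1, $3); }
     | expr '*' expr  { $$ = tmp(); printf("%s = %s * %s;\n", $$, $1, $3); }
     | expr '/' expr  { $$ = tmp(); printf("%s = %s / %s;\n", $$, $1, $3); }
     | NOT expr       { $$ = tmp(); printf("%s = !%s;\n", $$, $2); }
     | '-' expr %prec UMINUS
                      { $$ = tmp(); printf("%s = -%s;\n", $$, $2); }
     | '(' expr ')'   { $$ = $2; }
     | ID             { $$ = tmp(); printf("%s = getVarList(\"%s\", list);\n", $$, $1); }
     | NUM            { $$ = tmp(); printf("%s = %d;\n", $$, $1); }
     ;
```

With this shape, `stmt` needs only the one `ID '=' expr` alternative, and a later rule such as `"if" '(' expr ')' stmt` accepts `if ((a))` because a parenthesized expression is again an `expr`.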

If you really want to disallow the use of arithmetic operations as arguments to the boolean operators AND, OR and NOT, you can improve your grammar by creating two parallel hierarchies, or by simply marking the type of the expression as part of its semantic value, and doing the check in each semantic action. The advantage of the second option is two-fold: it simplifies the grammar, eliminating duplication, and it provides the possibility of much more precise error messages ("attempt to use arithmetic expression as boolean value" instead of "syntax error").
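
The second option could be sketched roughly as follows, as C helpers for the prologue of the `.y` file. Everything named here (`Value`, `Kind`, `require_bool`, `emit_logical`, `error_count`) is hypothetical and introduced only for illustration; the idea is just that the semantic value carries a tag next to the temporary's name, and each action for a boolean operator checks that tag before emitting code.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical semantic value: the temporary's name plus a kind tag. */
typedef enum { KIND_ARITH, KIND_BOOL } Kind;

typedef struct {
    const char *name;  /* e.g. "t3", as returned by tmp() in the question */
    Kind kind;         /* what sort of expression produced the value */
} Value;

int error_count = 0;   /* number of semantic errors reported so far */

/* Returns 1 if v may be used as an operand of the boolean operator op;
   otherwise reports a specific error, counts it, and returns 0. */
int require_bool(Value v, const char *op) {
    assert(op != NULL);
    if (v.kind == KIND_BOOL)
        return 1;
    fprintf(stderr,
            "Error: attempt to use arithmetic expression as operand of %s\n",
            op);
    error_count++;
    return 0;
}

/* Emits "result = l c_op r;" after checking both operands, and tags the
   result as boolean. token_name is used only in the error message. */
Value emit_logical(const char *c_op, const char *token_name,
                   Value l, Value r, const char *result) {
    require_bool(l, token_name);
    require_bool(r, token_name);
    printf("%s = %s %s %s;\n", result, l.name, c_op, r.name);
    Value v = { result, KIND_BOOL };
    return v;
}
```

In a Bison action this might be called as `$$ = emit_logical("&&", "AND", $1, $3, tmp());`, assuming a `Value value;` member is added to `%union` and `expr` is declared with `%type <value> expr`. Parsing continues after a reported error, so several such mistakes can be diagnosed in one run.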

rici
  • I don't understand why I'd get a syntax error when I add if-else to my grammar. I may be wrong, but I'm pretty sure that in C `int a = 20; if(a) {...}` works fine, because `a` is considered true (it is nonzero). I don't want to ban this behavior because, as I said, my language acts like C. I would only like to simplify the grammar, if that is possible. If it isn't, I'll go on and try to add the if-else rules and check whether it works – forzalupi1912 May 20 '20 at 19:28
  • @forzalupi1912: `if(a)...` will work fine. `if((a))...` won't, because `(a)` isn't a `bexpr`. – rici May 20 '20 at 19:30
  • Ok, I understand. I have to implement the if-else to verify, but I think you're right, and that's why I want the `expr` nonterminal in the rules of `comp` or `bexpr`. Is there no way to do that? I can't figure it out. At first I thought of putting all the rules in `expr`, without using the other two nonterminals, but I would rather keep the code (and the grammar) the way it looks now. – forzalupi1912 May 20 '20 at 21:31
  • If you want to try to separate bexprs from exprs, your grammar will be complicated and there will be lots of corner cases. That's why I suggest you don't try. If you want to ban certain expressions which would otherwise be valid because your grammar lets expr be an operand of a boolean operator, then you can do so in a semantic action, as I said (and then you get good error messages). It's easier if you build an AST rather than going directly to TAC, but it can be done either way. But there's a reason why few languages do that. – rici May 20 '20 at 21:38
  • @forzalupi1912: If you want to try to provide a clear list of what expressions you want to allow and what you want to disallow, we could try to produce a grammar. But it won't be pretty. – rici May 20 '20 at 21:39
  • Sure, as I said before, I want my language to act exactly like C. I have only integer variables; if an int is used as a condition (if or while) or as part of a boolean expression, the program has to work. – forzalupi1912 May 29 '20 at 18:23
  • @forzalupi1912: If you want it to be just like C, then you only have one type of expression. So get rid of bexprs, as suggested. – rici May 29 '20 at 19:23
  • Ok, I hoped there was another way, but unfortunately there's not. Thank you – forzalupi1912 May 30 '20 at 12:56