
I'm writing a BNF grammar for a very simple programming language and using Flex and Bison to compile.
I only have 3 variable and constant types: real, integer, string.
My .l file has a token definition for "ID" as follows:

DIGIT [0-9]
LETTER [a-zA-Z]
ID {LETTER}({LETTER}|{DIGIT})*


My .y file has a definition for an identifier like this:

identifier:
ID;

Now, I want to use the identifier definition to build variable and constant names. But I also want to limit assignment to data of the same type (e.g., I don't want a string assigned to an integer variable). So I created a few rules to separate each kind of variable and constant:

id_variable_string:
identifier;

id_variable_integer:
identifier;

id_variable_real:
identifier;

I did the same for constants. Now, in my language I have a section for constant declaration/definition and then a section for variable declaration. That is, constants are declared at the same time as they're assigned (something like "myConstant = 123") but variables have to be declared first, and then assigned a value in the section specifically designed for statements.
E.g., if I want an integer and a string variable, I'd have to declare them first:
STRING myStrVariable;
INTEGER myIntVariable;
And then, in the zone reserved for statements, I can choose to do an assignment (a statement can be an assignment, a decision, a selection, an output, etc.):

assignment: 
        id_variable_string ASSIGN_OPERATOR literal_string
        | id_variable_string ASSIGN_OPERATOR id_const_string 
        | id_variable_string ASSIGN_OPERATOR id_variable_string 
        | id_variable_string ASSIGN_OPERATOR concatenation
        | id_variable_integer ASSIGN_OPERATOR id_const_integer 
        | id_variable_integer ASSIGN_OPERATOR id_variable_integer  
        | id_variable_integer ASSIGN_OPERATOR expression 
        | id_variable_integer ASSIGN_OPERATOR literal_integer
        | id_variable_real ASSIGN_OPERATOR id_variable_real
        | id_variable_real ASSIGN_OPERATOR id_const_real
        | id_variable_real ASSIGN_OPERATOR expression
        | id_variable_real ASSIGN_OPERATOR literal_real
        ;

What I intend here is to say explicitly that a string variable can only be assigned a string literal, a concatenation of strings (using +), a string constant or another string variable. The same goes for integer and real variables, except that they take an expression (math operations) instead of a concatenation.
Concatenation is defined as follows:

concatenation:
        id_variable_string ADD_OPERATOR id_variable_string 
        | id_variable_string ADD_OPERATOR literal_string 
        | literal_string ADD_OPERATOR id_variable_string 
        | literal_string ADD_OPERATOR literal_string
        | id_const_string ADD_OPERATOR id_const_string  
        | id_const_string ADD_OPERATOR id_variable_string 
        | id_const_string ADD_OPERATOR literal_string 
        | literal_string ADD_OPERATOR id_const_string  
        | id_variable_string ADD_OPERATOR id_const_string
        ;

And expression is defined as:

expression: 
        expression ADD_OPERATOR term
        | expression SUBST_OPERATOR term
        | term
        ;

term:
        term MULTIP_OPERATOR factor
        | term DIVISION_OPERATOR factor
        | factor
        ;

factor:     
        id_variable_integer
        | id_variable_real
        | id_const_integer
        | id_const_real
        | literal_integer
        | literal_real
        | PARENTHESIS_OPEN expression PARENTHESIS_CLOSE
        ;

Now, this is what Bison is saying:


55 assignment: id_variable_integer ASSIGN_OPERATOR id_const_integer
56 | id_variable_integer ASSIGN_OPERATOR id_variable_integer
58 | id_variable_integer ASSIGN_OPERATOR literal_integer
59 | id_variable_real ASSIGN_OPERATOR id_variable_real
60 | id_variable_real ASSIGN_OPERATOR id_const_real
62 | id_variable_real ASSIGN_OPERATOR literal_real


State 50 conflicts: 1 shift/reduce
State 76 conflicts: 14 shift/reduce
State 130 conflicts: 2 shift/reduce
State 131 conflicts: 1 shift/reduce
State 133 conflicts: 1 shift/reduce
State 134 conflicts: 1 shift/reduce
State 135 conflicts: 1 shift/reduce
State 137 conflicts: 1 shift/reduce
State 138 conflicts: 1 shift/reduce


I'm assuming something in my grammar is wrong but I'm not sure what exactly.

Vertexwahn
patr1c1a

1 Answer


You said:

So I created a few rules to separate each kind of variable and constant:

id_variable_string:
identifier;

id_variable_integer:
identifier;

id_variable_real:
identifier;

And this was your problem. There is nothing syntactically to distinguish an id_variable_string from an id_variable_integer, so you have (at least two) wasted rules. This is what it is complaining about. It has no clue when it gets an identifier whether it should be treating it as an id_variable_string or an id_variable_integer.

You have to handle the type conflicts differently — a semantic check (not a syntactic check) that the type associated with the identifier is consistent with the types of the other identifiers in the expression.
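As a rough illustration of what such a semantic check might look like, here is a minimal C sketch of a symbol table that the Bison actions could consult. Everything in it is hypothetical and not taken from the question's grammar: the names `VarType`, `declare`, `lookup_type` and `assignment_ok`, the fixed table sizes, and the rule that the assigned value's type must match the target's type exactly.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical type tags for the language's three types. */
typedef enum { TYPE_UNKNOWN, TYPE_INTEGER, TYPE_REAL, TYPE_STRING } VarType;

#define MAX_SYMBOLS 64   /* assumed limits, chosen only for the sketch */
#define MAX_NAME    32

typedef struct {
    char    name[MAX_NAME];
    VarType type;
} Symbol;

static Symbol symbols[MAX_SYMBOLS];
static size_t symbol_count = 0;

/* Return the declared type of `name`, or TYPE_UNKNOWN if undeclared. */
VarType lookup_type(const char *name)
{
    for (size_t i = 0; i < symbol_count; i++) {
        if (strcmp(symbols[i].name, name) == 0)
            return symbols[i].type;
    }
    return TYPE_UNKNOWN;
}

/* Record a declaration such as `INTEGER myIntVariable;`.
 * Returns 1 on success, 0 if the name is too long, the table is full,
 * or the name was already declared. */
int declare(const char *name, VarType type)
{
    assert(type != TYPE_UNKNOWN);
    if (strlen(name) >= MAX_NAME || symbol_count >= MAX_SYMBOLS)
        return 0;
    if (lookup_type(name) != TYPE_UNKNOWN)
        return 0;
    strcpy(symbols[symbol_count].name, name);
    symbols[symbol_count].type = type;
    symbol_count++;
    return 1;
}

/* Semantic check for `target = value`: the target must be declared
 * and the value's type must match it exactly (an assumption). */
int assignment_ok(const char *target, VarType value_type)
{
    VarType target_type = lookup_type(target);
    return target_type != TYPE_UNKNOWN && target_type == value_type;
}
```

With something like this in place, the grammar needs only a single `ID` token for every identifier. The action for a declaration would call `declare`, and the action for an assignment would call `assignment_ok` and report a type error itself (for example through `yyerror`) when it returns 0.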

Jonathan Leffler
  • Thanks, Jonathan! When you say I need to perform a semantic check, that exceeds the work done in flex and bison, right? Or there's still something I can do? – patr1c1a Nov 12 '13 at 16:16
  • You'll have a check in the actions of the Bison code that ensures that when you have, say, `add_expr: identifier '+' identifier {…code…} ;` (not very realistic, but it serves), then the code will check that the two identifiers (`$1` and `$3` in Bison-speak) identify compatible types (both strings, both integers, both reals). 'Semantic' means 'non-grammatical' (checks performed outside the scope of the grammar). Looking at `a + b`, it is syntactically correct, even if `a` is a string and `b` is a real; it is a semantic check that ensures that it is meaningful to try to do what is requested. – Jonathan Leffler Nov 12 '13 at 16:46
  • Thanks again, Jonathan. As for the grammar, I'm trying to make it work, so I removed all the separate "identifier" rules, including the identifier rule itself, and used the token "ID" everywhere an identifier was needed. But Bison still complains about shift/reduce conflicts in this part: `assignment: ID ASSIGN_OPERATOR ID | ID ASSIGN_OPERATOR literal_integer | ID ASSIGN_OPERATOR literal_real`. It seems to point to some problem in my "factor" rule (I only replaced each separate "identifier" with the token "ID"). I'm reading about the shift/reduce mechanism used by Bison, but I can't figure out how to fix my grammar yet :/ – patr1c1a Nov 12 '13 at 17:29
  • Since I can't see definitions of `literal_integer` and `literal_real`, it is hard to know what is up. However, since you seem to be using the standard convention of upper-case for tokens and lower-case for rules, it appears that `literal_integer` and `literal_real` are rules, and you may have a similar problem there to the one you had with your identifiers: they are not grammatically distinguishable. You should probably fix this in the lexical analyzer; it should return LITERAL_INTEGER and LITERAL_REAL. – Jonathan Leffler Nov 12 '13 at 17:54
  • Thanks, Jonathan. I found out there was another ambiguity: ID ASSIGN_OPERATOR ID could also be matched as ID ASSIGN_OPERATOR expression, since an expression can be derived from a single ID. I managed to fix every error but one. I made a different post for that one because the explanation was a bit long: http://stackoverflow.com/questions/19942710/how-can-i-force-bison-to-shift-to-resolve-a-conflict. I also posted links to the whole grammar and Bison output there. If it's not too much trouble and you want to take a look at it, I'd be very grateful :) – patr1c1a Nov 13 '13 at 02:15
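
The check on `$1` and `$3` described in the comments above could be sketched in C roughly as follows. This is only an illustration under stated assumptions: the names `ExprType` and `binary_result_type` are made up, the operator is passed as a plain character rather than as one of the question's tokens, `+` is assumed to double as string concatenation, and mixing integer and real operands is rejected to follow the "both integers, both reals" rule from the comment.

```c
#include <assert.h>

/* Hypothetical type tags; TYPE_ERROR marks an ill-typed expression. */
typedef enum { TYPE_ERROR, TYPE_INTEGER, TYPE_REAL, TYPE_STRING } ExprType;

/* Compute the type of `left op right`, or TYPE_ERROR if the operands
 * are incompatible. `op` must be one of '+', '-', '*', '/'. */
ExprType binary_result_type(char op, ExprType left, ExprType right)
{
    assert(op == '+' || op == '-' || op == '*' || op == '/');

    /* An error in either operand propagates upward, so one mistake
     * does not have to be reported again at every enclosing level. */
    if (left == TYPE_ERROR || right == TYPE_ERROR)
        return TYPE_ERROR;

    /* Operands must have the same type (assumption: no implicit
     * integer-to-real conversion). */
    if (left != right)
        return TYPE_ERROR;

    /* Strings only support '+', treated here as concatenation. */
    if (left == TYPE_STRING && op != '+')
        return TYPE_ERROR;

    return left;
}
```

An action for a rule such as `expression: expression ADD_OPERATOR term` could then set `$$` to `binary_result_type('+', $1, $3)`, assuming the semantic values of `expression` and `term` carry an `ExprType`. The grammar stays purely syntactic, and `a + b` with a string and a real parses normally but yields `TYPE_ERROR`.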