LEX: code to read a specific grammar error

Question

I'm very new to lex. This is for a project in my Programming Languages class.

Consider a language built over the following grammar:

<program> ::= <statement> | <program> <statement>
<statement> ::= <assignStmt> | <ifStmt> | <whileStmt> | <printStmt>
<assignStmt> ::= <id> = <expr> ;
<ifStmt> ::= if ( <expr> ) then <stmt>
<whileStmt> ::= while ( <expr> ) do <stmt>
<printStmt> ::= print <expr> ;
<expr> ::= <term> | <expr> <addOp> <term>
<term> ::= <factor> | <term> <multOp> <factor>
<factor> ::= <id> | <number> | - <factor> | ( <expr> )
<id> ::= <letter> | <id> <letter>
<letter> ::= a | b | c | d | e | f | g | h | i | j
| k | l | m | n | o | p | r | s | t
| u | v | w | x | y | z
<number> ::= <digit> | <number> <digit>
<digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
<addOp> ::= + | -
<multOp> ::= * | / | %

Implement a lex-based C program that scans for all the tokens of the language (keywords, identifiers, numbers, operators, and so on).

My problem is that I get an "l7t2.l:32: unrecognized rule" error. I believe it stems from the definition of "word" in the lex file, but I'm not sure how to fix it.

Here's my lex file, l7t2.l:

%option noyywrap

%{
#include "l7t2.h"
int totDol = 0;
int *outword;
%}

digit [0-9]
number {digit}*

letter  [a-zA-Z]
word    ({letter}{[a-zA-Z0-9]}+)

%%

"if" {return IF;}
"then" {return THEN;}
"while" {return WHILE;}
"do" {return DO;}
"+" {return PLUSOP;}
"-" {return MINUSOP;}
"*" {return MULTOP;}
"/" {return DIVOP;}
"%" {return MODOP;}
";" {return SEMICOLON;}
"=" {return EQUAL;}
"print" {return PRINT;}

[ \t\n]+        ;

{word} {strcpy(outword, yytext);}

\${number}  {totDol = 0; totDol += strtod(yytext+1, NULL); return totDol;}

%%

The h file and c file are very simple as well. – WallofKron, Mar 15 '16 at 05:56
{word} {strcpy(outword,yytext);} – WallofKron, Mar 15 '16 at 06:30

user207421 · Answer 1 · 2016-03-15T06:51:02.120

word    ({letter}{[a-zA-Z0-9]}+)

The problem is here. In a lex pattern, {name} expands a previously defined name, so braces cannot be wrapped around a character class such as [a-zA-Z0-9]. It should be:

word    ({letter}[a-zA-Z0-9]+)

On line 32 (the {word} rule), surely you should be returning a value from that rule?

NB You can get rid of all the single-special-character rules and have a final cover-all rule:

. return yytext[0];

This also means you can use the special characters directly in the grammar, e.g. '+' instead of PLUSOP. It also saves you from having to handle illegal characters at all in the lexer: the parser does it.
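Putting both suggestions together, the scanner might look roughly like the sketch below. It is not a tested drop-in replacement. It assumes that l7t2.h (not shown in the question) declares the keyword tokens with values above 255, as a yacc-generated header would, so they cannot collide with the character codes returned by the last rule. ID and NUMBER are hypothetical token names that do not appear in the question. The sketch also uses `*` rather than `+` in `word` so that a single-letter identifier still matches, and `+` in `number` so that it cannot match an empty string.

```lex
%option noyywrap

%{
/* Assumption: l7t2.h declares IF, THEN, WHILE, DO, PRINT and the
   hypothetical ID and NUMBER, all with values above 255. */
#include "l7t2.h"
%}

digit   [0-9]
number  {digit}+
letter  [a-zA-Z]
word    {letter}[a-zA-Z0-9]*

%%

"if"      { return IF; }
"then"    { return THEN; }
"while"   { return WHILE; }
"do"      { return DO; }
"print"   { return PRINT; }

[ \t\n]+  { /* skip whitespace */ }

{word}    { /* the identifier's text is in yytext */ return ID; }
{number}  { /* the digits are in yytext */ return NUMBER; }

.         { /* any other single character, e.g. + - * / % ; = ( ) */ return yytext[0]; }

%%
```

The keyword rules come before {word} because, when two patterns match the same text, lex picks the rule listed first. The `.` rule comes last so that it only catches characters no earlier rule accepts.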

edited Mar 15 '16 at 06:51

answered Mar 15 '16 at 06:38


Also, should the `. return yytext[0];` rule be in the second or third section? – WallofKron Mar 15 '16 at 06:45
It should be the last action rule before the final `%%`. – user207421 Mar 15 '16 at 06:50
