I am trying to generate a parser for recipe ingredients. I am noticing that the order in which the parser handles tokens seems to follow the tokens' line order in the Jison file, rather than what is defined in the EBNF grammar.

For example, parsing `6 tablespoons unsalted butter, cut into 1-inch pieces` yields:

Error: Parse error on line 1:
6 tablespoons unsalted
--^
Expecting 'UNIT_NAME', 'NUMBER', 'SLASH', got 'WORD'

I would expect the lexer to recognize `tablespoons` as a UNIT_NAME before it consumes it as a WORD. What is the right grammar approach here? I have been using the interactive Jison parser to validate the grammar states and haven't seen any gotchas so far.

Jison Grammar

%lex
%options flex case-insensitive

UnitName                    [teaspoons|teaspoon|tablespoons|tablespoon|fluid ounces|fluid ounce|ounces|ounce|cups|cup|pints|pint|quarts|quart|gallons|gallon|pounds|pound|milliliters|milliliter|deciliters|deciliter|liters|liter]\b
Word                        \w+\b
NUMBER                      [1-9][0-9]+|[0-9]
CHAR                        [a-zA-Z0-9_-]

%%

\s+                      /* skip whitespace */
{NUMBER}                 return 'NUMBER'
{UnitName}               return "UNIT_NAME";
{Word}                   return 'WORD'
{CHAR}                   return 'CHAR'
"/"                      return "SLASH";
"-"                      return "HYPHEN"
","                      return "COMMA";
<<EOF>>                  return 'EOF';

/lex

/* enable EBNF grammar syntax */
%ebnf

/* language grammar */
%start ingredient
%%

ingredient
    : ingredient_format
        { return $1; }
    ;

ingredient_format
    : unit_count UNIT_NAME ingredient_name COMMA ingredient_info EOF
        { $$ = {'count': $1, 'unit': $2, 'item': $3, info: $5}; }
    | unit_count UNIT_NAME ingredient_name EOF
        { $$ = {'count': $1, 'unit': $2, 'item': $3, info: null}; }
    ;

unit_count
    : NUMBER
        { $$ = parseInt($1); }
    | NUMBER SLASH NUMBER
        { $$ = parseInt($1) / parseInt($3); }
    | NUMBER NUMBER SLASH NUMBER
        { $$ = parseInt($1) + (parseInt($2) / parseInt($4)); }
    ;

ingredient_name
    : WORD+
        { $$ = $1; }
    ;

ingredient_info
    : ""
        { $$ = ''; }
    | WORD+
        { $$ = $1; }
    ;

Gist

I created a gist with some text strings and a simple parser for testing: https://gist.github.com/aphexddb/ddc83d57c7f1c1b96458

aphexddb
  • There is a big difference between [] and () in a regex. – rici Nov 13 '14 at 03:04
  • @rici Nice typo catch; however, it still won't parse correctly. I replaced the [] with (). – aphexddb Nov 13 '14 at 14:39
  • It worked for me, more or less. I changed `UnitName` to `("teaspoon"|"teaspoons"|...)\b` (although the `\b` is neither necessary nor good style) and then successfully parsed "6 tablespoons butter, cut into pieces". Of course, you can't parse "6 tablespoons butter, cut into 1-inch pieces" because `1` is a NUMBER and `-` is a CHAR, and neither of those is valid in `ingredient_info`. – rici Nov 13 '14 at 15:26
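
To illustrate the `[]` versus `()` point from the comments, here is a minimal sketch in plain JavaScript, outside of Jison. It assumes the lexer pattern behaves like an ordinary case-insensitive JavaScript regex anchored at the current input position; the shortened unit list and the `matchAt` helper are made up for the example and are not part of the original grammar.

```javascript
// Square brackets build a character class: it matches exactly ONE character
// out of the set {t, e, a, s, p, o, n, |, b, l, c, u}.
const asClass = /^[teaspoons|tablespoons|cups]\b/i;

// Parentheses with | build an alternation: it matches one whole unit name.
const asGroup = /^(teaspoons|tablespoons|cups)\b/i;

// Hypothetical helper: return the matched text, or null if nothing matches.
function matchAt(regex, text) {
  const m = regex.exec(text);
  return m ? m[0] : null;
}

const input = "tablespoons unsalted";

// The class consumes only "t", and \b then fails because "a" follows it,
// so this pattern does not match at all and a generic word rule would win.
const classResult = matchAt(asClass, input);

// The alternation consumes the whole word "tablespoons".
const groupResult = matchAt(asGroup, input);

console.log(classResult, groupResult);
```

With the character class, the unit-name pattern never matches `tablespoons`, which would be consistent with the lexer returning WORD in the error message above.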
