It's been a while since I last worked with Flex or Bison, back in college. However, I've been trying to roll my own lightweight BBCode parser using Jison as a fun weekend project.

My problem is an odd error in which the parser reports that it expects tokens whose prerequisite tokens it hasn't seen yet. I don't think I'm explaining that well (or that I understand what's actually happening), so here's the code:

%lex
%%

\s+         /* Consume whitespace */
"[b"        {return 'BOLD';}
"[i"        {return 'ITAL';}
"[s"        {return 'STRIKE';}
"[url="     {return 'URLEQ';}
"[url"      {return 'URL';}
"[img"      {return 'IMG';}
"[quote"    {return 'QUOT';}
"[code"     {return 'CODE';}
"[style"\s+"size="    {return 'STYLSIZ';}
"[style"\s+"color="   {return 'STYLCOL';}
"[color="    {return 'COL';}
"[list"     {return 'LIST';}
"[table"    {return 'TABLE';}
"[tr]"       {return 'TROW';}
"[td]"       {return 'TDEL';}

"[*]"       {return 'LITEM';}

"]"         {return 'CLOSE';}

"[/b]"      {return 'BOLDEND';}
"[/i]"      {return 'ITALEND';}
"[/s]"      {return 'STRIKEEND';}
"[/url]"    {return 'URLEND';}
"[/quote]"  {return 'QUOTEND';}
"[/code]"   {return 'CODEEND';}
"[/style]"  {return 'STYLEND';}
"[/color]"  {return 'COLEND';}
"[/list]"   {return 'LISTEND';}
"[/table]"  {return 'TABLEEND';}
"[/tr]"     {return 'TROWEND';}
"[/td]"     {return 'TDELEND';}
<<EOF>>     {return 'EOF';}
[a-zA-Z]+    {return 'NTOK';}

/lex

%start EXPR

%%

EXPR
    : CONTENT EOF { typeof console !== 'undefined' ? console.log($1) : print($1); return $1; }
    ;

CONTENT
    : BOLD CLOSE CONTENT BOLDEND            { $$ = "<b>"+$3+"</b>"; }
    | ITAL CLOSE CONTENT ITALEND            { $$ = "<i>"+$3+"</i>"; }
    | STRIKE CLOSE CONTENT STRIKEEND        { $$ = "<s>"+$3+"</s>"; }
    | URL CLOSE CONTENT URLEND              { $$ = "<a href=\""+$3+"\">"+$3+"</a>"; }
    | URLEQ NTOK CLOSE CONTENT URLEND       { $$ = "<a href=\""+$2+"\">"+$4+"</a>"; }
    | IMG CLOSE NTOK IMGEND                 { $$ = "<img src=\""+$3+"\"></img>"; }
    | QUOT CLOSE CONTENT QUOTEND            { $$ = "<blockquote>"+$3+"</blockquote>"; }
    | CODE CLOSE CONTENT CODEEND            { $$ = "<pre>"+$3+"</pre>"; }
    | STYLSIZ NTOK CLOSE CONTENT STYLEND    { $$ = "<span style=\"font-size:"+$2+";\">"+$4+"</span>"; }
    | STYLCOL NTOK CLOSE CONTENT STYLEND    { $$ = "<span style=\"color:"+$2+";\">"+$4+"</span>"; }
    | COL NTOK CLOSE CONTENT COLEND         { $$ = "<span style=\"color:"+$2+";\">"+$4+"</span>"; }
    | LIST CLOSE LITEMS LISTEND             { $$ = "<ul>"+$3+"</ul>"; }
    | TABLE CLOSE TDATA TABLEEND            { $$ = "<table>"+$3+"</table>"; }
    | NTOK
    ;

LITEMS
    : LITEM CONTENT LITEMS                  { $$ = "<li>"+$2+"</li>"; }
    ;

TDATA
    : TROW CLOSE CONTENT TROWEND            { $$ = "<tr>"+$3+"</tr>"; }
    | TDEL CLOSE CONTENT TDELEND            { $$ = "<td>"+$3+"</td>"; }
    ;

When I run that against the string:

"a [b]Test, log 1 [/b] This is a story about a [url=\"http://google.com\"]person[/url]"

I receive the error:

Error: Parse error on line 1:
a [b]Test, log 1 [/b]
--^
Expecting 'EOF', 'BOLDEND', 'ITALEND', 'STRIKEEND', 'URLEND', 'QUOTEND', 'CODEEND', 'STYLEND', 'COLEND', 'LITEM', 'TROWEND', 'TDELEND', got 'BOLD'

Any pointers on what's wrong would be greatly appreciated.

Matt

1 Answer

Nothing in that grammar or lexer accepts ordinary text except NTOK, which matches only alphabetic characters. So the leading a is lexed as an NTOK and reduced to CONTENT, and since no production allows one CONTENT to follow another, the only token that can follow it at the top level is EOF. The next token is BOLD, hence the error. I presume the long list of other expected tokens reported alongside EOF is an artifact of how the parser tables are compressed.
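One way to address this, as a rough sketch rather than a tested fix, is to let the top level be a sequence of items and to add a catch-all token for plain text. The names CONTENTS and TEXT below are hypothetical (they are not in the question's grammar), and the fragment assumes Jison's default behaviour of trying lexer rules in order, so the catch-all rule has to come after all the tag rules:

```jison
/* Lexer: place after all tag rules and after NTOK.          */
/* TEXT is a hypothetical token for any run of characters    */
/* that contains neither '[' nor ']'.                        */
[^\[\]]+     {return 'TEXT';}

/* Grammar: EXPR now accepts zero or more CONTENT items.     */
EXPR
    : CONTENTS EOF          { return $1; }
    ;

/* CONTENTS is a hypothetical left-recursive list rule.      */
CONTENTS
    : /* empty */           { $$ = ""; }
    | CONTENTS CONTENT      { $$ = $1 + $2; }
    ;

/* In CONTENT, the nested CONTENT inside each tag pair would */
/* become CONTENTS, and TEXT would be allowed next to NTOK:  */
/*     | BOLD CLOSE CONTENTS BOLDEND  { $$ = "<b>"+$3+"</b>"; } */
/*     | NTOK                                                */
/*     | TEXT                                                */
```

With a rule like this, a [b]...[/b] would parse as two consecutive CONTENT items instead of failing after the first NTOK. Note that the whitespace rule in the question still discards spaces between words, and that the quoted URL in the sample input is not matched by NTOK, so both would need separate attention.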

rici