
I am trying to use Jison, which is a JS port of Bison, the parser generator. My goal is to convert this input:

foo(10)
bar()
foo(28)
baz(28)

into this:

[
  { func: 'foo', arg: 10 },
  { func: 'bar' },
  { func: 'foo', arg: 28 },
  { func: 'baz', arg: 28 }
]

Here is my Jison grammar file:

%lex

%%
[0-9]+\b                  return 'INTEGER'
\(                        return 'OPEN_PAREN'
\)                        return 'CLOSE_PAREN'
[\w]+\s*(?=\()            return 'FUNC_NAME'
\n+                       return 'LINE_END'

/lex

%%
expressions
  : expressions expression
  | expression
  ;

expression
  : LINE_END
  | e LINE_END
    {return $1}
  ;

e
  : FUNC_NAME OPEN_PAREN INTEGER CLOSE_PAREN
    {$$ = { func: $1, arg: $3 };}

  | FUNC_NAME OPEN_PAREN CLOSE_PAREN
    {$$ = { func: $1 };}
  ;

The output of the resulting generated parser is { func: 'foo', arg: 10 }. In other words, it only returns the parsed object from the first statement and ignores the rest.

I know my problem has to do with semantic value and the "right side" of expression, but I am pretty lost otherwise.

Any help would be extremely appreciated!

rici
AndyPerlitch

1 Answer


I'm appending a grammar that does what you asked for. The salient changes are:

  1. LINE_END has the regex \n+|$ to also match the end of input.

  2. I've added a start production whose role is only to return the final result.

  3. I've rewritten the expression production to produce arrays. I've also removed the {return $1} action from the e LINE_END rule, since it caused the parser to return as soon as the first statement had been parsed.

  4. Modified the expressions production to concatenate the arrays.

For the expression and expressions productions, I've used the shorthand syntax for the actions. For instance, expression -> [$1] is equivalent to expression { $$ = [$1] }.
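To make the array-building concrete, here is a minimal sketch in plain JavaScript of what those actions compute once the rules are reduced. The helper names (`blankLine`, `call`, `append`) are hypothetical stand-ins for the grammar actions, not part of Jison. It also assumes that `$3` carries the matched text, so `arg` comes out as a string unless the action converts it, for example with `Number($3)`.

```javascript
// Hypothetical stand-ins for the grammar's actions (not Jison API).
const blankLine = () => [];                       // expression : LINE_END -> []
const call = (e) => [e];                          // expression : e LINE_END -> [$1]
const append = (list, item) => list.concat(item); // expressions : expressions expression -> $1.concat($2)

// Values each `expression` reduction would yield for the sample input,
// with one blank line added. `arg` is assumed to be the matched text (a string).
const perLine = [
  call({ func: 'foo', arg: '10' }),
  call({ func: 'bar' }),
  blankLine(),
  call({ func: 'foo', arg: '28' }),
  call({ func: 'baz', arg: '28' }),
];

// The left-recursive `expressions` rule folds left to right; its second
// alternative (`expressions : expression`) passes the first array through.
const [first, ...rest] = perLine;
const result = rest.reduce(append, first);

console.log(result.length);         // 4: the blank line contributes nothing
console.log(Number(result[0].arg)); // 10: explicit conversion to a number
```

Because a blank line reduces to an empty array, concatenating it leaves the accumulated list unchanged, which is why no placeholder entries appear in the result.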

Here is the grammar:

%lex

%%
[0-9]+\b                  return 'INTEGER'
\(                        return 'OPEN_PAREN'
\)                        return 'CLOSE_PAREN'
[\w]+\s*(?=\()            return 'FUNC_NAME'
\n+|$                     return 'LINE_END'

/lex

%%
start:
  expressions
  { return $1 }
  ;

expressions
  : expressions expression -> $1.concat($2)
  | expression
  ;

expression
  : LINE_END -> []
  | e LINE_END -> [$1]
  ;

e 
  : FUNC_NAME OPEN_PAREN INTEGER CLOSE_PAREN
    {$$ = { func: $1, arg: $3 };}

  | FUNC_NAME OPEN_PAREN CLOSE_PAREN
    {$$ = { func: $1 };}
  ;
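To see why `\n+|$` matters, the following sketch imitates the lexer section in plain JavaScript. It is a simplified model, not the lexer Jison generates: the `tokenize` function is hypothetical, and it assumes rules are tried in the order written, with one final attempt once the input is exhausted so that `$` can match.

```javascript
// The same patterns as the lexer section, as sticky regexes so that each
// one is only tried at the current position.
const rules = [
  [/[0-9]+\b/y, 'INTEGER'],
  [/\(/y, 'OPEN_PAREN'],
  [/\)/y, 'CLOSE_PAREN'],
  [/[\w]+\s*(?=\()/y, 'FUNC_NAME'],
  [/\n+|$/y, 'LINE_END'],
];

// Hypothetical helper: returns [tokenName, matchedText] pairs.
function tokenize(input) {
  const tokens = [];
  let pos = 0;
  let done = false;
  while (!done) {
    // One extra pass at end of input lets `$` produce a final LINE_END.
    if (pos === input.length) done = true;
    let matched = false;
    for (const [re, name] of rules) {
      re.lastIndex = pos;
      const m = re.exec(input);
      if (m) {
        tokens.push([name, m[0]]);
        pos += m[0].length;
        matched = true;
        break;
      }
    }
    if (!matched && !done) {
      throw new Error(`Unrecognized text at offset ${pos}`);
    }
  }
  return tokens;
}

const names = tokenize('foo(10)\nbar()').map(([name]) => name);
console.log(names.join(' '));
```

In this model the last statement is followed by a LINE_END token even though the input has no trailing newline, so `e LINE_END` can still be reduced for it. If the input does end with a newline, the extra LINE_END falls under `expression : LINE_END -> []` and adds nothing to the array.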

An aside: Jison is not a port of Bison. It is a parser generator whose functioning is strongly inspired by Bison, but it has features that Bison does not have, and there are some features of Bison that Jison does not support.

Louis
  • Thanks for the answer! I am getting an error when `\n+|$` is `LINE_END`, but when I leave it as `\n`, I get a list of objects as expected but with `'\n'`s for each blank line. If I make it `\n+`, it removes the `'\n'`s in the middle of the list but not at the beginning. – AndyPerlitch Feb 05 '18 at 03:50
  • I've tried the grammar both with the [online Jison](https://zaa.ch/jison/try/) and by installing the latest Jison (0.4.18) locally. I cannot reproduce the issue you describe with getting `\n` "for each blank line". Try as I may, the array produced never has anything between expressions. However, I was able to produce a file that caused an empty string to show up at the end of the array if the parsed text ended with a newline instead of just ending. I've edited my answer to prevent that. – Louis Feb 05 '18 at 11:32