
So, I am trying to use PegJS to define a parser for a simple language.

The language consists purely of function calls that can nest to arbitrary depth, with arguments separated by commas, such as:

f(4, g()) => [f, [4, g, []]]

g()
f(5) => [g, [], f, [5]]

This is the grammar I have:

call = 
  func"("arg")"

func =
  [a-zA-Z]+

arg =
  [0-9a-z,A-Z]+ / call


_ "whitespace"
  = [ \t\n\r]*

Yet it's not recursing:

input: b(r(6))

error: Line 1, column 4: Expected ")" or [0-9a-z,A-Z] but "(" found.

I get the idea of left vs. right recursion, but I'm not getting how to make the call rule recurse to arbitrary depth.

Josh Weinstein

1 Answer


I think the problem is an ambiguity in your grammar. Expanding the rules a little towards Greibach normal form (so that each alternative starts with a terminal), we get two rule chains for an alphabetic symbol:

arg = [0-9a-z,A-Z]+

arg = call
    # Expand call
    = func"("arg")"
    # Expand func
    = [a-zA-Z]+"("arg")"

Thus, an alphabetic identifier can resolve to either an arg or the func of a call. Your resulting parser apparently chose to reduce the inner name (r in your failing input) to an arg, rather than to the first part of a call, which is why it then expects ")" and finds "(" instead.

I'm not familiar with PegJS, so I can't suggest how to coerce your parser into submission. You do need a 1-token lookahead to resolve this.
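As a rough illustration of what that one-token lookahead decides, here is a small JavaScript sketch. The function name classify is hypothetical, and this is plain JavaScript rather than anything PegJS provides: it reads a name and then peeks at the next character to decide whether the name starts a call or stands alone as an argument.

```javascript
// Hypothetical helper: decide how an alphabetic name should be read
// by peeking at the single character that follows it.
function classify(input) {
  const m = /^[a-zA-Z]+/.exec(input);
  if (!m) return "not an identifier";
  // One character of lookahead: "(" means the name is the func of a call.
  return input[m[0].length] === "(" ? "call" : "arg";
}

console.log(classify("r(6)")); // call
console.log(classify("r"));    // arg
```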

However, I do know about parsers in general. Many regular expression engines are "greedy": they'll grab the longest matching string. If you have one of these, the critical problem is that

arg = [0-9a-z,A-Z]+

will consume a span such as "4,g" (the comma is part of the character class) before it returns to any other processing, thus cutting out the possibility of finding "g()" as a second argument. In this case, what you need is a grammar that finds individual arguments, and is greedy about each one. Use the comma as a separator, and put the arguments together into an arg_list (a new nonterminal):

arg_list = arg |
           arg "," arg_list

call = func "(" arg_list ")" |
       func "()"

This is one canonical way to parse a function call.
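To make the arg_list idea concrete, here is a minimal hand-written recursive-descent sketch in plain JavaScript. It is not PegJS output or PegJS syntax; each function simply mirrors one rule of the grammar above. All names (parseCalls, parseCall, parseArgList and so on) are hypothetical, and the sketch assumes an argument is either an integer or a nested call. It returns nested arrays of the form [name, args] rather than the flattened form shown in the question.

```javascript
// Sketch only: parses one or more calls and returns [[name, args], ...].
function parseCalls(input) {
  let pos = 0;

  function skipWs() {
    while (pos < input.length && " \t\n\r".includes(input[pos])) pos++;
  }

  function expect(ch) {
    if (input[pos] !== ch) {
      throw new Error(`Expected "${ch}" at offset ${pos}`);
    }
    pos++;
  }

  // func = [a-zA-Z]+
  function parseFunc() {
    const m = /^[a-zA-Z]+/.exec(input.slice(pos));
    if (!m) return null;
    pos += m[0].length;
    return m[0];
  }

  // number = [0-9]+   (assumption: numeric arguments are plain integers)
  function parseNumber() {
    const m = /^[0-9]+/.exec(input.slice(pos));
    if (!m) return null;
    pos += m[0].length;
    return Number(m[0]);
  }

  // arg = number | call
  function parseArg() {
    const n = parseNumber();
    return n !== null ? n : parseCall();
  }

  // arg_list = arg | arg "," arg_list
  function parseArgList() {
    const args = [parseArg()];
    skipWs();
    while (input[pos] === ",") {
      pos++; // the comma separates arguments and is never part of one
      skipWs();
      args.push(parseArg());
      skipWs();
    }
    return args;
  }

  // call = func "(" arg_list ")" | func "()"
  function parseCall() {
    const name = parseFunc();
    if (name === null) {
      throw new Error(`Expected function name at offset ${pos}`);
    }
    skipWs();
    expect("(");
    skipWs();
    const args = input[pos] === ")" ? [] : parseArgList();
    skipWs();
    expect(")");
    return [name, args];
  }

  const calls = [];
  skipWs();
  while (pos < input.length) {
    calls.push(parseCall());
    skipWs();
  }
  return calls;
}

console.log(JSON.stringify(parseCalls("b(r(6))"))); // [["b",[["r",[6]]]]]
```

Because the comma is handled only in parseArgList, each argument is matched on its own, so a call such as g() can appear in any argument position and at any depth.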

Prune