0

I'm working on a parser for LiveScript language, and am having trouble with parsing both object property definition forms — key: value and (+|-)key — together. For example:

prop: "val"
+boolProp
-boolProp
prop2: val2

I have the key: value form working with this:

Expression ::= TestExpression
    | ParenExpression
    | OpExpression
    | ObjDefExpression
    | PropDefExpression
    | LiteralExpression
    | ReferenceExpression

PropDefExpression ::= Expression COLON Expression

ObjDefExpression ::= PropDefExpression (NEWLINE PropDefExpression)*

// ... other expressions

But however I try to add ("+"|"-") IDENTIFIER to PropDefExpression or ObjDefExpression, I get errors about using left recursion. What's the (right) way to do this?

kassak
  • 3,974
  • 1
  • 25
  • 36
Modus Operandi
  • 552
  • 1
  • 6
  • 24
  • What parser-generator are you using (and, if you care to comment, why did you chose that one instead of one which produces a bottom-up parser)? – rici Mar 02 '15 at 01:09
  • I'm working on a LiveScript plugin for IntelliJ IDEA IDE and as per their [tutorial](https://confluence.jetbrains.com/display/IntelliJIDEA/Grammar+and+Parser) I'm using [JFlex lexer](http://jflex.de/) + [GrammarKit parser plugin](https://github.com/JetBrains/Grammar-Kit). And I know way too little about parsers and IntelliJ plugin development in general to look for alternatives. – Modus Operandi Mar 02 '15 at 08:17
  • Fair enough. Apparently GrammarKit builds a recursive descent grammar, although its documentation is a bit disorganized so I might be missing something. I don't know a lot about livescript either, but it doesn't seem amenable to LL(1) parsing, and it's parser is built with Jison, which builds bottom-up LALR(1) parsers. (https://github.com/gkz/LiveScript/blob/master/src/grammar.ls). In short, I don't think I can help you much. Sorry. – rici Mar 02 '15 at 21:05

2 Answers2

0

The grammar fragment you posted is already left-recursive, i.e. without even adding (+|-)boolprop, the non-terminal 'Expression' derives a form in which 'Expression' reappears as the leftmost symbol:

Expression -> PropDefExpression -> Expression COLON Expression

And it's not just left-recursive, it's ambiguous. E.g.

Expression COLON Expression COLON Expression

can be derived in two different ways (roughly, left-associative vs right-associative).

You can eliminate both these problems by using something more restricted on the left of the colon, e.g.:

PropDefExpression ::= Identifier COLON Expression

Also, another ambiguity: Expression derives PropDefExpression in two different ways, directly and via ObjDefExpression. My guess is, you can drop the direct derivation.

Once you've taken care of those things, it seems to me you should be able to add (+|-)boolprop without errors (unless it conflicts with one of the other kinds of expression that you didn't show).

Mind you, looking at the examples at http://livescript.net, I'm doubtful how much of that you'll be able to capture in a conventional grammar. But if you're just going for a subset, you might be okay.

Michael Dyck
  • 2,153
  • 1
  • 14
  • 18
  • Grammar-Kit allows some (pseudo?)recursion through a sort of OO-like extension mechanism, so that wasn't exactly the problem. But I did solve it by rewriting the file with more limited/specific expressions so I'm accepting this as the answer. – Modus Operandi Mar 03 '15 at 13:51
0

I don't know how much help this will be, because I know nothing about GrammarKit and not much more about the language you're trying to parse.

However, it seems to me that

PropDefExpression ::= Expression COLON Expression

is not quite accurate, and it is creating an ambiguity when you add the boolean property production because an Expression might start with a unary - operator. In the actual grammar, though, a property cannot start with an arbitrary Expression. There are two types of key-property definitions:

name : expression
parenthesized_expression : expression

(Which is to say, expressions need to start with a ().

That means that a boolean property definition, starting with + or - is recognizable from the first token, which is precisely the condition needed for successful recursive descent parsing. There are several other property definition syntaxes, including names and parenthesized_expressions not followed by a :

That's easy to parse with an LR(1) parser, like the one Jison produces, but to parse it with a recursive-descent parser you need to left-factor. (It's possible that GrammarKit can do this for you, by the way.) Basically, you'd need something like (this is not complete):

PropertyDefinition ::= PropertyPrefix PropertySuffix? | BooleanProperty
PropertyPrefix ::= NAME | ParenthesizedExpression
PropertySuffix ::= COLON Expression | DOT NAME 
rici
  • 234,347
  • 28
  • 237
  • 341