
I understand that an LL parser (no lookahead) cannot deal with left-recursive rules, because it keeps predicting the left-recursive non-terminal over and over again and never gets to match any input.

But what if we combined two strategies?

  1. We build a table that assigns each non-terminal the minimum length of the strings its sublanguage generates.

So, suppose we have a left recursive rule as follows:

A -> As | a

Where s is a string composed of terminals and non terminals.

If s only generates strings with minimum length greater than 0, then even though, when faced with the non-terminal "A" in a prediction, we would be predicting "A s" over and over again, the minimum length of the predicted string would only grow. Eventually it surpasses the length of the input string, and the prediction is discarded.
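
A minimal sketch of how such a minimum-length table could be computed, assuming the grammar is given as a dictionary from non-terminals to lists of alternatives and that every terminal counts as length 1. The name `min_lengths` and the grammar encoding are illustrative choices, not something prescribed by the idea itself. It iterates to a fixed point, so left recursion in the grammar is not a problem here.

```python
import math

def min_lengths(grammar):
    """Map each non-terminal to the length of the shortest string it derives.

    grammar: dict of non-terminal -> list of alternatives, where each
    alternative is a list of symbols. Any symbol that is not a key of the
    dict is treated as a terminal of length 1 (an assumption of this sketch).
    """
    best = {nt: math.inf for nt in grammar}
    changed = True
    while changed:  # repeat until no entry can be lowered any further
        changed = False
        for nt, alternatives in grammar.items():
            for alt in alternatives:
                # Terminals contribute 1, non-terminals their current best.
                length = sum(best.get(sym, 1) for sym in alt)
                if length < best[nt]:
                    best[nt] = length
                    changed = True
    return best

grammar = {"S": [["S", "a"], ["a"]]}
print(min_lengths(grammar))  # {'S': 1}
```

A non-terminal that derives no terminal string at all keeps the value `math.inf`, and an empty alternative (epsilon) contributes length 0.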

Of course this wouldn't work with grammars that are infinitely ambiguous (like the ones that have loops), leading us to the second strategy:

  2. Each time we make a prediction, the minimum length of the whole sentential form either grows or stays the same. When it stays the same, we keep a stack of the leftmost predicted non-terminals, and whenever we predict the same non-terminal twice, we discard the prediction.

This effectively discards infinitely many derivations, though it preserves the language accepted by the parser.


As a concrete example, consider the following grammar:

S -> Sa | a

The minimum length of the strings generated by the non terminal S is 1.

Now suppose we're parsing the following input string: aa.

  1. We start by predicting the start symbol, "S".
  2. Using breadth-first search, we then have 2 predictions: "Sa" and "a".
  3. The "a" prediction is discarded since, even though it matches the first "a" in the input string, it reaches EOL while there is still an "a" left in the input string.
  4. The "Sa" prediction cannot match anything since we have "S" as the "matching token" in the sentential form, thus we make two further predictions: "Saa" and "aa".
  5. The "aa" prediction matches the entire input string and thus we have found a parse.
  6. The "Saa" prediction has minimum length 3 and therefore cannot match the input string and is discarded.
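
The steps above can be sketched as a small breadth-first recognizer. This is only an illustration of the first strategy (minimum-length pruning), under the assumption that the whole input is available up front so its length is known. It does not implement the second strategy, so it may not terminate on grammars where a left-recursive prediction leaves the minimum length unchanged. The names `recognize` and `form_min_length` are hypothetical, and the table `min_len` is written out by hand for this grammar.

```python
from collections import deque

def form_min_length(form, min_len):
    """Minimum length of a sentential form; terminals count as 1."""
    return sum(min_len.get(sym, 1) for sym in form)

def recognize(grammar, min_len, start, tokens):
    """Breadth-first top-down recognizer with minimum-length pruning."""
    # Each entry is (input position, sentential form still to be matched).
    queue = deque([(0, (start,))])
    while queue:
        pos, form = queue.popleft()
        if not form:
            if pos == len(tokens):
                return True   # prediction and input ended together
            continue          # prediction ended with input left over
        head, rest = form[0], form[1:]
        if head not in grammar:
            # Terminal: it must match the next input token.
            if pos < len(tokens) and tokens[pos] == head:
                queue.append((pos + 1, rest))
            continue
        # Non-terminal: predict every alternative, but drop predictions
        # whose minimum length exceeds the input that remains.
        for alt in grammar[head]:
            new_form = tuple(alt) + rest
            if form_min_length(new_form, min_len) <= len(tokens) - pos:
                queue.append((pos, new_form))
    return False

grammar = {"S": [["S", "a"], ["a"]]}
min_len = {"S": 1}  # minimum-length table for this grammar
print(recognize(grammar, min_len, "S", list("aa")))  # True
print(recognize(grammar, min_len, "S", list("ab")))  # False
```

On the input aa, the prediction "Saa" is dropped as soon as it is generated, because its minimum length of 3 exceeds the 2 tokens available.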

Would this work? Am I missing something?

  • At step 3, you seem to be using an arbitrary lookahead. You have to predict either `Sa` or `a` when you see the `a`, and at that point you have no idea whether the end of input is close or not. – rici Jul 25 '20 at 21:43
  • For what it's worth, I've had parallel thoughts to yours and I do in fact have a proof of concept lying around where I parse an arbitrarily parenthesized expression using a top-down, recursive descent parser based on a left-recursive grammar. My parser system (even prior to this) tokenizes the entire input stream before the parsing begins, so the parser knows the total number of lexemes in the input and uses this fact to limit the amount of speculative recursion. Needless to say, my parser is a back-tracking one, which allows it to recover from too many recursive calls. – 500 - Internal Server Error Jul 27 '20 at 10:24
  • Now, I don't know how well (or not) this would work on realistic inputs, and I've been meaning to expand this to also look at possible terminating terminals so that each self-recursive production scans ahead for the list of tokens that _must_ be there in order to terminate it properly and sets that as a limit for how many (or few) tokens the production can consume, but I haven't had the time to explore this idea further yet. – 500 - Internal Server Error Jul 27 '20 at 10:26
  • @500-InternalServerError: That seems like an awful lot of work to do just to avoid using an LALR parser, which can handle left-recursion flawlessly in linear time without any additional lookahead. Or, you know, if an algorithm from 1982 is too new-fangled for you, you could augment your LL parser with an operator precedence parser for expressions, yielding a 1970 state-of-the-art left corner parser. :-) That would be roughly speaking the first parser I wrote (in APL). – rici Jul 28 '20 at 21:25
  • @rici: :) - I am aware that bottom-up parsing technologies exist that don't suffer from the issue being discussed here - I'm sure the OP is aware of them too. Personally, however, I find it easier to understand and reason about top-down parsers especially in the grammar/parser development stage, so I still think it's a worthwhile research area, even if much of the world has moved on to LALR or similar techniques. – 500 - Internal Server Error Jul 29 '20 at 08:15
  • @rici At step 3 I meant the end of the predicted sentential form rather than the end of the input string. Usually this is done using an end marker, like "#" for instance, but I chose to omit it here. – Henrique Inonhe Jul 29 '20 at 14:00
  • @500-InternalServerError Exactly. I'm well aware of LALR parsers and many other variants. My interest in this case is not necessarily a practical one, but rather more of a theoretical speculation, mainly because the idea is widespread that top-down directional parsers just can't handle left recursion, but just maybe a slight adjustment could be made so that they could actually handle it. Maybe the tradeoffs are just not worth it, given that ANTLR (which is a state of the art LL(*) parser) doesn't deal with them. – Henrique Inonhe Jul 29 '20 at 14:05
  • What really grabbed my attention is the fact that "Parsing Techniques" doesn't mention the possibility of tweaking LL parsers to deal with left recursion and instead relies on adapting the grammar. – Henrique Inonhe Jul 29 '20 at 14:07
  • @henrique: I don't think that affects my comment. Your predictions are `Sa` and `a`; your lookahead is `a`. The EOI is not yet visible. In order to condition the choice of prediction on the presence or absence of the EOI, you need another token of lookahead. But `a` could be some other non-terminal with a potentially unlimited expansion: `S -> expression | S expression`. – rici Jul 29 '20 at 14:12
  • Of course, if you allow arbitrary lookahead, you can parse the input. But then you no longer have an LL parser, not even a tweaked LL parser. You're well on your way to creating an LR parser :-) BTW, LL and LR are often thought of as some kind of polar opposites, but they're actually duals. They have a lot in common, most importantly that they both work left-to-right. So really both are making predictions. The difference is that the LR parser can handle more than one simultaneous prediction. – rici Jul 29 '20 at 14:18
