Recursive Descent Parser with Indentation and Backtracking

Question

I have been trying to find a recursive descent parser algorithm that also handles indentation and supports backtracking, but every solution I come up with turns out to be troublesome.

Are there any resources out there that also deal with indentation?

Thanks

score 5 · Accepted Answer · answered Nov 24 '13 at 15:47

Based on your question I'm assuming that you're writing your own recursive descent parser for an indentation-sensitive language.

I've experimented with indentation-based languages before, and I solved the problem by having a state that keeps track of the current indentation level and two different terminals that match indentation. Both of them match indentation units (say two spaces or a tab) and count them. Let's call the matched indentation level matched_indentation and the current indentation level expected_indentation.

For the first one, let's call it indent:

if matched_indentation < expected_indentation, this is a dedent, and the match is a failure.
if matched_indentation == expected_indentation, the match is a success. The matcher consumes the indentation.
if matched_indentation > expected_indentation, you have a syntax error (indentation out of nowhere) and should handle it as such (throw an exception or something).

For the second one, let's call it dedent:

if matched_indentation < expected_indentation, the match is successful. You reduce expected_indentation by one, but you don't consume the input. This is so that you can chain multiple dedent terminals to close multiple scopes.
if matched_indentation == expected_indentation, the match is successful, and this time you do consume the input (this is the last dedent terminal, all scopes are closed).
if matched_indentation > expected_indentation, the match simply fails, you don't have a dedent here.

Those terminals and non-terminals after which you expect an increase in indentation should increase expected_indentation by one.
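To make the two terminals concrete, here is a minimal Python sketch of the rules described above. The class name IndentState, the exception UnexpectedIndent, the two-space INDENT_UNIT and the helper _count_indentation are all names I made up for illustration; it also assumes the matchers are only called at the start of a line.

```python
INDENT_UNIT = "  "  # assumption: one indentation level is two spaces


class UnexpectedIndent(Exception):
    """Raised when a line is indented deeper than the parser expects."""


class IndentState:
    """Hypothetical holder for the input, the position and the expected level."""

    def __init__(self, text):
        self.text = text
        self.pos = 0                    # assumed to sit at the start of a line
        self.expected_indentation = 0

    def _count_indentation(self):
        """Count indentation units at self.pos without consuming them."""
        level, pos = 0, self.pos
        while self.text.startswith(INDENT_UNIT, pos):
            level += 1
            pos += len(INDENT_UNIT)
        return level, pos

    def indent(self):
        matched_indentation, end = self._count_indentation()
        if matched_indentation < self.expected_indentation:
            return False                # this is a dedent: fail, consume nothing
        if matched_indentation > self.expected_indentation:
            raise UnexpectedIndent(f"unexpected indent at offset {self.pos}")
        self.pos = end                  # equal: succeed and consume the indentation
        return True

    def dedent(self):
        matched_indentation, end = self._count_indentation()
        if matched_indentation > self.expected_indentation:
            return False                # deeper than expected: not a dedent
        if matched_indentation < self.expected_indentation:
            self.expected_indentation -= 1   # close one scope, leave input alone
            return True
        self.pos = end                  # equal: last dedent, consume the indentation
        return True
```

Because dedent does not consume input while it is still closing scopes, calling it repeatedly on the same line closes one scope per call. Whatever rule opens a block is responsible for doing expected_indentation += 1 before parsing the body.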

Let's say you want to implement a Python-like if statement. In EBNF-like notation, it would look something like this:

indented_statement : indent statement newline ;

if_statement : 'if' condition ':' newline indented_statement+ dedent ;
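As a rough sketch, the two rules above could be turned into recursive descent functions like this. Everything here is an assumption for illustration: the Parser class, the token helper, two-space indentation, a condition that is a single word, and a simple_statement rule that is just a word on its own line. I also moved the newline into simple_statement so that a nested if (which already ends at a dedent) doesn't need a trailing newline. Since you asked about backtracking: each rule saves both the position and the expected level and restores them on failure, because dedent changes the level.

```python
import re

UNIT = "  "  # assumption: one indentation level is two spaces


class Parser:
    def __init__(self, text):
        self.text, self.pos, self.expected = text, 0, 0

    def level(self):
        """Count indentation units at the current position; return (level, end)."""
        n, p = 0, self.pos
        while self.text.startswith(UNIT, p):
            n, p = n + 1, p + len(UNIT)
        return n, p

    def indent(self):
        matched, end = self.level()
        if matched > self.expected:
            raise SyntaxError("unexpected indent")
        if matched < self.expected:
            return False
        self.pos = end
        return True

    def dedent(self):
        matched, end = self.level()
        if matched > self.expected:
            return False
        if matched < self.expected:
            self.expected -= 1          # close one scope without consuming
        else:
            self.pos = end
        return True

    def token(self, pattern):
        """Match a regex at the current position, or return None without consuming."""
        m = re.compile(pattern).match(self.text, self.pos)
        if m is None:
            return None
        self.pos = m.end()
        return m.group()

    def statement(self):
        return self.if_statement() or self.simple_statement()

    def simple_statement(self):
        start = self.pos
        name = self.token(r"\w+")
        if name is not None and self.token(r"\n") is not None:
            return name
        self.pos = start                # backtrack
        return None

    def indented_statement(self):
        saved = (self.pos, self.expected)
        if self.indent():
            node = self.statement()
            if node is not None:
                return node
        self.pos, self.expected = saved  # backtrack position and level together
        return None

    def if_statement(self):
        saved = (self.pos, self.expected)
        header = self.token(r"if (\w+):\n")
        if header is not None:
            self.expected += 1          # the body must be one level deeper
            body = []
            while (node := self.indented_statement()) is not None:
                body.append(node)
            if body and self.dedent():
                return ("if", header[3:-2], body)
        self.pos, self.expected = saved
        return None


parser = Parser("if cond1:\n  if cond2:\n    statement1\n    statement2\n")
tree = parser.indented_statement()
print(tree)
```

This sketch doesn't handle blank lines in the middle of a block or a missing newline at the end of the input; both would need extra rules.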

Now let's look at the following piece of code, and also assume that an if_statement is a part of your statement rule:

1|if cond1:                 <- expected_indentation = 0, matched_indentation = 0
2|  if cond2:               <- expected_indentation = 1, matched_indentation = 1
3|    statement1            <- expected_indentation = 2, matched_indentation = 2
4|    statement2            <- expected_indentation = 2, matched_indentation = 2
5|                          <- expected_indentation = 2, matched_indentation = 0

On the first four lines you'll successfully match an indent terminal.
On the last line, you'll match two dedent terminals, closing both scopes and leaving expected_indentation = 0.
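If you want to check that walkthrough mechanically, here is a small standalone Python sketch that replays it line by line. It is a simplification, not a parser: the function names are made up, indentation is assumed to be two spaces per level, and it assumes any line ending in ':' opens a new scope.

```python
UNIT = "  "  # assumption: one indentation level is two spaces


def matched_indentation(line):
    """Count the indentation units at the start of a line."""
    level = 0
    while line.startswith(UNIT, level * len(UNIT)):
        level += 1
    return level


def trace(lines):
    """Record, per line, whether indent matched or how many dedents matched."""
    expected = 0
    events = []
    for line in lines:
        matched = matched_indentation(line)
        if matched > expected:
            raise SyntaxError("unexpected indent")
        dedents = 0
        while matched < expected:       # each iteration is one successful dedent
            expected -= 1
            dedents += 1
        events.append(("dedent", dedents) if dedents else ("indent", matched))
        if line.rstrip().endswith(":"):
            expected += 1               # the next line must be one level deeper
    return events, expected


lines = ["if cond1:", "  if cond2:", "    statement1", "    statement2", ""]
events, expected = trace(lines)
print(events, expected)
```

The first four entries are indent matches at levels 0, 1, 2 and 2, and the last entry records two dedents, which brings the expected level back to 0.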

One thing you should be careful of is where you put your indent and dedent terminals. In this case, we don't need one in the if_statement rule because it is a statement, and indented_statement already expects an indent.

Also mind how you treat newlines. One choice is to consume them as a sort of statement terminator, another is to have them precede the indentation, so choose whichever suits you best.
