Based on your question I'm assuming that you're writing your own recursive descent parser for an indentation-sensitive language.
I've experimented with indentation-based languages before, and I solved the problem by having a state that keeps track of the current indentation level and two different terminals that match indentation. Both of them match indentation units (say, two spaces or a tab) and count them. Let's call the matched indentation level matched_indentation and the current indentation level expected_indentation.
For the first one, let's call it indent:
- if matched_indentation < expected_indentation, this is a dedent, and the match is a failure.
- if matched_indentation == expected_indentation, the match is a success. The matcher consumes the indentation.
- if matched_indentation > expected_indentation, you have a syntax error (indentation out of nowhere) and should handle it as such (throw an exception or something).
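As a rough sketch, the three indent cases could look like this in Python. The ParserState class, the INDENT_UNIT constant and the function names are invented for illustration, and the sketch assumes that one indentation unit is exactly two spaces.

```python
from dataclasses import dataclass

INDENT_UNIT = "  "  # assumption: one indentation unit is two spaces


@dataclass
class ParserState:
    """Hypothetical parser state, reduced to what the indent matcher needs."""
    lines: list                    # source lines, without trailing newlines
    line_no: int = 0               # index of the line being parsed
    column: int = 0                # characters already consumed on that line
    expected_indentation: int = 0


def count_indentation(line: str) -> int:
    """Count the leading indentation units of a line (matched_indentation)."""
    count = 0
    while line.startswith(INDENT_UNIT, count * len(INDENT_UNIT)):
        count += 1
    return count


def match_indent(state: ParserState) -> bool:
    """The indent terminal: succeed only at exactly the expected level."""
    matched_indentation = count_indentation(state.lines[state.line_no])
    if matched_indentation < state.expected_indentation:
        return False  # this is a dedent, so the match fails
    if matched_indentation > state.expected_indentation:
        raise SyntaxError(f"unexpected indentation on line {state.line_no + 1}")
    # Levels are equal: consume the indentation.
    state.column = matched_indentation * len(INDENT_UNIT)
    return True
```

Raising SyntaxError is just one way to handle the "indentation out of nowhere" case; collecting an error and recovering would work too.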
For the second one, let's call it dedent:
- if matched_indentation < expected_indentation, the match is successful. You reduce expected_indentation by one, but you don't consume the input. This is so that you can chain multiple dedent terminals to close multiple scopes.
- if matched_indentation == expected_indentation, the match is successful, and this time you do consume the input (this is the last dedent terminal, all scopes are closed).
- if matched_indentation > expected_indentation, the match simply fails, you don't have a dedent here.
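The dedent cases can be sketched the same way. This is a minimal Python illustration, not a complete lexer: IndentationState and match_dedent are hypothetical names, and the function takes the already counted matched_indentation as an argument so the example stays short.

```python
class IndentationState:
    """Hypothetical parser state, reduced to what the dedent matcher needs."""

    def __init__(self, expected_indentation: int):
        self.expected_indentation = expected_indentation
        self.indentation_consumed = False


def match_dedent(state: IndentationState, matched_indentation: int) -> bool:
    """The dedent terminal, given the level counted on the current line."""
    if matched_indentation > state.expected_indentation:
        return False  # deeper than expected: there is no dedent here
    if matched_indentation < state.expected_indentation:
        # Close one scope but leave the input alone, so that another
        # dedent can match against the same line.
        state.expected_indentation -= 1
        return True
    # Levels are equal: this is the last dedent, so consume the indentation.
    state.indentation_consumed = True
    return True


# A line at level 0 seen while two scopes are open (expected level 2):
state = IndentationState(expected_indentation=2)
match_dedent(state, 0)  # closes the inner scope, expected level is now 1
match_dedent(state, 0)  # closes the outer scope, expected level is now 0
```

Because the first two calls don't consume anything, the same line can be offered to as many dedent terminals as there are open scopes.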
Those terminals and non-terminals after which you expect an increase in indentation should increase expected_indentation by one.
Let's say that you want to implement a Python-like if statement. In EBNF-like notation, it would look something like this:
indented_statement : indent statement newline;
if_statement : 'if' condition ':' newline indented_statement+ dedent ;
Now let's look at the following piece of code, and also assume that an if_statement is a part of your statement rule:
1|if cond1:        <- expected_indentation = 0, matched_indentation = 0
2|  if cond2:      <- expected_indentation = 1, matched_indentation = 1
3|    statement1   <- expected_indentation = 2, matched_indentation = 2
4|    statement2   <- expected_indentation = 2, matched_indentation = 2
5|                 <- expected_indentation = 2, matched_indentation = 0
- On the first four lines, you'll successfully match an indent terminal.
- On the last line, you'll match two dedent terminals, closing both scopes and leaving expected_indentation = 0.
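To check that trace, here is a small line-based recursive descent sketch in Python that parses the example and records which terminals matched. It is an illustration under several assumptions, not a full implementation: the Parser class and its trace list are made up, an indentation unit is two spaces, blank lines in the middle of the input are not handled, the end of input is treated like an empty line at level 0, and each statement consumes its own newline.

```python
INDENT_UNIT = "  "  # assumption: one indentation unit is two spaces


class Parser:
    def __init__(self, source: str):
        self.lines = source.split("\n")
        self.pos = 0                    # index of the current line
        self.expected_indentation = 0
        self.trace = []                 # matched terminals, for illustration

    def matched_indentation(self) -> int:
        # Past the end of input counts as level 0, which closes all scopes.
        line = self.lines[self.pos] if self.pos < len(self.lines) else ""
        count = 0
        while line.startswith(INDENT_UNIT, count * len(INDENT_UNIT)):
            count += 1
        return count

    def indent(self) -> bool:
        matched = self.matched_indentation()
        if matched < self.expected_indentation:
            return False
        if matched > self.expected_indentation:
            raise SyntaxError(f"unexpected indentation on line {self.pos + 1}")
        self.trace.append("indent")
        return True

    def dedent(self) -> bool:
        matched = self.matched_indentation()
        if matched > self.expected_indentation:
            return False
        if matched < self.expected_indentation:
            self.expected_indentation -= 1  # close one scope, keep the input
        self.trace.append("dedent")
        return True

    def statement(self):
        text = self.lines[self.pos].strip()
        self.pos += 1                       # consume the line and its newline
        if text.startswith("if ") and text.endswith(":"):
            self.expected_indentation += 1  # a deeper block must follow
            body = []
            while self.indent():            # indented_statement+
                body.append(self.statement())
            if not body:
                raise SyntaxError(f"expected an indented block on line {self.pos + 1}")
            # indent() only returned False because the next line is
            # shallower than expected, so this dedent always succeeds.
            self.dedent()
            return ("if", text[3:-1], body)
        return text                         # any other line is a plain statement

    def parse(self):
        statements = []
        while self.pos < len(self.lines) and self.indent():
            statements.append(self.statement())
        return statements


parser = Parser("if cond1:\n  if cond2:\n    statement1\n    statement2")
tree = parser.parse()
print(tree)
# [('if', 'cond1', [('if', 'cond2', ['statement1', 'statement2'])])]
print(parser.trace)
# ['indent', 'indent', 'indent', 'indent', 'dedent', 'dedent']
```

The trace shows four indent matches followed by two dedent matches, and expected_indentation is back at 0 afterwards.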
One thing you should be careful about is where you put your indent and dedent terminals. In this case, we don't need an indent in the if_statement rule because it is a statement, and indented_statement already expects an indent.
Also mind how you treat newlines. One choice is to consume them as a sort of statement terminator, another is to have them precede the indentation, so choose whichever suits you best.