I'm trying to write a regular expression engine, and I'd like to write the recursive descent parser by hand. What would a context-free grammar without left recursion look like for the language of regular expressions themselves (not the languages that regular expressions can describe)? Would it be easiest to factor out the syntactic sugar first, i.e. rewrite a+ as aa*? Thanks in advance!
Viewed 3,519 times
7

Michael Myers

wkf
3 Answers
7
Left recursion:
    Expression = Expression '|' Sequence
               | Sequence
               ;
    Sequence   = Sequence Repetition
               | <empty>
               ;
Right recursion:
    Expression = Sequence '|' Expression
               | Sequence
               ;
    Sequence   = Repetition Sequence
               | <empty>
               ;
Ambiguous form:
    Expression = Expression '|' Expression
               | Sequence
               ;
    Sequence   = Sequence Sequence
               | Repetition
               | <empty>
               ;
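
As a rough sketch of how the right-recursive form maps onto a hand-written recursive descent parser, here is one function per rule in Python. The grammar above leaves Repetition undefined, so the Repetition and Atom rules in the comments are assumptions made for illustration, as are the names Parser and parse and the tuple-based tree. The sketch also desugars a+ into aa* while parsing, which is one possible way to handle the syntactic sugar, not the only one.

```python
# Sketch only. Expression and Sequence follow the right-recursive grammar;
# Repetition and Atom are ASSUMED rules, not part of the original answer:
#   Expression = Sequence '|' Expression | Sequence
#   Sequence   = Repetition Sequence | <empty>
#   Repetition = Atom ('*' | '+' | '?')*      (assumed)
#   Atom       = '(' Expression ')' | char    (assumed)

class Parser:
    def __init__(self, text):
        self.text = text
        self.pos = 0

    def peek(self):
        # Next character, or None at end of input.
        return self.text[self.pos] if self.pos < len(self.text) else None

    def advance(self):
        ch = self.text[self.pos]
        self.pos += 1
        return ch

    def expression(self):
        left = self.sequence()
        if self.peek() == '|':
            self.advance()
            return ('alt', left, self.expression())
        return left

    def sequence(self):
        # Choose <empty> when the next character cannot start a Repetition.
        if self.peek() in (None, '|', ')'):
            return ('empty',)
        head = self.repetition()
        tail = self.sequence()
        return head if tail == ('empty',) else ('cat', head, tail)

    def repetition(self):
        node = self.atom()
        while self.peek() in ('*', '+', '?'):
            op = self.advance()
            if op == '*':
                node = ('star', node)
            elif op == '+':
                # Desugar a+ into aa*.
                node = ('cat', node, ('star', node))
            else:
                # Desugar a? into (a|<empty>).
                node = ('alt', node, ('empty',))
        return node

    def atom(self):
        ch = self.peek()
        if ch is None or ch in '*+?':
            raise SyntaxError('unexpected input at position %d' % self.pos)
        self.advance()
        if ch == '(':
            inner = self.expression()
            if self.peek() != ')':
                raise SyntaxError("expected ')' at position %d" % self.pos)
            self.advance()
            return inner
        return ('char', ch)


def parse(text):
    parser = Parser(text)
    tree = parser.expression()
    if parser.pos != len(text):
        raise SyntaxError('unexpected %r at position %d' % (text[parser.pos], parser.pos))
    return tree
```

Calling parse('a+') under these assumptions yields ('cat', ('char', 'a'), ('star', ('char', 'a'))), so later stages only need to handle alternation, concatenation, star, single characters and the empty string. Escapes, character classes and the like are deliberately left out.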

Markus Jarderot
Right on man; you've answered all my questions this evening. Thanks! – wkf Jun 11 '09 at 02:47
2
You could look at the source code for Plan 9 grep. The file grep.y has a yacc (LALR(1) if I recall correctly) grammar for regular expressions. You might be able to start from the yacc grammar, and rewrite it for recursive descent parsing.
0
The Wikipedia article on left recursion gives pretty good information on how to remove left recursion from a grammar.
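
For illustration, the standard textbook rewrite for immediate left recursion turns A = A x | y into A = y A' and A' = x A' | <empty>. The Python sketch below applies that rewrite to a single rule. The function name eliminate_left_recursion and the list-of-symbols representation are hypothetical choices for this example, and it assumes only immediate (not indirect) left recursion.

```python
def eliminate_left_recursion(name, alternatives):
    """Rewrite one rule with immediate left recursion.

    Each alternative is a list of symbols; an empty list stands for <empty>.
    A = A x | y   becomes   A = y A'   and   A' = x A' | <empty>.
    """
    # Alternatives that start with the rule's own name, minus that leading symbol.
    recursive = [alt[1:] for alt in alternatives if alt and alt[0] == name]
    # Alternatives that do not start with the rule's own name.
    base = [alt for alt in alternatives if not alt or alt[0] != name]
    if not recursive:
        return {name: alternatives}  # nothing to rewrite
    tail = name + "'"  # hypothetical name for the new helper rule
    return {
        name: [alt + [tail] for alt in base],
        tail: [alt + [tail] for alt in recursive] + [[]],
    }


# Usage: a left-recursive alternation rule for regular expressions.
rules = eliminate_left_recursion(
    "Expression",
    [["Expression", "'|'", "Sequence"], ["Sequence"]],
)
```

Here rules maps Expression to Sequence Expression', and Expression' to '|' Sequence Expression' or <empty>, which a recursive descent parser can implement with a simple loop.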

Mark P Neyer
It's not that I need to re-factor a grammar with left recursion, but rather that I'm trying to get a feel for what the grammar should look like in general. While I've read about them a lot, I've never actually used a context-free grammar 'in the wild' so to speak. – wkf Jun 10 '09 at 23:14