Context-free grammar describing regular expressions?

Question

I'm trying to write a regular expression engine, and I'd like to write its recursive descent parser by hand. What would a context-free grammar without left recursion look like for the language of regular expressions themselves (not the languages that regular expressions can describe)? Would it be easiest to factor out the syntactic sugar first, i.e. rewrite a+ as aa*? Thanks in advance!

score 7 · Accepted Answer · answered Jun 11 '09 at 02:23

Left recursion:

Expression = Expression '|' Sequence
           | Sequence
           ;

Sequence = Sequence Repetition
         | <empty>
         ;

Right recursion (no left-recursive rules, so this is the form a recursive descent parser can follow directly):

Expression = Sequence '|' Expression
           | Sequence
           ;

Sequence = Repetition Sequence
         | <empty>
         ;
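
As a rough illustration, the right-recursive rules above map almost one-to-one onto recursive descent functions. The sketch below is in Python and rests on several assumptions that are not part of the grammar as posted: Repetition is taken to be an Atom followed by any number of '*', '+' or '?' operators, Atom is taken to be a single literal character or a parenthesized Expression, and the tuple-based tree shape and the names Parser and parse are made up for the example. It also desugars a+ into aa* while parsing, which is one possible way to handle the syntactic sugar mentioned in the question.

```python
# Hypothetical AST: plain tuples such as ("lit", "a"), ("cat", l, r),
# ("alt", l, r), ("star", x), ("opt", x) and ("empty",).

class Parser:
    def __init__(self, text):
        self.text = text
        self.pos = 0

    def peek(self):
        # Next character, or None at end of input.
        return self.text[self.pos] if self.pos < len(self.text) else None

    def take(self):
        ch = self.text[self.pos]
        self.pos += 1
        return ch

    # Expression = Sequence '|' Expression | Sequence ;
    def expression(self):
        left = self.sequence()
        if self.peek() == "|":
            self.take()
            return ("alt", left, self.expression())
        return left

    # Sequence = Repetition Sequence | <empty> ;
    def sequence(self):
        # Choose <empty> when the next character cannot start a Repetition.
        if self.peek() in (None, "|", ")"):
            return ("empty",)
        first = self.repetition()
        rest = self.sequence()
        # Simplification: drop the trailing <empty> instead of keeping it.
        return first if rest == ("empty",) else ("cat", first, rest)

    # ASSUMED rule: Repetition = Atom ('*' | '+' | '?')* ;
    def repetition(self):
        node = self.atom()
        while self.peek() in ("*", "+", "?"):
            op = self.take()
            if op == "*":
                node = ("star", node)
            elif op == "+":
                # Desugar a+ into a a*.
                node = ("cat", node, ("star", node))
            else:
                node = ("opt", node)
        return node

    # ASSUMED rule: Atom = '(' Expression ')' | any other character ;
    def atom(self):
        ch = self.peek()
        if ch is None or ch in "*+?":
            raise SyntaxError(f"unexpected {ch!r} at position {self.pos}")
        self.take()
        if ch == "(":
            inner = self.expression()
            if self.peek() != ")":
                raise SyntaxError(f"expected ')' at position {self.pos}")
            self.take()
            return inner
        return ("lit", ch)


def parse(text):
    parser = Parser(text)
    tree = parser.expression()
    if parser.pos != len(text):
        raise SyntaxError(f"unexpected {parser.peek()!r} at position {parser.pos}")
    return tree
```

Calling parse("a|b") on this sketch gives ("alt", ("lit", "a"), ("lit", "b")). Because the rules are right-recursive, "a|b|c" nests to the right, which is harmless for alternation and concatenation since both are associative. Escapes, character classes and counted repetition are left out.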

Ambiguous form:

Expression = Expression '|' Expression
           | Sequence
           ;

Sequence = Sequence Sequence
         | Repetition
         | <empty>
         ;

Right on man; you've answered all my questions this evening. Thanks! — wkf, Jun 11 '09 at 02:47

score 2 · Answer 2 · answered Jan 03 '11 at 18:39

You could look at the source code for Plan 9 grep. The file grep.y has a yacc (LALR(1) if I recall correctly) grammar for regular expressions. You might be able to start from the yacc grammar, and rewrite it for recursive descent parsing.

score 0 · Answer 3 · answered Jun 10 '09 at 20:55

The Wikipedia article on left recursion gives pretty good info on how to pull this off.

Mark P Neyer

It's not that I need to re-factor a grammar with left recursion, but rather that I'm trying to get a feel for what the grammar should look like in general. While I've read about them a lot, I've never actually used a context-free grammar 'in the wild' so to speak. – wkf Jun 10 '09 at 23:14
