Parsing the rules of a CF grammar using regular expressions (as templates)

Question

I have a CF grammar. Its rules are as follows:

S->a|AS

A->AB|a|b

B->b

I want to parse these rules using Regular Expressions.

My Regular Expression:

\b([A-Z])->(?:([A-Za-z]+)\|?)+

For "A->AB|a|b" the result is:

0: A->AB|a|b

1: A

2: b

but I want this:

0: A->AB|a|b

1: A

2: AB

3: a

4: b

What language/tool are you using? – Qtax Dec 02 '11 at 12:36
I would use "[A-Za-z]+", but that's not what I want – couatl Dec 02 '11 at 13:02

score 0 · Answer 1 · answered Dec 02 '11 at 12:22

Regular expressions aren't powerful enough to parse context-free languages in general, although they are used, for instance, in EBNF to make grammars more expressive. You could consider a top-down parser (built from recursive calls) to parse your input. That's easy to implement in any language that allows mutually recursive calls. It requires a grammar with some restrictions (see Wikipedia on LL parsers, if you are interested). Check whether your grammar is LL(1), i.e. needs only one token of lookahead; note that the rule A->AB is left-recursive and would have to be rewritten first.
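A minimal sketch of the top-down idea, written in Python for illustration only. It assumes the goal is to parse the rule notation itself (a single uppercase head, `->`, then letter-only alternatives separated by `|`), not strings of the language the grammar describes. The names `parse_rule`, `head`, `arrow` and `alternative` are hypothetical helpers, one per construct of the notation.

```python
def parse_rule(text):
    """Parse e.g. 'A->AB|a|b' into ('A', ['AB', 'a', 'b']).

    Assumed notation: rule := HEAD '->' alt ('|' alt)*, alt := letter+.
    Raises ValueError if the text does not have that shape.
    """
    pos = 0  # current position in text, shared by the helpers below

    def peek():
        # One character of lookahead; '' marks the end of the input.
        return text[pos] if pos < len(text) else ""

    def head():
        nonlocal pos
        ch = peek()
        if not (ch.isascii() and ch.isupper()):
            raise ValueError(f"expected a nonterminal at position {pos}")
        pos += 1
        return ch

    def arrow():
        nonlocal pos
        if text[pos:pos + 2] != "->":
            raise ValueError(f"expected '->' at position {pos}")
        pos += 2

    def alternative():
        nonlocal pos
        start = pos
        while peek().isascii() and peek().isalpha():
            pos += 1
        if pos == start:
            raise ValueError(f"expected a symbol at position {pos}")
        return text[start:pos]

    lhs = head()
    arrow()
    alternatives = [alternative()]
    while peek() == "|":  # the single lookahead character decides whether to continue
        pos += 1
        alternatives.append(alternative())
    if pos != len(text):
        raise ValueError(f"unexpected character at position {pos}")
    return lhs, alternatives
```

Calling `parse_rule("A->AB|a|b")` yields the head and the list of alternatives, while malformed input such as `"A->|a"` raises `ValueError`.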

score 0 · Answer 2 · answered Dec 02 '11 at 12:36

You could just split every rule on `->|\|` to get the desired list.

answered Dec 02 '11 at 12:36 by Qtax

I'm not fluent in C++/boost, but I can show you a Perl example: `say join "\n", split /->|\|/, "A->AB|a|b";`. Related boost links http://www.boost.org/doc/libs/1_31_0/libs/regex/doc/regex_split.html and http://www.boost.org/doc/libs/1_31_0/libs/regex/doc/regex_token_iterator.html – Qtax Dec 02 '11 at 13:06
Use "->" and "|" as separators in Split? – couatl Dec 02 '11 at 13:10
Yes, `->|\|` is a regex (not the formal kind) that matches either a `->` or `|`. – Qtax Dec 02 '11 at 13:23
If so, it doesn't fit, because it would work the same as the regex "[A-Za-z]+". I want a regular expression that both determines whether the input is a rule and divides it into parts. – couatl Dec 02 '11 at 13:32
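One possible way to reconcile the two requirements is to validate first and split second. In most regex engines a capturing group inside a quantifier keeps only its last iteration, which is why group 2 in the question's pattern ends up as `b`. The Python sketch below is an illustration under that assumption, not the boost API; the names `RULE` and `split_rule` are hypothetical, and the validation pattern assumes a single uppercase head and letter-only alternatives, as in the question.

```python
import re

# Whole-rule shape: HEAD '->' alternative ('|' alternative)*
RULE = re.compile(r"[A-Z]->[A-Za-z]+(?:\|[A-Za-z]+)*")


def split_rule(text):
    """Return [head, alt1, alt2, ...] if text is a rule, otherwise None."""
    # Step 1: the regex decides whether the input is a rule at all.
    if RULE.fullmatch(text) is None:
        return None
    # Step 2: splitting on '->' or '|' divides a known-good rule into parts.
    return re.split(r"->|\|", text)
```

With this, `split_rule("A->AB|a|b")` gives the four parts asked for in the question (head first, then each alternative), and input that is not a rule, such as `"A->AB||b"`, gives `None`.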
