
I have a context-free grammar. Its rules are as follows:

S->a|AS

A->AB|a|b

B->b

I want to parse these rules using Regular Expressions.

My Regular Expression:

\b([A-Z])->(?:([A-Za-z]+)\|?)+

For "A->AB|a|b", the result is:

0: A->AB|a|b

1: A

2: b

but I want this:

0: A->AB|a|b

1: A

2: AB

3: a

4: b

Asked by couatl; edited by Qtax

2 Answers


Regular expressions aren't sufficiently powerful for the task, but they are used, for instance, in EBNF to enhance the expressiveness of the grammar. You could consider a top-down parser (shaped via recursive calls) to parse your input. That's easy to implement in any language that allows mutually recursive calls. It requires a grammar with some restrictions (see Wikipedia about this, if you are interested). Note, however, that your grammar is not LL(1) (one token of lookahead) as written: the rule A->AB is left-recursive, so at least the left recursion has to be eliminated first (for example A->aA'|bA', A'->BA'|ε).

Answered by CapelliC

You could simply split every rule on the regex `->|\|` to get the desired list.

Answered by Qtax
  • I'm not fluent in C++/Boost, but I can show you a Perl example: `say join "\n", split /->|\|/, "A->AB|a|b";`. Related Boost links: http://www.boost.org/doc/libs/1_31_0/libs/regex/doc/regex_split.html and http://www.boost.org/doc/libs/1_31_0/libs/regex/doc/regex_token_iterator.html – Qtax Dec 02 '11 at 13:06
  • Use "->" and "|" as separators in Split? – couatl Dec 02 '11 at 13:10
  • Yes, `->|\|` is a regex (not the formal kind) that matches either a `->` or `|`. – Qtax Dec 02 '11 at 13:23
  • If so, it doesn't fit, because it would work just like the regex "[A-Za-z]+". I want the regular expression to determine whether the line is a rule, and also to divide it into parts. – couatl Dec 02 '11 at 13:32