
How can I describe a grammar using regex (or is pyparsing better?) for the script language presented below in Backus–Naur Form:

<root>   :=     <tree> | <leaves>
<tree>   :=     <group> [* <group>] 
<group>  :=     "{" <leaves> "}" | <leaf>;
<leaves> :=     {<leaf>;} <leaf>
<leaf>   :=     <name> = <expression>{;}

<name>          := <string_without_spaces_and_tabs>
<expression>    := <string_without_spaces_and_tabs>

Example of the script:

{
 stage = 3;
 some.param1 = [10, 20];
} *
{
 stage = 4;
 param3 = [100,150,200,250,300]
} *
 endparam = [0, 1]

I'm using Python's re.compile and want to split everything into groups, something like this:

[ [ 'stage',       '3'],
  [ 'some.param1', '[10, 20]'] ],

[ ['stage',  '4'],
  ['param3', '[100,150,200,250,300]'] ],

[ ['endparam', '[0, 1]'] ]
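Because this grammar only nests one level deep (a group never contains another group), plain regex can produce that shape in two passes: one pattern picks out the groups, a second pulls name/value pairs out of each group. This is a minimal stdlib sketch, not a general solution. It assumes that '*', '{' and '}' never appear inside names or values and that a value ends at ';' or a line break; the names GROUP_RE, LEAF_RE and parse are illustrative only.

```python
import re

SCRIPT = """{
 stage = 3;
 some.param1 = [10, 20];
} *
{
 stage = 4;
 param3 = [100,150,200,250,300]
} *
 endparam = [0, 1]
"""

# Pass 1: a group is a braced block, or bare text up to the next '*'
# (assumes '{', '}' and '*' never occur inside names or values)
GROUP_RE = re.compile(r"\{([^{}]*)\}|([^{}*]+)")
# Pass 2: a leaf is "name = value", ended by ';', a line break or end of text
LEAF_RE = re.compile(r"([^\s=;{}*]+)\s*=\s*([^;\n]+?)\s*(?:;|\n|$)")


def parse(script):
    groups = []
    for braced, bare in GROUP_RE.findall(script):
        body = braced or bare
        leaves = [[name, value] for name, value in LEAF_RE.findall(body)]
        if leaves:  # skip the whitespace-only runs around each '*'
            groups.append(leaves)
    return groups


print(parse(SCRIPT))
```

This prints the three nested lists shown above. It breaks down as soon as groups can nest or values can contain braces, because Python's re cannot match arbitrarily nested brackets; that is where a real parser earns its keep.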

Update: I've found that pyparsing is a much better solution than regex for this.

Max Tkachenko
  • You can just reduce the grammar by substituting all non-terminal nodes to obtain a regex. However, what good would this do? The result would simply return a token stream for your input string, but would not preserve any structure, which is required to make semantic sense of the code. – Konrad Rudolph Jan 13 '15 at 16:34
  • There is an ABNF-to-regex converter here: http://www.a-k-r.org/abnf/ – Anderson Green Jan 13 '15 at 16:35
  • Also see this related Stack Overflow question: http://stackoverflow.com/questions/8898049/how-to-convert-a-regular-grammar-to-regular-expression – Anderson Green Jan 13 '15 at 16:38
  • @AndersonGreen, ABNF-to-regex is very useful! Thank you! – Max Tkachenko Jan 13 '15 at 16:43
  • @KonradRudolph, yes, you're right, and I'm leaning toward pyparsing :) – Max Tkachenko Jan 13 '15 at 16:45
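To illustrate Konrad Rudolph's point: substituting every non-terminal into one pattern does give a regex that can check whether a whole script is well formed, but the captures come back as a flat stream. A sketch, assuming names and values contain no whitespace or any of = ; { } * (TOKEN, LEAF, GROUP, TREE and PAIR are illustrative names, not from any library):

```python
import re

SCRIPT = "{ stage = 3; some.param1 = [10,20]; } * { stage = 4; } * endparam = [0,1];"

# Assumption: names and values contain no whitespace or any of = ; { } *
TOKEN = r"[^\s=;{}*]+"
# Every non-terminal substituted away, leaving plain regular expressions
LEAF = TOKEN + r"\s*=\s*" + TOKEN + r"\s*;?\s*"
GROUP = r"(?:\{\s*(?:" + LEAF + r")+\}|" + LEAF + r")"
TREE = r"\s*" + GROUP + r"(?:\s*\*\s*" + GROUP + r")*\s*"

# The big pattern can tell you whether the whole script is well formed...
print(re.fullmatch(TREE, SCRIPT) is not None)  # True

# ...but matching pairs yields a flat stream: which pairs shared a group is lost
PAIR = r"(" + TOKEN + r")\s*=\s*(" + TOKEN + r")"
print(re.findall(PAIR, SCRIPT))
```

A capture group inside a repeated part of a pattern keeps only its last match in Python's re, so the per-group lists cannot be recovered from TREE itself.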

1 Answer

Pyparsing lets you simplify some constructs of this kind, such as

leaves :: {leaf} leaf

to just

OneOrMore(leaf)
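The shorthand is more than cosmetic. Pyparsing repetitions are greedy and never give back what they matched, so a word-for-word translation of {leaf} leaf as ZeroOrMore(leaf) + leaf can never succeed: ZeroOrMore consumes every leaf and leaves nothing for the final one. A small sketch with a simplified, hypothetical alphanumeric leaf (not the full grammar):

```python
from pyparsing import OneOrMore, ParseException, Suppress, Word, ZeroOrMore, alphanums

EQ, SEMI = map(Suppress, "=;")
# simplified stand-in for <leaf>: alphanumeric name and value only
leaf = Word(alphanums) + EQ + Word(alphanums) + SEMI

literal = ZeroOrMore(leaf) + leaf  # word-for-word translation of {leaf} leaf
idiomatic = OneOrMore(leaf)        # the pyparsing shorthand


def parses(expr, text):
    """Return True if expr matches the whole of text."""
    try:
        expr.parseString(text, parseAll=True)
        return True
    except ParseException:
        return False


print(parses(literal, "a=1; b=2;"))    # False: ZeroOrMore ate every leaf
print(parses(idiomatic, "a=1; b=2;"))  # True
```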

So one form of your BNF in pyparsing will look something like:

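from pyparsing import (Group, OneOrMore, Suppress, Word, ZeroOrMore,
                       printables, quotedString)

# punctuation is matched but suppressed from the results
LBRACE, RBRACE, EQ, SEMI = map(Suppress, "{}=;")
name = Word(printables, excludeChars="{}=;")
# try quotedString first; otherwise Word would grab the opening quote
expr = quotedString | Word(printables, excludeChars="{}=;")

leaf = Group(name + EQ + expr + SEMI)                     # ['name', 'value']
group = Group(LBRACE + ZeroOrMore(leaf) + RBRACE) | leaf  # braced block or lone leaf
tree = OneOrMore(group)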

I added quotedString as an alternative expr in case you want a value that includes one of the excluded characters. Wrapping leaf and group in Group preserves the brace structure in the parsed results.
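A small sketch of both points on a made-up input: without Group every token lands in one flat list, with Group the braces survive as nesting. Here quotedString is listed before the Word alternative because | takes the first alternative that matches, and Word (whose printables include the quote character) would otherwise swallow the opening quote.

```python
from pyparsing import (Group, OneOrMore, Suppress, Word, ZeroOrMore,
                       printables, quotedString)

LBRACE, RBRACE, EQ, SEMI = map(Suppress, "{}=;")
name = Word(printables, excludeChars="{}=;")
# quotedString first, so '"x; y"' is read as one value
expr = quotedString | Word(printables, excludeChars="{}=;")

# without Group: every token ends up in one flat list
flat_leaf = name + EQ + expr + SEMI
flat_tree = OneOrMore((LBRACE + ZeroOrMore(flat_leaf) + RBRACE) | flat_leaf)

# with Group: each leaf and each braced block becomes its own sub-list
leaf = Group(flat_leaf)
tree = OneOrMore(Group(LBRACE + ZeroOrMore(leaf) + RBRACE) | leaf)

text = '{ a = 1; b = "x; y"; } c = 2;'
print(flat_tree.parseString(text).asList())  # ['a', '1', 'b', '"x; y"', 'c', '2']
print(tree.parseString(text).asList())       # [[['a', '1'], ['b', '"x; y"']], ['c', '2']]
```

Note that quotedString keeps the surrounding quotes in the result.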

Unfortunately, your sample doesn't quite conform to this BNF:

  1. the spaces in [10, 20] and [0, 1] make them invalid exprs, since an expression is defined as a string without spaces

  2. some leaves have no terminating ;

  3. the lone * characters: it's not clear what they are for

This modified sample does parse successfully with the parser above:

sample = """
{
 stage = 3;
 some.param1 = [10,20];
}
{
 stage = 4;
 param3 = [100,150,200,250,300];
}
 endparam = [0,1];
 """

parsed = tree.parseString(sample)    
parsed.pprint()

Giving:

[[['stage', '3'], ['some.param1', '[10,20]']],
 [['stage', '4'], ['param3', '[100,150,200,250,300]']],
 ['endparam', '[0,1]']]
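If you would rather accept the original sample as written, the three mismatches listed above can be relaxed in the grammar instead. This is a sketch under stated assumptions, not a drop-in definition: a value is assumed to run to the next ';', brace or line end (so it may contain spaces but cannot span lines), the trailing ';' is optional, and groups are separated by '*'. It produces the nested lists the question asks for.

```python
from pyparsing import (Group, OneOrMore, Optional, Regex, Suppress, Word,
                       ZeroOrMore, printables)

LBRACE, RBRACE, EQ, SEMI, STAR = map(Suppress, "{}=;*")
name = Word(printables, excludeChars="{}=;*")
# Relaxed expr (an assumption, not the BNF): a value runs up to the next ';',
# brace or line end, so embedded spaces as in '[10, 20]' are allowed
expr = Regex(r"[^;{}\n]+").setParseAction(lambda t: t[0].strip())
# the trailing ';' is optional
leaf = Group(name + EQ + expr + Optional(SEMI))
# a lone leaf gets its own Group so every group has the same shape
group = Group(LBRACE + OneOrMore(leaf) + RBRACE) | Group(leaf)
# groups are separated by '*'
tree = group + ZeroOrMore(STAR + group)

sample = """
{
 stage = 3;
 some.param1 = [10, 20];
} *
{
 stage = 4;
 param3 = [100,150,200,250,300]
} *
 endparam = [0, 1]
"""

print(tree.parseString(sample, parseAll=True).asList())
```

Calling asList() turns the ParseResults into plain Python lists.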
PaulMcG