I'm using pyparsing to parse an expression of the form:

"and(or(eq(x,1), eq(x,2)), eq(y,3))"

My test code looks like this:

from pyparsing import Word, alphanums, Literal, Forward, Suppress, ZeroOrMore, CaselessLiteral, Group

field = Word(alphanums)
value = Word(alphanums)
eq_ = CaselessLiteral('eq') + Group(Suppress('(') + field + Literal(',').suppress() + value + Suppress(')'))
ne_ = CaselessLiteral('ne') + Group(Suppress('(') + field + Literal(',').suppress() + value + Suppress(')'))
function = ( eq_ | ne_ )

arg = Forward()
and_ = Forward()
or_ = Forward()

arg << (and_ | or_ |  function) + Suppress(",") + (and_ | or_ | function) + ZeroOrMore(Suppress(",") + (and_ | function))

and_ << Literal("and") + Suppress("(") + Group(arg) + Suppress(")")
or_ << Literal("or") + Suppress("(") + Group(arg) + Suppress(")")

exp = (and_ | or_ | function)

print(exp.parseString("and(or(eq(x,1), eq(x,2)), eq(y,3))"))

This produces the following output:

['and', ['or', ['eq', ['x', '1'], 'eq', ['x', '2']], 'eq', ['y', '3']]]

The list output looks OK, but for subsequent processing I'd like the output in the form of a nested dictionary:

{
    name: 'and',
    args: [
        {
            name: 'or',
            args: [
                {
                    name: 'eq',
                    args: ['x','1']
                },
                {
                    name: 'eq',
                    args: ['x','2']
                }
            ]
        },
        {
            name: 'eq',
            args: ['y','3']
        }
    ]
}

I tried the Dict class, but without success.

Is it possible to do it in pyparsing? Or should I manually format list output?

Horned Owl

2 Answers

The feature you are looking for is an important one in pyparsing, that of setting results names. Using results names is recommended practice for most pyparsing applications. This feature has been there since version 0.9, as

expr.setResultsName("abc")

This allows me to access this particular field of the overall parsed results as res["abc"] or res.abc (where res is the value returned from parser.parseString). You can also call res.dump() to see a nested view of your results.

To keep parsers easy to follow at a glance, I also added a shorthand for setResultsName back in 1.4.6, in which you call the expression itself with the name:

expr("abc")
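As a small illustration of the two spellings, here is a minimal sketch. The key=value grammar and the names "key" and "value" are hypothetical, chosen only for this example, and the calls use the camelCase API names from this answer (newer pyparsing releases also offer snake_case equivalents).

```python
from pyparsing import Word, alphas, nums

# Hypothetical grammar for illustration: a single key=value pair.
key = Word(alphas).setResultsName("key")   # long form
val = Word(nums)("value")                  # shorthand form, same effect
pair = key + "=" + val

res = pair.parseString("x=42")

# Named fields can be read by subscript or by attribute.
print(res["key"], res.value)

# dump() returns a string showing the token list and the named fields.
print(res.dump())
```

Both forms attach the name to a copy of the expression, so the original expression can still be reused elsewhere under a different name.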

Here is your parser with a little cleanup, and results names added:

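from pyparsing import (Word, alphanums, Literal, Forward, Suppress,
                       CaselessLiteral, Group, delimitedList)

COMMA,LPAR,RPAR = map(Suppress,",()")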
field = Word(alphanums)
value = Word(alphanums)
eq_ = CaselessLiteral('eq')("name") + Group(LPAR + field + COMMA + value + RPAR)("args")
ne_ = CaselessLiteral('ne')("name") + Group(LPAR + field + COMMA + value + RPAR)("args")
function = ( eq_ | ne_ )

arg = Forward()
and_ = Forward()
or_ = Forward()
exp = Group(and_ | or_ | function)

arg << delimitedList(exp)

and_ << Literal("and")("name") + LPAR + Group(arg)("args") + RPAR
or_ << Literal("or")("name") + LPAR + Group(arg)("args") + RPAR

Unfortunately, dump() only handles nesting of results, not lists of values, so it is not quite as nice as json.dumps (maybe this would be a good enhancement to dump?). So here is a custom method to dump out your nested name-args results:

# Parse the sample input; [0] unwraps the outer Group added by exp
ob = exp.parseString("and(or(eq(x,1), eq(x,2)), eq(y,3))")[0]

INDENT_SPACES = '    '
def dumpExpr(ob, level=0):
    indent = level * INDENT_SPACES
    print(indent + '{')
    print("%s%s: %r," % (indent+INDENT_SPACES, 'name', ob['name']))
    if ob.name in ('eq','ne'):
        # leaf function: args is a flat [field, value] list
        print("%s%s: %s" % (indent+INDENT_SPACES, 'args', ob.args.asList()))
    else:
        # and/or: args holds nested expressions, so recurse two levels deeper
        print("%s%s: [" % (indent+INDENT_SPACES, 'args'))
        for arg in ob.args:
            dumpExpr(arg, level+2)
        print("%s]" % (indent+INDENT_SPACES))
    # nested entries get a trailing comma
    print(indent + '}' + (',' if level > 0 else ''))
dumpExpr(ob)

Giving:

{
    name: 'and',
    args: [
        {
            name: 'or',
            args: [
                {
                    name: 'eq',
                    args: ['x', '1']
                },
                {
                    name: 'eq',
                    args: ['x', '2']
                },
            ]
        },
        {
            name: 'eq',
            args: ['y', '3']
        },
    ]
}
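If you want an actual nested dictionary rather than printed text, the same name/args results can be walked recursively. The following is only a sketch under the grammar shown in this answer; to_dict is a hypothetical helper, not part of pyparsing.

```python
import json
from pyparsing import (CaselessLiteral, Forward, Group, Literal, Suppress,
                       Word, alphanums, delimitedList)

# Same grammar as in the answer, repeated so the sketch is self-contained.
COMMA, LPAR, RPAR = map(Suppress, ",()")
field = Word(alphanums)
value = Word(alphanums)
eq_ = CaselessLiteral("eq")("name") + Group(LPAR + field + COMMA + value + RPAR)("args")
ne_ = CaselessLiteral("ne")("name") + Group(LPAR + field + COMMA + value + RPAR)("args")
function = eq_ | ne_

and_ = Forward()
or_ = Forward()
exp = Group(and_ | or_ | function)
arg = delimitedList(exp)

# <<= is the in-place spelling of the << assignment used in the answer.
and_ <<= Literal("and")("name") + LPAR + Group(arg)("args") + RPAR
or_ <<= Literal("or")("name") + LPAR + Group(arg)("args") + RPAR


def to_dict(ob):
    """Hypothetical helper: convert one name/args ParseResults to a dict."""
    if ob["name"] in ("eq", "ne"):
        # Leaf function: args is a flat [field, value] list.
        args = ob["args"].asList()
    else:
        # and/or: every child is itself a grouped name/args result.
        args = [to_dict(child) for child in ob["args"]]
    return {"name": ob["name"], "args": args}


tree = to_dict(exp.parseString("and(or(eq(x,1), eq(x,2)), eq(y,3))")[0])
print(json.dumps(tree, indent=4))
```

Because the result is built from plain dicts and lists, it can be passed straight to json.dumps or to any later processing step.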
PaulMcG
  • Yes, this is exactly what I need. And thank you for cleaning my code. I am a newbie in pyparsing. – Horned Owl Aug 11 '14 at 14:34
  • Well, sometimes you get an itch, it just has to be scratched. Based on the work on this question, I have enhanced the dump() method in pyparsing's ParseResults class to list out array values of unnamed nested results. It is in the latest code checked into SVN, and will be in release 2.0.3. – PaulMcG Aug 12 '14 at 15:05

I don't think pyparsing has something like that, but you can recursively create the data structures:

def toDict(lst):
    if not isinstance(lst[1], list):
        return lst
    return [{'name': name, 'args': toDict(args)}
            for name, args in zip(lst[::2], lst[1::2])]

Your example behaves differently depending on the number of args children: with a single child you use a dict, otherwise a list of dicts. That makes the result complicated to use. It's better to use a list of dicts even when there is a single child, so you always know how to iterate over the children without type-checking.
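To show why the uniform shape helps, here is a minimal sketch of a consumer of such a tree. collect_fields is a hypothetical helper written for this illustration, and the tree literal is simply the structure from the question typed out by hand.

```python
def collect_fields(node):
    """Hypothetical helper: gather the field names used in a name/args tree."""
    if node["name"] in ("eq", "ne"):
        # Leaf: args is [field, value].
        return [node["args"][0]]
    fields = []
    # args is always a list of dicts here, so no type check is needed.
    for child in node["args"]:
        fields.extend(collect_fields(child))
    return fields


tree = {
    "name": "and",
    "args": [
        {"name": "or", "args": [
            {"name": "eq", "args": ["x", "1"]},
            {"name": "eq", "args": ["x", "2"]},
        ]},
        {"name": "eq", "args": ["y", "3"]},
    ],
}

print(collect_fields(tree))  # ['x', 'x', 'y']
```

The loop over node["args"] works the same way whether an and/or node has one child or many.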

Example

We can use json.dumps to pretty print the output (note that here we print parsedict[0] because we know that the root has a single child, but we always return lists as specified before):

import json
parsed = ['and', ['or', ['eq', ['x', '1'], 'eq', ['x', '2']], 'eq', ['y', '3']]]
parsedict = toDict(parsed)
print(json.dumps(parsedict[0], indent=4, separators=(',', ': ')))

Output

{
    "name": "and",
    "args": [
        {
            "name": "or",
            "args": [
                {
                    "name": "eq",
                    "args": [
                        "x",
                        "1"
                    ]
                },
                {
                    "name": "eq",
                    "args": [
                        "x",
                        "2"
                    ]
                }
            ]
        },
        {
            "name": "eq",
            "args": [
                "y",
                "3"
            ]
        }
    ]
}

To obtain that output I replaced the dict with a collections.OrderedDict in the toDict function, just to keep the name before args.
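A sketch of what that OrderedDict variant might look like is shown here; the function name to_ordered_dict is hypothetical, and the body is an assumption based on the toDict function in this answer rather than the exact code that was run.

```python
import json
from collections import OrderedDict


def to_ordered_dict(lst):
    """Assumed variant of toDict that keeps 'name' ahead of 'args'."""
    if not isinstance(lst[1], list):
        # A flat [field, value] pair is returned unchanged.
        return lst
    # Pair each name with the argument list that follows it.
    return [OrderedDict([("name", name), ("args", to_ordered_dict(args))])
            for name, args in zip(lst[::2], lst[1::2])]


parsed = ['and', ['or', ['eq', ['x', '1'], 'eq', ['x', '2']], 'eq', ['y', '3']]]
result = to_ordered_dict(parsed)
print(json.dumps(result[0], indent=4, separators=(',', ': ')))
```

On Python 3.7 and later, plain dicts preserve insertion order, so an ordinary dict literal with name first gives the same ordering.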

enrico.bacis
  • output is '{'args': [{'args': [{'args': [['x', '1'], 'eq', ['x', '2']], 'name': 'eq'}, 'eq', ['y', '3']], 'name': 'or'}], 'name': 'and'} ' Structure for ['x', '1'] and ['x', '2'] is not right. – Stephen Lin Aug 11 '14 at 08:45
  • Good idea - it is not uncommon to have pyparsing just do the structuring of the parsed data into a hierarchy, and then follow up with a conversion pass into your specific data structure, if results names and parse actions are inadequate (such as when the desired output data structure represents some summarization or accumulation of data, which is difficult across parsed sub-elements). In this case, results names and adding Groups for structure are sufficient. – PaulMcG Aug 11 '14 at 15:13