I have a list of tokens generated by pyparsing. I need to manipulate individual tokens in the list based on the tokens around them. Currently, I am just using a for loop. Is there a better mechanism for doing this?

For instance, a simple example is transforming [1, "+", 2] into

<block s="reportSum">
    <l>1</l>
    <l>2</l>
</block>

Edit: I have been reading the pyparsing docs, and know about operatorPrecedence and setParseAction. I am ultimately trying to transform one language into another.

For instance, I want to turn say("hi") into <block s="bubble"><l>hi</l></block>. I am currently parsing say("hi") into ["say", "hi"], and would like to know how to transform that into XML like the above.

Daniel F
  • From your example, it looks like you need to parse the tokens into an abstract syntax tree first. – Colonel Thirty Two Dec 30 '15 at 17:56
  • From the sound of it, you might not even need the XML if you create an abstract syntax tree? – jpmc26 Dec 30 '15 at 17:58
  • @jpmc26 I need to give it as input to a program outside of my control. The AST output by pyparsing's `asXML` is not in the syntax I need. This question is about taking the AST formed by pyparsing and turning it into XML in another syntax. For instance, that `+` must be changed to `reportSum` and must surround the `1` and `2`. – Daniel F Dec 30 '15 at 18:05
  • To parse and evaluate arithmetic operations, pyparsing includes a method for defining these parametrically, `infixNotation` (formerly called `operatorPrecedence`). A particularly thorough example is on the pyparsing wiki, http://pyparsing.wikispaces.com/file/view/eval_arith.py. To get your special XML output, change the behavior of the parse actions/classes attached to each operation level. – PaulMcG Dec 30 '15 at 18:08
  • You might consider `setParseAction`. It's a method on `pyparsing` grammar elements. You can use it to transform the results as `pyparsing` parses. – jpmc26 Dec 30 '15 at 18:10
  • @PaulMcGuire Thank you, but that isn't what I'm trying to do. I have been reading the pyparsing docs, and am using operatorPrecedence already. The addition is just a single basic example of what I am trying to do. I have one language, and am trying to transform it into another language. I have successfully used pyparsing to parse the first language into an AST. Now, I need to turn that into a second language. I read the basic calculator example, but I don't want to evaluate the code as I go. – Daniel F Dec 30 '15 at 18:14
  • @jpmc26 I am using setParseAction already to transform number tokens from strings into floats. However, it doesn't appear to me that that is what I want. setParseAction appears to be stateless, and I need to transform based on the tokens around a token. – Daniel F Dec 30 '15 at 18:15
  • You can set the parse action on grammar elements higher than just a token. For example, assuming that `float` is a token that matches floating point literals, this is a grammar element: `(float + '+' + float)`, and you should be able to do something like this on it: `(float + '+' + float).setParseAction(lambda x: ('sum', x[0], x[2]))`. (That might not be exactly right, but hopefully, it gets the idea across.) You can set the parse action to accept the list of tokens and return something more suitable for the syntax tree. – jpmc26 Dec 30 '15 at 18:37
  • @jpmc26 That feels so obvious. I should have thought of that. If you put it in an answer, I will mark it as accepted and upvote it. – Daniel F Dec 30 '15 at 18:40
  • I don't feel like I have it fleshed out enough to make it into an answer, and unfortunately, I can't do so right now. However, if you flesh it out and post, I'll gladly upvote it. =) Plus my terminology on grammars might be off. – jpmc26 Dec 30 '15 at 18:42
  • My answer below rambles on and on, but the basic concept is exactly as given by @jpmc26 above. If he posts, please accept his as the actual answer. – PaulMcG Dec 30 '15 at 18:49

1 Answer

In infixNotation (aka operatorPrecedence), you can attach parse actions to each subexpression found. See below:

from pyparsing import *

opfunc = {
    '+': 'reportSum',
    '-': 'reportDifference',
    '*': 'reportProduct',
    '/': 'reportDivision',
    }
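def makeXML(a, op, b):
    # wrap both operands in the block named for this operator
    return '<block s="%s"><l>%s</l><l>%s</l></block>' % (opfunc[op], a, b)

def outputBinary(tokens):
    # tokens[0] is the flat [operand, op, operand, op, ...] group for one precedence level
    t = tokens[0].asList()
    ret = makeXML(t.pop(0), t.pop(0), t.pop(0))
    # fold any remaining (op, operand) pairs in from the left
    while t:
        ret = makeXML(ret, t.pop(0), t.pop(0))
    return ret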



integer = Word(nums)
# expand this to include other supported operands, like floats, variables, etc.
operand = integer

arithExpr = infixNotation(operand, 
    [
    (oneOf('* /'), 2, opAssoc.LEFT, outputBinary),
    (oneOf('+ -'), 2, opAssoc.LEFT, outputBinary),
    ])

tests = """\
    1+2
    1+2*5
    1+2*6/3
    1/4+3*4/2""".splitlines()

for t in tests:
    t = t.strip()
    print(t)
    print(arithExpr.parseString(t)[0])
    print()

giving:

1+2
<block s="reportSum"><l>1</l><l>2</l></block>

1+2*5
<block s="reportSum"><l>1</l><l><block s="reportProduct"><l>2</l><l>5</l></block></l></block>

1+2*6/3
<block s="reportSum"><l>1</l><l><block s="reportDivision"><l><block s="reportProduct"><l>2</l><l>6</l></block></l><l>3</l></block></l></block>

1/4+3*4/2
<block s="reportSum"><l><block s="reportDivision"><l>1</l><l>4</l></block></l><l><block s="reportDivision"><l><block s="reportProduct"><l>3</l><l>4</l></block></l><l>2</l></block></l></block>

Note that parsing '1+2+3' will not give the traditional [['1','+','2'],'+','3'] nested list, but the run-on sequence ['1','+','2','+','3'], which is why outputBinary has to iterate over the list beyond just the first 3 elements.
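To make the folding step concrete without involving the parser, here is a minimal pure-Python sketch of what outputBinary does with such a run-on list. The names OPFUNC and fold_left are made up for this illustration and are not part of pyparsing.

```python
# Operator-to-block mapping, copied from the opfunc table in the answer.
OPFUNC = {
    '+': 'reportSum',
    '-': 'reportDifference',
    '*': 'reportProduct',
    '/': 'reportDivision',
}

def fold_left(flat):
    """Fold a flat [operand, op, operand, op, operand, ...] list left to right."""
    result = flat[0]
    # Walk the remaining tokens two at a time: (operator, right operand).
    for op, rhs in zip(flat[1::2], flat[2::2]):
        result = '<block s="%s"><l>%s</l><l>%s</l></block>' % (OPFUNC[op], result, rhs)
    return result

print(fold_left(['1', '+', '2', '+', '3']))
```

Each pass wraps everything built so far as the left operand of the next operator, which is the nesting a left-associative operator needs.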

As for your say("hi") example, something like the following should help:

LPAR,RPAR = map(Suppress,"()")
say_command = Keyword("say")('cmd') + LPAR + delimitedList(QuotedString('"'))('args') + RPAR
ask_command = Keyword("ask")('cmd') + LPAR + delimitedList(QuotedString('"'))('args') + RPAR
cmd_func = {
    'say': 'bubble',
    'ask': 'prompt',
    }
def emitAsXML(tokens):
    func = cmd_func[tokens.cmd]
    args = ''.join('<l>%s</l>' % arg for arg in tokens.args)
    return """<block s="%s">%s</block>""" % (func, args)
cmd = (say_command | ask_command).setParseAction(emitAsXML)

tests = """\
    say("hi")
    say("hi","handsome")
    ask("what is your name?")""".splitlines()

for t in tests:
    t = t.strip()
    print(t)
    print(cmd.parseString(t)[0])
    print()

giving:

say("hi")
<block s="bubble"><l>hi</l></block>

say("hi","handsome")
<block s="bubble"><l>hi</l><l>handsome</l></block>

ask("what is your name?")
<block s="prompt"><l>what is your name?</l></block>

If you need a wider context to create some output, then just attach the parse action to the higher-level expression in your parser.
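As a rough sketch of that idea, the parse action below is attached to a whole sequence of commands rather than to a single command, so it sees every command at once. The `<script>` wrapper element and the names CMD_FUNC, emit_script, and to_xml are assumptions made up for this illustration; the single-argument command grammar is a simplification of the one above.

```python
from pyparsing import Group, Keyword, OneOrMore, QuotedString, Suppress

LPAR, RPAR = map(Suppress, "()")
CMD_FUNC = {"say": "bubble", "ask": "prompt"}

# Group keeps each command's tokens (and result names) together.
command = Group(
    (Keyword("say") | Keyword("ask"))("cmd")
    + LPAR + QuotedString('"')("arg") + RPAR
)

def emit_script(tokens):
    # Called once for the whole sequence, so every command is visible here.
    blocks = [
        '<block s="%s"><l>%s</l></block>' % (CMD_FUNC[c.cmd], c.arg)
        for c in tokens
    ]
    # <script> is a hypothetical wrapper element, used only for illustration.
    return "<script>%s</script>" % "".join(blocks)

script = OneOrMore(command).setParseAction(emit_script)

def to_xml(source):
    # parseAll=True makes trailing unparsed input an error instead of ignoring it.
    return script.parseString(source, parseAll=True)[0]

print(to_xml('ask("what is your name?") say("hi")'))
```

Because emit_script receives the full list of grouped commands, it could also inspect neighbouring commands before deciding what to emit for each one.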

PaulMcG