Read Bunch() from string

Question

I have the following string in a report file:

"Bunch(conditions=['s1', 's2', 's3', 's4', 's5', 's6'], durations=[[30.0], [30.0], [30.0], [30.0], [30.0], [30.0]], onsets=[[172.77], [322.77], [472.77], [622.77], [772.77], [922.77]])"

I would like to turn it into a Bunch() object or a dict, so that I can access the information inside (via either my_var.conditions or my_var["conditions"]).

This works very well with eval():

eval("Bunch(conditions=['s1', 's2', 's3', 's4', 's5', 's6'], durations=[[30.0], [30.0], [30.0], [30.0], [30.0], [30.0]], onsets=[[172.77], [322.77], [472.77], [622.77], [772.77], [922.77]])")

However, I would like to avoid using eval().

I have tried chaining string substitutions to convert it to dict syntax and then parse it with json.loads(), but that looks very hackish and will break as soon as a future string contains a new field; e.g.:

"{"+"Bunch(conditions=['s1', 's2', 's3', 's4', 's5', 's6'], durations=[[30.0], [30.0], [30.0], [30.0], [30.0], [30.0]], onsets=[[172.77], [322.77], [472.77], [622.77], [772.77], [922.77]])"[1:-1]+"}".replace("conditions=","'conditions':")

You get the idea.

Do you know if there is any better way to parse this?

What exactly is your final expected output? Also, can you show what you have done so far, to get an idea of what your approach is like? — idjaw, Jun 07 '16 at 01:34

score 2 · Answer 1 · answered Jun 07 '16 at 15:45

Here is my ugly piece of code, please check:

import re
import json
l = "Bunch(conditions=['s1', 's2', 's3', 's4', 's5', 's6'], durations=[[30.0], [30.0], [30.0], [30.0], [30.0], [30.0]], onsets=[[172.77], [322.77], [472.77], [622.77], [772.77], [922.77]])"

exec('{}="{}"'.format(l[:5],l[6:-1]))
# the lookahead splits on the space before each key without consuming its first letter
sb = re.split("=| (?=[a-zA-Z])", Bunch)
temp = ['"{}"'.format(x) if x.isalpha() else x for x in sb ]
temp2 = ','.join(temp)
temp3 = temp2.replace('",[', '":[')
temp4 = temp3.replace(',,', ',')
temp5 = temp4.replace("\'", '"')
temp6 = """{%s}""" %(temp5)
rslt = json.loads(temp6)

Eventually, the output:

rslt
Out[12]: 
{'conditions': ['s1', 's2', 's3', 's4', 's5', 's6'],
 'durations': [[30.0], [30.0], [30.0], [30.0], [30.0], [30.0]],
 'onsets': [[172.77], [322.77], [472.77], [622.77], [772.77], [922.77]]}

rslt["conditions"]
Out[13]: ['s1', 's2', 's3', 's4', 's5', 's6']

Generally, I think re is the package you need, but due to my limited experience with it, I could not apply it well here. I hope someone else will give a more elegant solution.
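For comparison, here is a regex-free sketch that uses only the standard library's ast module. It is a different technique from the code above, and it assumes the string is always a single call with keyword arguments whose values are plain Python literals. parse_bunch is a hypothetical helper name, and the sample string is shortened.

```python
import ast
from types import SimpleNamespace


def parse_bunch(text, expected_name="Bunch"):
    """Turn 'Bunch(key=literal, ...)' into a dict without executing it.

    parse_bunch is a hypothetical helper, not part of any library.
    """
    node = ast.parse(text.strip(), mode="eval").body
    # Accept only a direct call to the expected name, e.g. Bunch(...)
    if not (isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == expected_name):
        raise ValueError("expected a call to %s(...)" % expected_name)
    if node.args:
        raise ValueError("positional arguments are not supported")
    result = {}
    for kw in node.keywords:
        if kw.arg is None:  # this is how '**something' shows up in the AST
            raise ValueError("** unpacking is not supported")
        # literal_eval also accepts an AST node and rejects non-literals
        result[kw.arg] = ast.literal_eval(kw.value)
    return result


report = "Bunch(conditions=['s1', 's2'], durations=[[30.0], [30.0]], onsets=[[172.77], [322.77]])"
parsed = parse_bunch(report)
print(parsed["conditions"])          # dict-style access

as_object = SimpleNamespace(**parsed)
print(as_object.onsets)              # attribute-style access
```

New fields in future strings need no changes here, since every keyword is handled the same way. Anything that is not a literal (a function call, a name, arithmetic on names) makes literal_eval raise ValueError instead of being evaluated.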

FYI, you said you could easily use eval() to get what you want, but when I tried it, I got TypeError: 'str' object is not callable. Which Python version are you using? (I tried Python 2.7 and Python 3.3, and it failed on both.)
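That TypeError is what Python raises when the name Bunch is bound to a string at the time eval() runs, which is what the exec() line above does. eval() on the report string can only succeed when Bunch names something callable. A short sketch of both cases, using a shortened sample string and a hypothetical stand-in class (not the asker's real Bunch):

```python
report = "Bunch(conditions=['s1', 's2'], durations=[[30.0], [30.0]], onsets=[[172.77], [322.77]])"

# Case 1: Bunch is a plain string, as it is after the exec() line in this answer.
try:
    eval(report, {"Bunch": "conditions=['s1', 's2']"})
    error_name = None
except TypeError as exc:
    error_name = type(exc).__name__  # calling a str object raises TypeError


# Case 2: Bunch is a class (hypothetical stand-in that just stores keyword arguments).
class Bunch(object):
    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)


parsed = eval(report, {"Bunch": Bunch})

print(error_name)      # TypeError
print(parsed.onsets)   # [[172.77], [322.77]]
```

So the difference is most likely the namespace and not the Python version. This only illustrates the error; it does not make eval() any safer on untrusted report files.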

PaulMcG · Accepted Answer · 2016-06-07T17:12:40.603

This pyparsing code will define a parsing expression for your Bunch declaration.

from pyparsing import (pyparsing_common, Suppress, Keyword, Forward, quotedString, 
    Group, delimitedList, Dict, removeQuotes, ParseResults)

# define pyparsing parser for the Bunch declaration
LBRACK,RBRACK,LPAR,RPAR,EQ = map(Suppress, "[]()=")
integer = pyparsing_common.integer
real = pyparsing_common.real
ident = pyparsing_common.identifier

# define a recursive expression for nested lists
listExpr = Forward()
listItem = real | integer | quotedString.setParseAction(removeQuotes) | Group(listExpr)
listExpr << LBRACK + delimitedList(listItem) + RBRACK

# define an expression for the Bunch declaration
BUNCH = Keyword("Bunch")
arg_defn = Group(ident + EQ + listItem)
bunch_decl = BUNCH + LPAR + Dict(delimitedList(arg_defn))("args") + RPAR

Here is that parser run against your sample input:

# run the sample input as a test
sample = """Bunch(conditions=['s1', 's2', 's3', 's4', 's5', 's6'],
                  durations=[[30.0], [30.0], [30.0], [30.0], [30.0], [30.0]],
                  onsets=[[172.77], [322.77], [472.77], [622.77], [772.77], [922.77]])"""
bb = bunch_decl.parseString(sample)
# print the parsed output as-is
print(bb)

Gives:

['Bunch', [['conditions', ['s1', 's2', 's3', 's4', 's5', 's6']], 
    ['durations', [[30.0], [30.0], [30.0], [30.0], [30.0], [30.0]]], 
    ['onsets', [[172.77], [322.77], [472.77], [622.77], [772.77], [922.77]]]]]

With pyparsing, you can also add a parse-time callback, so that pyparsing will do the tokens->Bunch conversion for you:

# define a simple placeholder class for Bunch
class Bunch(object):
    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)
    def __repr__(self):
        return "Bunch:(%s)" % ', '.join("%r: %s" % item for item in vars(self).items())

# add this as a parse action, and pyparsing will autoconvert the parsed data to a Bunch
bunch_decl.addParseAction(lambda t: Bunch(**t.args.asDict()))

Now the parser will give you an actual Bunch instance:

[Bunch:('durations': [[30.0], [30.0], [30.0], [30.0], [30.0], [30.0]], 
        'conditions': ['s1', 's2', 's3', 's4', 's5', 's6'], 
        'onsets': [[172.77], [322.77], [472.77], [622.77], [772.77], [922.77]])]
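
Note the surrounding brackets in that output: parseString() returns a ParseResults, so the Bunch itself is its first element. Once you have it, both access styles from the question are available. A minimal sketch, assuming the placeholder class above and building it directly with shortened data instead of going through the parser:

```python
# Same placeholder class as above (without the custom __repr__)
class Bunch(object):
    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)


# Shortened stand-in for what the parse action would construct
my_var = Bunch(conditions=['s1', 's2'],
               durations=[[30.0], [30.0]],
               onsets=[[172.77], [322.77]])

# attribute-style access
print(my_var.conditions)         # ['s1', 's2']

# dict-style access: vars() exposes the instance's attribute dict
as_dict = vars(my_var)
print(as_dict["onsets"])         # [[172.77], [322.77]]
print(sorted(as_dict))           # ['conditions', 'durations', 'onsets']
```

If you only need the dict, you could skip the class and keep the asDict() result from the parse action.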

Noice! Excellent timing! I had just started thinking about doing a small grammar in lex/yacc (Old Dogs, etc., etc.), although I really wanted to stay inside Python. I had completely forgotten about pyparsing. — Peter Rowell, Jun 07 '16 at 16:41
If you really want to go lex/yacc, you can still stay pretty much in Python using PLY. — PaulMcG, Jun 07 '16 at 17:56
