I have the following string in a report file:

"Bunch(conditions=['s1', 's2', 's3', 's4', 's5', 's6'], durations=[[30.0], [30.0], [30.0], [30.0], [30.0], [30.0]], onsets=[[172.77], [322.77], [472.77], [622.77], [772.77], [922.77]])"

I would like to turn it into a Bunch() object or a dict, so that I can access the information inside (via either my_var.conditions or my_var["conditions"]).

This works very well with eval():

eval("Bunch(conditions=['s1', 's2', 's3', 's4', 's5', 's6'], durations=[[30.0], [30.0], [30.0], [30.0], [30.0], [30.0]], onsets=[[172.77], [322.77], [472.77], [622.77], [772.77], [922.77]])")

However, I would like to avoid using eval().

I have tried to write a couple of string substitutions so that I convert it to a dict syntax and then parse it with json.loads() but that looks very very hackish, and will break as soon as I encounter any new fields in future strings; e.g.:

("{" + "Bunch(conditions=['s1', 's2', 's3', 's4', 's5', 's6'], durations=[[30.0], [30.0], [30.0], [30.0], [30.0], [30.0]], onsets=[[172.77], [322.77], [472.77], [622.77], [772.77], [922.77]])"[6:-1] + "}").replace("conditions=", "'conditions':")

You get the idea.
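A slightly more general form of this substitution, as a rough sketch only: it assumes every value is a list of numbers or single-quoted strings, that no string contains a quote character or a `name=` sequence, and that the text always starts with `Bunch(` and ends with `)`. The sample is shortened for readability, and the variable names are illustrative.

```python
import json
import re

text = ("Bunch(conditions=['s1', 's2', 's3'], "
        "durations=[[30.0], [30.0], [30.0]], "
        "onsets=[[172.77], [322.77], [472.77]])")

# Strip the leading "Bunch(" and the trailing ")".
body = text[len("Bunch("):-1]

# Turn every `name=` into `"name":`, whatever the field is called,
# then switch to the double quotes that JSON requires.
as_json = "{" + re.sub(r"\b([A-Za-z_]\w*)=", r'"\1":', body).replace("'", '"') + "}"

data = json.loads(as_json)
print(data["conditions"])
```

This copes with new field names, but it is still string surgery and breaks as soon as a value is not valid JSON after the quote swap.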

Do you know if there is any better way to parse this?

TheChymera
  • What exactly is your final expected output? Also, can you show what you have done so far, to get an idea of what your approach is like? – idjaw Jun 07 '16 at 01:34

2 Answers

Here is my ugly piece of code, please check:

import re
import json
l = "Bunch(conditions=['s1', 's2', 's3', 's4', 's5', 's6'], durations=[[30.0], [30.0], [30.0], [30.0], [30.0], [30.0]], onsets=[[172.77], [322.77], [472.77], [622.77], [772.77], [922.77]])"

# keep only the argument list between "Bunch(" and the closing ")"
Bunch = l[6:-1]
# split on "=" and on the space before each field name; the lookahead
# keeps the first letter of the name instead of consuming it
sb = re.split(r"=| (?=[a-zA-Z])", Bunch)
# quote the field names so they become JSON keys
temp = ['"{}"'.format(x) if x.isalpha() else x for x in sb ]
temp2 = ','.join(temp)
temp3 = temp2.replace('",[', '":[')
temp4 = temp3.replace(',,', ',')
# JSON needs double quotes around strings
temp5 = temp4.replace("\'", '"')
temp6 = """{%s}""" %(temp5)
rslt = json.loads(temp6)

Eventually, the output:

rslt
Out[12]: 
{'conditions': ['s1', 's2', 's3', 's4', 's5', 's6'],
 'durations': [[30.0], [30.0], [30.0], [30.0], [30.0], [30.0]],
 'onsets': [[172.77], [322.77], [472.77], [622.77], [772.77], [922.77]]}

rslt["conditions"]
Out[13]: ['s1', 's2', 's3', 's4', 's5', 's6']

Generally, I think re is the package you need, but with my limited experience of it I couldn't apply it well here. I hope someone else will give a more elegant solution.

FYI, you said you could easily use eval() to get what you want, but when I tried it I got TypeError: 'str' object is not callable. Which Python version are you using? (I tried it on Python 2.7 and Python 3.3, and it worked on neither.)
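One possible explanation, offered as a guess rather than a diagnosis: eval() looks up the name Bunch when it runs, so it only works if Bunch is bound to something callable. The sketch below uses a stand-in Bunch class (an assumption, since the asker's real class is not shown) and a shortened sample string, and shows the same TypeError appearing when Bunch is bound to a string instead.

```python
# Stand-in for the real Bunch class (assumption: it simply stores keyword arguments).
class Bunch(object):
    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)

text = "Bunch(conditions=['s1', 's2'], durations=[[30.0], [30.0]])"

# With a callable Bunch in scope, eval() builds the object.
parsed = eval(text, {"Bunch": Bunch})
print(parsed.conditions)

# With Bunch bound to a string, the call inside eval() fails.
try:
    eval(text, {"Bunch": "conditions=..."})
    message = None
except TypeError as err:
    message = str(err)
print(message)
```

Either way, eval() executes whatever expression the report file contains, which is the reason for avoiding it in the first place.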

MaThMaX

This pyparsing code will define a parsing expression for your Bunch declaration.

from pyparsing import (pyparsing_common, Suppress, Keyword, Forward, quotedString, 
    Group, delimitedList, Dict, removeQuotes, ParseResults)

# define pyparsing parser for the Bunch declaration
LBRACK,RBRACK,LPAR,RPAR,EQ = map(Suppress, "[]()=")
integer = pyparsing_common.integer
real = pyparsing_common.real
ident = pyparsing_common.identifier

# define a recursive expression for nested lists
listExpr = Forward()
listItem = real | integer | quotedString.setParseAction(removeQuotes) | Group(listExpr)
listExpr << LBRACK + delimitedList(listItem) + RBRACK

# define an expression for the Bunch declaration
BUNCH = Keyword("Bunch")
arg_defn = Group(ident + EQ + listItem)
bunch_decl = BUNCH + LPAR + Dict(delimitedList(arg_defn))("args") + RPAR

Here is that parser run against your sample input:

# run the sample input as a test
sample = """Bunch(conditions=['s1', 's2', 's3', 's4', 's5', 's6'],
                  durations=[[30.0], [30.0], [30.0], [30.0], [30.0], [30.0]],
                  onsets=[[172.77], [322.77], [472.77], [622.77], [772.77], [922.77]])"""
bb = bunch_decl.parseString(sample)
# print the parsed output as-is
print(bb)

Gives:

['Bunch', [['conditions', ['s1', 's2', 's3', 's4', 's5', 's6']], 
    ['durations', [[30.0], [30.0], [30.0], [30.0], [30.0], [30.0]]], 
    ['onsets', [[172.77], [322.77], [472.77], [622.77], [772.77], [922.77]]]]]

With pyparsing, you can also add a parse-time callback, so that pyparsing will do the tokens->Bunch conversion for you:

# define a simple placeholder class for Bunch
class Bunch(object):
    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)
    def __repr__(self):
        return "Bunch:(%s)" % ', '.join("%r: %s" % item for item in vars(self).items())

# add this as a parse action, and pyparsing will autoconvert the parsed data to a Bunch
bunch_decl.addParseAction(lambda t: Bunch(**t.args.asDict()))

Now the parser will give you an actual Bunch instance:

[Bunch:('durations': [[30.0], [30.0], [30.0], [30.0], [30.0], [30.0]], 
        'conditions': ['s1', 's2', 's3', 's4', 's5', 's6'], 
        'onsets': [[172.77], [322.77], [472.77], [622.77], [772.77], [922.77]])]
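If adding a dependency is not an option, the standard library's ast module can do a similar job. This is a different technique from the pyparsing grammar above and is not part of the original answer: a minimal sketch that assumes every argument value is a plain Python literal, with `parse_bunch` as a hypothetical helper name.

```python
import ast

def parse_bunch(text):
    """Parse "Bunch(key=literal, ...)" into a dict without executing any code."""
    # Build a syntax tree for the expression; nothing is evaluated at this point.
    call = ast.parse(text.strip(), mode="eval").body
    if not (isinstance(call, ast.Call)
            and isinstance(call.func, ast.Name)
            and call.func.id == "Bunch"
            and not call.args):
        raise ValueError("expected a Bunch(key=value, ...) expression")
    result = {}
    for kw in call.keywords:
        if kw.arg is None:
            raise ValueError("**kwargs unpacking is not supported")
        # literal_eval accepts only literals (numbers, strings, lists, dicts, ...)
        # and raises ValueError for anything else.
        result[kw.arg] = ast.literal_eval(kw.value)
    return result

sample = ("Bunch(conditions=['s1', 's2', 's3', 's4', 's5', 's6'], "
          "durations=[[30.0], [30.0], [30.0], [30.0], [30.0], [30.0]], "
          "onsets=[[172.77], [322.77], [472.77], [622.77], [772.77], [922.77]])")
info = parse_bunch(sample)
print(info["conditions"])
```

The resulting dict can be passed to a Bunch-like class with `Bunch(**info)` if attribute access is preferred. New field names need no changes, but any value that is not a literal is rejected.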
PaulMcG
  • Noice! Excellent timing! I had just started thinking about doing a small grammar in lex/yacc (Old Dogs, etc., etc.), although I really wanted to stay inside Python. I had completely forgotten about pyparsing. – Peter Rowell Jun 07 '16 at 16:41
  • If you really want to go lex/yacc, you can still stay pretty much in Python using PLY. – PaulMcG Jun 07 '16 at 17:56