
I'm trying to find a way to parse a string that can contain variables, function calls, lists, or dicts written in Python syntax, separated by ",". Whitespace can appear anywhere, so the string should be split on "," only when the comma is not inside (), [] or {}.

Example string: "variable, function1(1,3), function2([1,3],2), ['list_item_1','list_item_2'],{'dict_key_1': "dict_item_1"}"

Another example string: "variable,function1(1, 3) , function2( [1,3],2), ['list_item_1','list_item_2'],{'dict_key_1': "dict_item_1"}"

Example output: ["variable", "function1(1,3)", "function2([1,3],2)", "['list_item_1','list_item_2']", "{'dict_key_1': "dict_item_1"}"]

Edit: The reason for the code is to parse the string and then run each item with exec("var = %s" % list[x]). (Yes, I know this might not be the recommended way to do things.)

3 Answers


I guess the main problem here is that the lists and dicts also have commas in them, so just using str.split(",") wouldn't work. One way of doing it is to parse the string one character at a time and keep track of whether all brackets are closed. If they are, we can append the current result to a list when we come across a comma. Here's my attempt:

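s = "variable, function1(1,3),function2([1,3],2),['list_item_1','list_item_2'],{'dict_key_1': 'dict_item_1'}"

tokens = []         # finished top-level items
current = ""        # characters of the item being built
open_brackets = 0   # how many (, [ or { are currently unclosed

for char in s:
    current += char

    if char in "({[":
        open_brackets += 1
    elif char in ")}]":
        open_brackets -= 1
    elif (char == ",") and (open_brackets == 0):
        # A top-level comma ends the current item; drop the comma itself.
        tokens.append(current[:-1].strip())
        current = ""

tokens.append(current.strip())  # the last item has no trailing comma

for t in tokens:
    print(t)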

"""
    variable
    function1(1,3)
    function2([1,3],2)
    ['list_item_1','list_item_2']
    {'dict_key_1': 'dict_item_1'}
"""
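One limitation of the loop above: a comma or bracket inside a quoted string such as 'a,b' or 'c)' is still counted. A minimal sketch that also tracks quotes could look like this. It assumes string literals use ' or " with backslash escapes (triple-quoted strings are not handled), and split_top_level is a hypothetical name:

```python
def split_top_level(s):
    """Split s on commas that are not nested inside (), [], {} or quotes."""
    tokens = []
    current = []
    depth = 0        # how many brackets are currently open
    quote = None     # quote character of the string literal we are inside, if any
    escaped = False  # True right after a backslash inside a string literal

    for char in s:
        if quote:
            # Inside a string literal: brackets and commas are just text.
            current.append(char)
            if escaped:
                escaped = False
            elif char == "\\":
                escaped = True
            elif char == quote:
                quote = None
            continue

        if char in "'\"":
            quote = char
        elif char in "([{":
            depth += 1
        elif char in ")]}":
            depth -= 1
        elif char == "," and depth == 0:
            # Top-level comma: close the current item and skip the comma.
            tokens.append("".join(current).strip())
            current = []
            continue
        current.append(char)

    tokens.append("".join(current).strip())
    return tokens


s = """variable,function1(1, 3) , function2( [1,3],2), ['a,b','c)'],{'dict_key_1': "dict_item_1"}"""
for t in split_top_level(s):
    print(t)
```

Collecting characters in a list and joining once per item avoids repeated string concatenation, though for short inputs either style is fine.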
damjan
  • I was thinking of the same idea using a `list` as a stack of brackets, but the `open_brackets` counter works the same and is simpler. – chapelo Sep 17 '16 at 13:40
  • Yeah, I just thought that regex/Python would have a way of doing it instead of writing the algorithm myself. I suppose I will have to do that then. – SacredCoconut Sep 17 '16 at 13:48
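The stack variant mentioned in the first comment might look like the sketch below (split_top_level_checked is a hypothetical name). What the stack adds over the counter is error detection: a mismatched pair such as "(1,2]" or an unclosed bracket raises ValueError instead of silently producing wrong tokens. Like the original loop, it does not special-case quoted strings.

```python
# Maps each closing bracket to the opening bracket it must match.
PAIRS = {")": "(", "]": "[", "}": "{"}


def split_top_level_checked(s):
    """Split on top-level commas, using a list as a stack of open brackets."""
    tokens = []
    current = ""
    stack = []  # opening brackets that have not been closed yet
    for pos, char in enumerate(s):
        if char in "([{":
            stack.append(char)
        elif char in PAIRS:
            # The closing bracket must match the most recent opening one.
            if not stack or stack.pop() != PAIRS[char]:
                raise ValueError(f"unmatched {char!r} at position {pos}")
        elif char == "," and not stack:
            tokens.append(current.strip())
            current = ""
            continue
        current += char
    if stack:
        raise ValueError(f"unclosed {stack[-1]!r}")
    tokens.append(current.strip())
    return tokens


print(split_top_level_checked("variable, function1(1,3), ['a','b']"))
```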

Have you tried using split?

>>> teststring = "variable, function1(1,3), function2([1,3],2), ['list_item_1','list_item_2'],{'dict_key_1': 'dict_item_1'}"
>>> teststring.split(", ")
['variable', 'function1(1,3)', 'function2([1,3],2)', "['list_item_1','list_item_2'],{'dict_key_1': 'dict_item_1'}"]
Jack Evans
  • Oh yeah, I forgot to mention that there may or may not be whitespace after ",". For example, "['variable', function1(1, 3)'" would not work. – SacredCoconut Sep 17 '16 at 13:00
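A regex split such as re.split(r"\s*,\s*", s) tolerates optional whitespace around each comma, but like str.split it has no notion of nesting, so it still cuts inside the brackets. A short demonstration:

```python
import re

s = "variable,function1(1, 3) , function2( [1,3],2)"
# Split on a comma with any amount of surrounding whitespace.
parts = re.split(r"\s*,\s*", s)
print(parts)
# ['variable', 'function1(1', '3)', 'function2( [1', '3]', '2)']
```

This is why the answers above track bracket depth instead of relying on the separator pattern alone.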

Regular expressions aren't very good for parsing the complexity of arbitrary code. What exactly are you trying to accomplish? You can (unsafely) use eval to just evaluate the string as code. Or if you're trying to understand it without evaling it, you can use the ast or dis modules for various forms of inspection.
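One way to use the ast module here, sketched under the assumption that the input becomes valid Python once wrapped in square brackets (i.e. it is a comma-separated sequence of expressions): parse it as a list display and read back the exact source text of each element. ast.parse only builds a syntax tree and executes nothing; ast.get_source_segment needs Python 3.8 or later. split_with_ast is a hypothetical name.

```python
import ast


def split_with_ast(s):
    """Return the source text of each top-level element of s."""
    source = "[" + s + "]"
    tree = ast.parse(source, mode="eval")  # parse only; nothing is executed
    # tree.body is an ast.List node; each element keeps its source position.
    return [ast.get_source_segment(source, node) for node in tree.body.elts]


s = 'variable,function1(1, 3) , function2( [1,3],2), [\'list_item_1\',\'list_item_2\'],{\'dict_key_1\': "dict_item_1"}'
for item in split_with_ast(s):
    print(item)
```

Because the parser does the work, quotes, nesting and whitespace are all handled, and malformed input raises SyntaxError. For items that are pure literals (lists, dicts, numbers, strings), ast.literal_eval(text) can evaluate them without exec; a name like variable or a call like function1(1,3) is not a literal, so literal_eval raises ValueError for those.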

ShadowRanger