
What's the best way to build a dictionary from a string like the one below:

"{key1 value1} {key2 value2} {key3 {value with spaces}}"

The key is always a string with no spaces, but the value is either a plain string or a string in curly brackets (when it contains spaces).

How would you turn it into this dict:

{'key1': 'value1', 'key2': 'value2', 'key3': 'value with spaces'}
mtmt
  • How do you define "the best way"? Fast, elegant, maintainable, ...? Also, what have you tried yourself? What worked and what didn't? Why not? – Thomas Weller May 28 '15 at 15:09

4 Answers

import re
x = "{key1 value1} {key2 value2} {key3 {value with spaces}}"
# each match yields a (key, value) tuple; dict() turns the list of tuples into a mapping
print(dict(re.findall(r"\{(\S+)\s+\{*(.*?)\}+", x)))

You can try this.

Output:

{'key3': 'value with spaces', 'key2': 'value2', 'key1': 'value1'}

Here re.findall extracts each key and its value: it returns a list of (key, value) tuples, one per pair. Calling dict on that list of tuples gives the final answer.
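To make the two steps described above visible, here is a minimal Python 3 sketch that keeps the intermediate list. The variable names `text`, `pairs` and `result` are illustrative only and not part of the original answer.

```python
import re

text = "{key1 value1} {key2 value2} {key3 {value with spaces}}"

# Step 1: findall returns one (key, value) tuple per match,
# because the pattern contains exactly two capturing groups.
pairs = re.findall(r"\{(\S+)\s+\{*(.*?)\}+", text)

# Step 2: dict() accepts any iterable of 2-item sequences.
result = dict(pairs)

print(pairs)
print(result)
```

Printing `pairs` before converting it is a handy way to debug the pattern when a key or value comes out wrong.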

Remi Guan
vks
  • This is really awesome! It would be great if you could post a little explanation so I can learn and understand this better. Thanks. – Joe T. Boka May 28 '15 at 06:45
  • @vks That's great! How would I update it to support cases with spaces around brackets like this: "{ key1 value1 } { key2 value2 } { key3 {value with spaces} }" – mtmt May 28 '15 at 06:59
  • @mtmt `print dict(re.findall(r"\{\s*(\S+)\s+\{*(.*?)\}+",x))` – vks May 28 '15 at 07:00
  • Actually, for the case with spaces this would be a simple change: "\{ (\S+)\s+\{*(.*?)\}+". I just need to think of a way to support both scenarios. – mtmt May 28 '15 at 07:03
  • @mtmt change it to `\s*` to support both cases – vks May 28 '15 at 07:04
  • You can fine-tune the expression a bit to prohibit backtracking, which should give you a performance benefit if the string is invalid. Make the first three quantifiers possessive: `\{(\S++)\s++\{*+(.*?)\}+` – Falco May 28 '15 at 11:47
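The `\s*` suggestion from the comments above can be checked with a short Python 3 sketch. One assumption is added here that is not in the comments: a second `\s*` before the closing braces, because otherwise the lazy value group would capture the space before ` }` in the spaced-out input. The names `PAIR`, `compact` and `spaced` are illustrative.

```python
import re

# \s* after the opening brace and before the closing brace(s)
# lets the same pattern handle both layouts.
PAIR = re.compile(r"\{\s*(\S+)\s+\{*(.*?)\s*\}+")

compact = "{key1 value1} {key2 value2} {key3 {value with spaces}}"
spaced = "{ key1 value1 } { key2 value2 } { key3 {value with spaces} }"

print(dict(PAIR.findall(compact)))
print(dict(PAIR.findall(spaced)))
```

Both calls should produce the same dictionary, with no stray whitespace in the values.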

I can't make it more elegant:

text = "{key1 value1} {key2 value2} {key3 {value with spaces}}"
x = text.split("} {")              # one chunk per key/value pair
# remove all brackets, then split each chunk into key and value at the first space
z = [i.replace("{", "").replace("}", "").split(" ", 1) for i in x]

fin = {}
for key, value in z:
    fin[key] = value

It's very hacky, but it should do the job.

Renatius

Assuming that you don't have anything in your string more nested than what is in your example, you could first use lookahead/lookbehind assertions to split the string into your key-value pairs, looking for the pattern } { (the end of one pair of brackets and the beginning of another.)

>>> import re
>>> s = '{key1 value1} {key2 value2} {key3 {value with spaces}}'
>>> pairs = re.split(r'(?<=})\s*(?={)', s)

This says "Match on any \s* (whitespace) that has a } before it and a { after it, but don't include those brackets in the match itself."

Then you have your key-value pairs:

>>> pairs
['{key1 value1}', '{key2 value2}', '{key3 {value with spaces}}']

which can be split on whitespace with the maxsplit parameter set to 1, to make sure that it only splits on the first space. In this example I have also used string indexing (the [1:-1]) to get rid of the curly braces that I know are at the beginning and end of each pair.

>>> simple = pairs[0] 
>>> complex = pairs[2]  
>>> simple
'{key1 value1}'
>>> complex
'{key3 {value with spaces}}'
>>> simple[1:-1]
'key1 value1'
>>> kv = re.split(r'\s+', simple[1:-1], maxsplit=1)
>>> kv
['key1', 'value1']
>>> kv3 = re.split(r'\s+', complex[1:-1], maxsplit=1)
>>> kv3
['key3', '{value with spaces}']

Then just check whether the value is enclosed in curly braces, and remove them if you need to before putting the key and value into your dictionary.
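Putting the steps above together, one possible Python 3 sketch looks like this. The function name `parse_pairs` is hypothetical, and the extra `strip()` calls are an added assumption so that input with spaces just inside the outer braces also works.

```python
import re

def parse_pairs(text):
    """Split on the gap between '}' and '{', then split each pair once."""
    result = {}
    for pair in re.split(r'(?<=\})\s*(?=\{)', text.strip()):
        # pair[1:-1] drops the outer braces of this key/value pair
        key, value = re.split(r'\s+', pair[1:-1].strip(), maxsplit=1)
        # a value with spaces is still wrapped in its own braces
        if value.startswith('{') and value.endswith('}'):
            value = value[1:-1]
        result[key] = value
    return result

print(parse_pairs('{key1 value1} {key2 value2} {key3 {value with spaces}}'))
```

Like the approach it is based on, this assumes nothing is nested more deeply than in the example string.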

If it is guaranteed that the key and the value will always be separated by a single space character, then you could use plain old string split instead, passing 1 as the maximum number of splits.

>>> kv3 = complex[1:-1].split(' ', 1)
>>> kv3
['key3', '{value with spaces}']
tla

The answer by @vks doesn't check for balanced braces. Try the following:

>>> x="{key3 {value with spaces} {key4 value4}}"
>>> dict(re.findall(r"\{(\S+)\s+\{*(.*?)\}+",x))
{'key3': 'value with spaces', 'key4': 'value4'}

Try instead:

>>> dict(map(lambda x:[x[0],x[2]], re.findall(r'\{(\S+)\s+(?P<Brace>\{)?((?(Brace)[^{}]*|[^{}\s]*))(?(Brace)\})\}',x)))
{'key4': 'value4'}

that is, it matches only on the part with correct bracing.

The (?P<Brace>\{) group saves the match of a {, and the later (?(Brace)\}) matches a } only if that group matched, so braces must come in matching pairs. With the (?(Brace)...|...) construct, if Brace matched, the value part can contain anything except braces ([^{}]*); otherwise it also cannot contain whitespace ([^{}\s]*).

As the optional brace is a capturing group in the regexp, and is thus returned in each result tuple, we need to extract elements 0 and 2 from each tuple with the map() function.
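As a variation on the map() step, a Python 3 sketch could name the key and value groups and read them from match objects, which avoids index bookkeeping. The names `PAIR`, `parse_strict`, `key`, `brace` and `value` are illustrative and not from the original answer; the pattern is otherwise the same.

```python
import re

PAIR = re.compile(
    r'\{(?P<key>\S+)\s+'
    r'(?P<brace>\{)?'                            # optional opening brace of the value
    r'(?P<value>(?(brace)[^{}]*|[^{}\s]*))'      # spaces allowed only inside braces
    r'(?(brace)\})'                              # closing brace required iff one was opened
    r'\}'
)

def parse_strict(text):
    """Keep only the key/value pairs whose braces are correctly paired."""
    return {m.group('key'): m.group('value') for m in PAIR.finditer(text)}

print(parse_strict("{key3 {value with spaces} {key4 value4}}"))
print(parse_strict("{key1 value1} {key2 value2} {key3 {value with spaces}}"))
```

For the malformed string only the key4 pair survives, while the well-formed string yields all three pairs.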

Regexps easily get messy.

micce