Retrieve list of dictionary from string with dictionaries

Question

I have multiple dictionaries stored in a single string, and they are not separated by commas. Is it possible to separate them and get a list in which each element is one dictionary?

For example, what I have: {} {} {}

What I want: [{}, {}, {}]

I know it's similar to "Want to separate list of dictionaries not separated by comma", but I do not want to call subprocess and sed.

Example:

data = '{"key1":"val1", "key2":"val2", "key3":"val3", "key4":"val4"} {"key1":"someval", "key2":"someval", "key3":"someval", "key4":"someval"} {"key1":"someval", "key2":"someval", "key3":"someval", "key4":"someval"}'

What I want is:

[{"key1":"val1", "key2":"val2", "key3":"val3", "key4":"val4"},
 {"key1":"someval", "key2":"someval", "key3":"someval", "key4":"someval"},
 {"key1":"someval", "key2":"someval", "key3":"someval", "key4":"someval"}]

How do I achieve this?

Example 2:

string = '''{"Date1":"2017-02-13T00:00:00.000Z","peerval":"222.22000","PID":109897,"Title":"Prop 1","Temp":5,"Temp Actual":5,"Temp Predicted":3.9,"Level":"Medium","Explaination":"Source: {some data \n  some link http:\\www.ggogle\.com with some sepcial characters ">< ?? // {} [] ;;}","creator":"\\etc\\someid","createdtime" :"2017-02-12T15:24:38.380Z"}
{"Date1":"2017-02-13T00:00:00.000Z","peerval":"222.22000","PID":109890,"Title":"Prop 2","Temp":5,"Temp Actual":5,"Temp Predicted":3.9,"Level":"Medium","Explaination":"Source: {some data \n  some link http:\\www.ggogle\.com with some sepcial characters ">< ?? // {} [] ;;}","creator":"\\etc\\someid","createdtime" :"2017-02-12T15:24:38.380Z"}

'''

Note: each dictionary ends with a newline.

_"I have multiple dictionaries which are not separated by commas and type is string"_ - it seems that you just have a string with curly braces ({}) in it. Don't call that a dictionary. — zvone, Sep 04 '18 at 17:57

Kevin · Answer 1 · 2018-09-06T12:39:00.113

This approach is a bit slow (approximately O(N^2) with respect to string length), but it can handle fairly complicated literal syntax, including nested data structures. Call ast.literal_eval in a loop on successively shorter prefixes of s until you find one that is syntactically valid. Then remove that prefix and continue until the string is empty.

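import ast

def parse_consecutive_literals(s):
    """Split a string of back-to-back Python literals into a list of objects."""
    result = []
    s = s.strip()  # leading whitespace would otherwise make every prefix invalid
    while s:
        # Try the longest prefix first and shrink it until it is a valid literal.
        for i in range(len(s), 0, -1):
            try:
                obj = ast.literal_eval(s[:i])
            except SyntaxError:
                continue
            else:
                result.append(obj)
                s = s[i:].strip()  # drop the parsed prefix and any whitespace after it
                break
        else:
            # No prefix parsed at all: the remaining text is not a literal.
            raise ValueError("Couldn't parse remainder of string: " + repr(s))
    return result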

test_cases = [
    "{} {} {}",
    "{}{}{}",
    "{1: 2, 3: 4}{5:6, '7': [8, {9: 10}]}",
    "[11] 'twelve' 13 14.0",
    "{} {\"'hi '}'there\"} {'whats \"}\"{\"up'}",
    "{1: 'foo\\'}bar'}"
]

for s in test_cases:
    print("{} parses into {}".format(repr(s), parse_consecutive_literals(s)))

Result:

'{} {} {}' parses into [{}, {}, {}]
'{}{}{}' parses into [{}, {}, {}]
"{1: 2, 3: 4}{5:6, '7': [8, {9: 10}]}" parses into [{1: 2, 3: 4}, {5: 6, '7': [8, {9: 10}]}]
"[11] 'twelve' 13 14.0" parses into [[11], 'twelve', 13, 14.0]
'{} {"\'hi \'}\'there"} {\'whats "}"{"up\'}' parses into [{}, {"'hi '}'there"}, {'whats "}"{"up'}]
"{1: 'foo\\'}bar'}" parses into [{1: "foo'}bar"}]

I don't enthusiastically endorse this solution for production-quality code, however. It would be much better to serialize your data in a more sensible format in the first place, such as JSON.
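If the text is valid JSON, as in the first example, the standard library can already do what a single `json.loads` call cannot: `json.JSONDecoder.raw_decode` parses one value from the start of a string and reports the index where it stopped, so back-to-back objects can be read in one linear pass without re-slicing the string. Below is a minimal sketch; `parse_concatenated_json` is a hypothetical name, and the optional start-index argument of `raw_decode` exists in CPython's implementation even though the documented signature only shows `raw_decode(s)`.

```python
import json


def parse_concatenated_json(s):
    """Parse back-to-back JSON values such as '{...} {...}\n{...}' into a list.

    Assumes every value is valid JSON. strict=False additionally tolerates
    raw control characters (e.g. literal newlines) inside string values.
    """
    decoder = json.JSONDecoder(strict=False)
    results = []
    idx = 0
    length = len(s)
    while True:
        # raw_decode does not skip leading whitespace, so skip it here.
        while idx < length and s[idx].isspace():
            idx += 1
        if idx == length:
            break
        # Returns the decoded object and the index just past it.
        obj, idx = decoder.raw_decode(s, idx)
        results.append(obj)
    return results
```

The Example 2 text as posted is still not valid JSON (the `"` before `><` ends the `Explaination` string early, and `\w` is not a JSON escape), so this sketch raises `json.JSONDecodeError` on it; such data would need to be escaped properly when it is produced.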

Yes, mine is production code and I have millions of records, so this approach might slow down the process. I was looking for something like json.loads that would work. — Rachel, Sep 05 '18 at 04:25

rocksportrocker · Answer 2 · 2018-09-12T14:08:52.673


This handles arbitrary spacing between the dictionaries and runs in O(n): output is built by appending to lists and joining once at the end, and nothing beyond the built-ins is needed:

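def fetch_until(sep, char_iter):
    """Consume characters up to and including the next unescaped `sep`."""
    chars = []
    escapes = 0  # number of consecutive backslashes just read
    while True:
        try:
            c = next(char_iter)
        except StopIteration:
            break
        if c == "\\":
            escapes += 1
        chars.append(c)
        # A quote preceded by an even number of backslashes closes the string.
        if c == sep:
            if escapes % 2 == 0:
                break
        if c != "\\":
            escapes = 0
    return chars

def fix(data):
    """Insert a comma after every top-level '}' and evaluate the result as a list."""
    brace_level = 0
    result = []
    char_iter = iter(data)

    try:
        while True:
            c = next(char_iter)
            result.append(c)
            if c in ("'", '"'):
                # Copy quoted strings verbatim so braces inside them are ignored.
                result.extend(fetch_until(c, char_iter))
            elif c == "{":
                brace_level += 1
            elif c == "}":
                brace_level -= 1
                if brace_level == 0:
                    result.append(",")
    except StopIteration:
        pass

    # Drop the last character (normally the trailing comma) and wrap in [].
    # Note: eval executes arbitrary code; ast.literal_eval is a safer choice
    # when the input contains only literals.
    return eval("[{}]".format("".join(result[:-1])))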

test_cases = [
    "{1: 'foo\\'}bar'}",
    "{} {\"'hi '}'there\"} {'whats \"}\"{\"up'}",
    "{}{}{}",
    "{1: 2, 3: 4}{5:6, '7': [8, {9: 10}]}",
    "{1: {}} {2:3, 4:{}} {(1,)}",
    "{1: 'foo'} {'bar'}",
]

for test_case in test_cases:
    print("{!r:40s} -> {!r}".format(test_case, fix(test_case)))

outputs

"{1: 'foo\\'}bar'}"                      -> [{1: "foo'}bar"}]
'{} {"\'hi \'}\'there"} {\'whats "}"{"up\'}' -> [{}, {"'hi '}'there"}, {'whats "}"{"up'}]
'{}{}{}'                                 -> [{}, {}, {}]
"{1: 2, 3: 4}{5:6, '7': [8, {9: 10}]}"   -> [{1: 2, 3: 4}, {5: 6, '7': [8, {9: 10}]}]
'{1: {}} {2:3, 4:{}} {(1,)}'             -> [{1: {}}, {2: 3, 4: {}}, {(1,)}]
"{1: 'foo'} {'bar'}"                     -> [{1: 'foo'}, {'bar'}]

I also did some timing:

import time
for n in (1000, 10000, 100000):
    long = test_cases[1] * n  # the 37-character test case, matching the lengths below
    started = time.time()
    fix(long)
    needed = time.time() - started
    print("len_input = {:7d},  time={:7.1f} ms ".format(len(long), needed * 1000))

which prints (on my slow MacBook Air):

len_input =   37000,  time=   12.2 ms
len_input =  370000,  time=  110.6 ms
len_input = 3700000,  time= 1124.4 ms
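A possible variation on the same brace-counting scan yields each top-level `{...}` substring instead of building one big string for eval. Each chunk can then be handed to ast.literal_eval or, for JSON data, json.loads, and a chunk that fails can be reported on its own. This is a sketch under the assumption that the input consists of brace-delimited objects separated by whitespace (other text between objects is silently skipped); `split_top_level` is a hypothetical name, not part of the answer above.

```python
def split_top_level(data):
    """Yield every top-level {...} substring of `data`, in order.

    Braces inside single- or double-quoted strings are ignored, and a
    backslash escapes the next character inside a string. Text between
    objects (normally whitespace) is skipped.
    """
    depth = 0
    start = 0
    quote = None      # quote character of the string we are inside, if any
    escaped = False   # previous character was an unescaped backslash
    for i, c in enumerate(data):
        if quote is not None:
            if escaped:
                escaped = False
            elif c == "\\":
                escaped = True
            elif c == quote:
                quote = None
        elif depth > 0 and c in ("'", '"'):
            quote = c  # strings are only tracked inside an object
        elif c == "{":
            if depth == 0:
                start = i
            depth += 1
        elif c == "}":
            if depth == 0:
                raise ValueError("unbalanced '}' at index %d" % i)
            depth -= 1
            if depth == 0:
                yield data[start:i + 1]
    if depth != 0:
        raise ValueError("input ends inside an unfinished object")
```

For example, `[json.loads(chunk) for chunk in split_top_level(data)]` parses the first example from the question.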

edited Sep 12 '18 at 14:08

answered Sep 04 '18 at 17:59

rocksportrocker


@Kevin I fixed this. – rocksportrocker Sep 04 '18 at 20:06
This is a nice efficient solution, if you're completely sure that no string literal containing an unmatched right curly bracket will ever appear in the data. Otherwise, you might turn `{1: 'foo}bar'}` into `[{1: 'foo},bar']`. – Kevin Sep 05 '18 at 17:09
damn you got me. – rocksportrocker Sep 06 '18 at 07:24
@Kevin fixed this – rocksportrocker Sep 06 '18 at 07:46
Ok, looking good... Unless a string literal contains an unmatched right curly bracket _and_ an escaped quote character that matches the character that surrounds the string. For instance, `"{1: 'foo\\'}bar'}"` should become `[{1: "foo'}bar"}]`, but instead becomes `[{1: 'foo\'},bar'}]` – Kevin Sep 06 '18 at 12:28
But `print('foo\\')` outputs `foo\`. I guess you used one backslash too many. – rocksportrocker Sep 06 '18 at 13:25
Not sure what you mean by that. Are you saying "you must have made a mistake typing two backslashes in your example input"? That was intentional, because the test case needs to contain a single backslash character, and it won't do that if I don't escape the backslash with another backslash. – Kevin Sep 06 '18 at 13:35
@rocksportrocker the function above generates a string, not a list. – Rachel Sep 12 '18 at 13:21
@Rachel you can run `eval` on the string to get the data structure. – rocksportrocker Sep 12 '18 at 13:35
@Rachel fixed this. – rocksportrocker Sep 12 '18 at 14:09
@rocksportrocker eval throws "EOL while scanning string literal", because, as mentioned, the Explaination field contains all kinds of special characters as well as \n. – Rachel Sep 12 '18 at 14:25
Please provide an exact example of a string that does not work. – rocksportrocker Sep 12 '18 at 15:03
@rocksportrocker I have updated my question with a proper example now (Example 2). – Rachel Sep 12 '18 at 15:33
Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/179922/discussion-between-rachel-and-rocksportrocker). – Rachel Sep 12 '18 at 15:37

hspandher · Answer 3 · 2018-09-04T17:59:27.507


You can convert it into a valid JSON string; then it's quite easy.

import json
mydict_string = mydict_string.replace(' {', ',{')
mylist = json.loads(mydict_string)

Otherwise, although I wouldn't recommend it, you can use eval as well.

mylist = map(eval, mydict_string.split(' '))

This would work even if the inside dicts are not empty.

edited Sep 04 '18 at 17:59

answered Sep 04 '18 at 17:55

hspandher



Unfortunately, `json.loads("{} {} {}")` raises a ValueError. – Kevin Sep 04 '18 at 17:56
@Kevin Ah! ok. Failed to see the space. – hspandher Sep 04 '18 at 17:57
Well, even if there weren't spaces, `json.loads("{}{}{}")` also raises a ValueError. – Kevin Sep 04 '18 at 17:58
The replace-based solution gives me `json.decoder.JSONDecodeError`, and the second one doesn't seem very useful if it only works on empty dictionaries. – Kevin Sep 04 '18 at 18:04
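For reference, a minimal sketch of what the replace-based idea needs in order to produce parseable JSON: wrap the whole string in brackets and put a comma between adjacent top-level objects, allowing any whitespace between them. It assumes each object is valid JSON and that no string value contains `}` followed by optional whitespace and `{` (a value such as `"a} {b"` would silently become `"a},{b"`). `loads_concatenated` is a hypothetical helper name.

```python
import json
import re


def loads_concatenated(s):
    """Turn '{...} {...} {...}' into '[{...},{...},{...}]' and parse it as JSON.

    Assumption: no string value contains '}' followed (after optional
    whitespace) by '{'; otherwise the regex inserts a comma inside that value.
    """
    # Replace "}<whitespace>{" with "},{" between adjacent objects.
    joined = re.sub(r"\}\s*\{", "},{", s.strip())
    return json.loads("[" + joined + "]")
```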

prithajnath · Answer 4 · 2018-09-14T01:26:52.343


Assuming dict_string is your input string, you can try this:

import json
my_dicts = [json.loads(i) for i in dict_string.replace(", ",",").split()]

edited Sep 14 '18 at 01:26

answered Sep 04 '18 at 18:00

prithajnath


I get "ValueError: Unterminated string starting"; not sure how to resolve this. – Rachel Sep 12 '18 at 13:29

