1

I have multiple dictionaries, not separated by commas, stored in a single string. Is it possible to separate them and get a list in which each element is one dictionary?

E.g., what I have: {} {} {}

What I want: [{}, {}, {}]

I know it's similar to "Want to separate list of dictionaries not separated by comma", but I do not want to call sed through subprocess.

Example:

data  = {"key1":"val1", "key2":"val2", "key3":"val3", "key4":"val4"} {"key1":"someval", "key2":"someval", "key3":"someval", "key4":"someval"} {"key1":"someval", "key2":"someval", "key3":"someval", "key4":"someval"}

What I want is:

[{"key1":"val1", "key2":"val2", "key3":"val3", "key4":"val4"} , 
 {"key1":"someval", "key2":"someval", "key3":"someval", "key4":"someval"}, 
 {"key1":"someval", "key2":"someval", "key3":"someval", "key4":"someval"}]

How do I achieve this?

Example 2 :

string = '''{"Date1":"2017-02-13T00:00:00.000Z","peerval":"222.22000","PID":109897,"Title":"Prop 1","Temp":5,"Temp Actual":5,"Temp Predicted":3.9,"Level":"Medium","Explaination":"Source: {some data \n  some link http:\\www.ggogle\.com with some sepcial characters ">< ?? // {} [] ;;}","creator":"\\etc\\someid","createdtime" :"2017-02-12T15:24:38.380Z"}
{"Date1":"2017-02-13T00:00:00.000Z","peerval":"222.22000","PID":109890,"Title":"Prop 2","Temp":5,"Temp Actual":5,"Temp Predicted":3.9,"Level":"Medium","Explaination":"Source: {some data \n  some link http:\\www.ggogle\.com with some sepcial characters ">< ?? // {} [] ;;}","creator":"\\etc\\someid","createdtime" :"2017-02-12T15:24:38.380Z"}

'''

NOTE: each dictionary ends with a newline.

Rachel

4 Answers

4

This approach is a bit slow (approximately O(N^2) with respect to string length), but it can handle fairly complicated literal syntax, including nested data structures. Call ast.literal_eval in a loop on successively shorter prefixes of s until you find one that is syntactically valid. Then remove that prefix and continue until the string is empty.

import ast

def parse_consecutive_literals(s):
    result = []
    while s:
        for i in range(len(s), 0, -1):
            #print(i, repr(s), repr(s[:i]), len(result))
            try:
                obj = ast.literal_eval(s[:i])
            except (SyntaxError, ValueError):  # literal_eval raises ValueError for non-literal nodes
                continue
            else:
                result.append(obj)
                s = s[i:].strip()
                break
        else:
            raise Exception("Couldn't parse remainder of string: " + repr(s))
    return result

test_cases = [
    "{} {} {}",
    "{}{}{}",
    "{1: 2, 3: 4}{5:6, '7': [8, {9: 10}]}",
    "[11] 'twelve' 13 14.0",
    "{} {\"'hi '}'there\"} {'whats \"}\"{\"up'}",
    "{1: 'foo\\'}bar'}"
]

for s in test_cases:
    print("{} parses into {}".format(repr(s), parse_consecutive_literals(s)))

Result:

'{} {} {}' parses into [{}, {}, {}]
'{}{}{}' parses into [{}, {}, {}]
"{1: 2, 3: 4}{5:6, '7': [8, {9: 10}]}" parses into [{1: 2, 3: 4}, {5: 6, '7': [8, {9: 10}]}]
"[11] 'twelve' 13 14.0" parses into [[11], 'twelve', 13, 14.0]
'{} {"\'hi \'}\'there"} {\'whats "}"{"up\'}' parses into [{}, {"'hi '}'there"}, {'whats "}"{"up'}]
"{1: 'foo\\'}bar'}" parses into [{1: "foo'}bar"}]

I don't enthusiastically endorse this solution for production-quality code, however. It would be much better to serialize your data in a more sensible format in the first place, such as JSON.
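
As a rough illustration of that suggestion, here is a minimal sketch that writes one JSON object per line (a layout often called JSON Lines) and reads the records back one at a time. The helper names dump_records and load_records are made up for this example, and it assumes every record is JSON-serializable (string keys, no tuples or sets).

```python
import io
import json

def dump_records(records, fp):
    """Write each record as one JSON object per line."""
    for record in records:
        # json.dumps escapes newlines inside strings, so each record stays on one line
        fp.write(json.dumps(record) + "\n")

def load_records(fp):
    """Yield one parsed record per non-empty line."""
    for line in fp:
        line = line.strip()
        if line:
            yield json.loads(line)

records = [
    {"key1": "val1", "key2": "val2"},
    {"key1": "some\nval", "key2": "{not a dict}"},
]

buffer = io.StringIO()  # stands in for a real file opened in text mode
dump_records(records, buffer)
buffer.seek(0)
restored = list(load_records(buffer))
print(restored == records)
```

Because each line is parsed on its own, the cost grows linearly with the input, and a large file can be streamed instead of loaded into memory at once.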

Kevin
  • Yes, mine is production code and I have millions of records, so this approach might slow down the process. I was looking for something like json.loads to work. – Rachel Sep 05 '18 at 04:25
  • How do I serialize this dict string data? – Rachel Sep 13 '18 at 08:05
3

This handles arbitrary whitespace between the dictionaries and runs in O(n). It collects characters in a list and joins them once instead of concatenating strings repeatedly, and it needs no imports:

def fetch_until(sep, char_iter):
    chars = []
    escapes = 0
    while True:
        try:
            c = next(char_iter)
        except StopIteration:
            break
        if c == "\\":
            escapes += 1
        chars.append(c)
        if c == sep:
            if escapes % 2 == 0:
                break
        if c != "\\":
            escapes = 0
    return chars

def fix(data):
    brace_level = 0
    result = []
    char_iter = iter(data)

    try:
        while True:
            c = next(char_iter)
            result.append(c)
            if c in ("'", '"'):
                result.extend(fetch_until(c, char_iter))
            elif c == "{":
                brace_level += 1
            elif c == "}":
                brace_level -= 1
                if brace_level == 0:
                    result.append(",")
    except StopIteration:
        pass

    return eval("[{}]".format("".join(result[:-1])))

test_cases = [
    "{1: 'foo\\'}bar'}",
    "{} {\"'hi '}'there\"} {'whats \"}\"{\"up'}",
    "{}{}{}",
    "{1: 2, 3: 4}{5:6, '7': [8, {9: 10}]}",
    "{1: {}} {2:3, 4:{}} {(1,)}",
    "{1: 'foo'} {'bar'}",
]

for test_case in test_cases:
    print("{!r:40s} -> {!r}".format(test_case, fix(test_case)))

This outputs:

"{1: 'foo\\'}bar'}"                      -> [{1: "foo'}bar"}]
'{} {"\'hi \'}\'there"} {\'whats "}"{"up\'}' -> [{}, {"'hi '}'there"}, {'whats "}"{"up'}]
'{}{}{}'                                 -> [{}, {}, {}]
"{1: 2, 3: 4}{5:6, '7': [8, {9: 10}]}"   -> [{1: 2, 3: 4}, {5: 6, '7': [8, {9: 10}]}]
'{1: {}} {2:3, 4:{}} {(1,)}'             -> [{1: {}}, {2: 3, 4: {}}, {(1,)}]
"{1: 'foo'} {'bar'}"                     -> [{1: 'foo'}, {'bar'}]

I also did some timing:

import time
for n in (1000, 10000, 100000):
    long = test_cases[0] * n
    started = time.time()
    fix(long)
    needed = time.time() - started
    print("len_input = {:7d},  time={:7.1f} ms ".format(len(long), needed * 1000))

which prints (on my slow macbook air):

len_input =   37000,  time=   12.2 ms
len_input =  370000,  time=  110.6 ms
len_input = 3700000,  time= 1124.4 ms
rocksportrocker
-1

You can convert it into a valid JSON string, after which parsing it is easy.

import json
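# Insert the missing commas, then wrap everything in [] so the result is one JSON array
mydict_string = '[' + mydict_string.replace(' {', ',{') + ']'
mylist = json.loads(mydict_string)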

Otherwise, although I wouldn't recommend it, you can use eval as well:

mylist = list(map(eval, mydict_string.split(' ')))

This works for non-empty dictionaries too, but only as long as no dictionary contains a space, because the string is split on every space.
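
If every dictionary in the string is valid JSON (double-quoted keys and strings), one possible way to avoid editing the string at all is json.JSONDecoder.raw_decode, which parses a single value starting at a given index and reports where it stopped. The sketch below is only an illustration under that assumption; the function name parse_concatenated_json is made up here, and Python-only literals such as single-quoted strings, tuples, or integer keys will not parse this way.

```python
import json

def parse_concatenated_json(text):
    """Parse JSON values that follow one another, with optional whitespace between them."""
    decoder = json.JSONDecoder()
    results = []
    pos = 0
    end = len(text)
    while pos < end:
        # raw_decode does not skip leading whitespace, so skip it by hand
        while pos < end and text[pos].isspace():
            pos += 1
        if pos >= end:
            break
        # raw_decode returns the parsed value and the index just past it
        obj, pos = decoder.raw_decode(text, pos)
        results.append(obj)
    return results

data = '{"key1": "val1", "key2": "a } b"} {"key1": {"nested": 1}}\n{"key1": "someval"}'
print(parse_concatenated_json(data))
```

Since the decoder itself tracks strings and nesting, braces or spaces inside values do not confuse it, and malformed input raises json.JSONDecodeError instead of producing a wrong split.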

hspandher
-1

Assuming dict_string is your input string, you can try this:

import json
my_dicts = [json.loads(i) for i in dict_string.replace(", ",",").split()]
prithajnath