-1

I've got a long string that contains a list, like this example:

'[{"ex1": 0, "ex2":1}, {"ex3": 2, "ex4":3}]'

I can use json5.loads and then get the first element by using [0] on the list, but json5.loads takes a long time for longer strings. Is there a way to get just the first element without loading the entire list? (In this example it would be {"ex1": 0, "ex2":1}.) Splitting by commas doesn't work for me, since the dictionaries in the list contain commas too. Thanks.

cold10
  • Use the `json` module from the standard library instead? – Iain Shelvington Apr 19 '22 at 02:30
  • Is the `json` module more efficient than `json5`? – cold10 Apr 19 '22 at 02:31
  • From the PyPI page for the package: "This is an early release. It has been reasonably well-tested, but it is SLOW. It can be 1000-6000x slower than the C-optimized JSON module, and is 200x slower (or more) than the pure Python JSON module" https://pypi.org/project/json5/ – Iain Shelvington Apr 19 '22 at 02:31
  • I use the `json5` module because it allows single-quoted strings, and will allow more flexibility. Any other methods? – cold10 Apr 19 '22 at 02:36

2 Answers

0

If it'll definitely be that format, with no dictionaries nested inside the first one, you can just search for the first opening and closing braces.

mystr = '[{"ex1": 0, "ex2":1}, {"ex3": 2, "ex4":3}]'
first = mystr.index("{")
last = mystr.index("}")
extracted = mystr[first:last+1]
print(extracted)

This prints '{"ex1": 0, "ex2":1}'.

For a more complicated string:

mystr = '[{"ex1": {"ex1.33": -1, "ex1.66": -2}, "ex2":1}, {"ex3": 2, "ex4":3}]'
n_open = 0
n_close = 0
first = mystr.index("{")
# Count braces until they balance; ii then points at the matching "}".
for ii, ch in enumerate(mystr):
    if ch == "{":
        n_open += 1
    elif ch == "}":
        n_close += 1
    if n_open > 0 and n_open == n_close:
        break
extracted = mystr[first:ii+1]
print(extracted)
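
One caveat with counting braces: a `{` or `}` inside a quoted string value throws the count off. Here is a minimal sketch of a variant that skips over quoted strings, assuming strings are delimited by single or double quotes and use backslash escapes. The function name `first_braced` is just for illustration.

```python
def first_braced(text):
    """Return the first balanced {...} substring, ignoring braces inside quotes."""
    depth = 0
    start = None
    quote = None      # the quote character we are currently inside, if any
    escaped = False   # True right after a backslash inside a string
    for i, ch in enumerate(text):
        if quote is not None:
            # Inside a string: only look for escapes and the closing quote.
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == quote:
                quote = None
            continue
        if ch in "\"'":
            quote = ch
        elif ch == "{":
            if depth == 0:
                start = i
            depth += 1
        elif ch == "}" and depth > 0:
            depth -= 1
            if depth == 0:
                return text[start:i + 1]
    return None  # no balanced braces found


mystr = '[{"ex1": "a } in a string", "ex2": {"n": 1}}, {"ex3": 2}]'
print(first_braced(mystr))
```

It is still a single pass over the string, and it stops as soon as the first dictionary closes, so the rest of the list is never scanned.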
lumalot
  • That was just an example, my actual string is much more complicated and has many nested brackets. Any other ways? Thanks anyway! – cold10 Apr 19 '22 at 02:35
  • Without more specifics, it's hard to offer something more concrete. You can use a while loop to step through the string, but that'll probably be pretty slow. Count the number of open brackets and close brackets, and exit when the number of close brackets equals the number of open brackets (and the number is >0) – lumalot Apr 19 '22 at 02:37
  • I'll try that, thanks for the idea. – cold10 Apr 19 '22 at 02:39
  • I updated the answer to include an example for a more generic finder – lumalot Apr 19 '22 at 02:55
0

Does your string work with ast.literal_eval()? If it does, you could do the following (note that this still parses the whole string):

import ast

obj = ast.literal_eval(s)
# obj[0] gives the first dict

If not, you could loop through the string character by character and yield a substring each time the number of open brackets equals the number of close brackets.

def get_top_level_dict_str(s):
    open_br = 0
    close_br = 0
    open_index = 0
    for i, c in enumerate(s):
        if c == '{':
            if open_br == 0:
                open_index = i
            open_br += 1
        elif c == '}':
            close_br += 1
            if open_br > 0 and open_br == close_br:
                yield s[open_index:i+1]
                open_br = close_br = 0

If you want to parse the resulting substrings into objects, you could use json5 as you already do, which is probably faster on the smaller string, or use ast.literal_eval().

x = get_top_level_dict_str(s)
# next(x) gives the substring
# then use json5 or ast.literal_eval()
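
If the data happens to be strict JSON (double-quoted strings, no trailing commas), the standard library can also decode just the first element: `json.JSONDecoder.raw_decode` parses one value starting at a given index and stops there. A minimal sketch under that assumption follows; `first_element` is a hypothetical helper name, and this will not accept the single-quoted strings that json5 allows.

```python
import json


def first_element(s):
    """Decode only the first element of a JSON array string."""
    # Find the opening bracket of the list, then skip whitespace after it,
    # because raw_decode expects the value to start exactly at the index.
    i = s.index("[") + 1
    while i < len(s) and s[i].isspace():
        i += 1
    if i >= len(s) or s[i] == "]":
        raise ValueError("list is empty")
    # raw_decode returns (object, index where parsing stopped).
    obj, _end = json.JSONDecoder().raw_decode(s, i)
    return obj


s = '[{"ex1": 0, "ex2":1}, {"ex3": 2, "ex4":3}]'
print(first_element(s))  # {'ex1': 0, 'ex2': 1}
```

Since decoding stops at the end of the first value, the remainder of the string is never parsed.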
Pranav Hosangadi