I am confused about why I can't parse this JSON string. Similar code works fine on other JSON strings, but not on this one. I am trying to parse a JSON string and extract the script from it.

Below is my code.

for step in steps:
    step_path = '/example/v1' +'/'+step

    data, stat = zk.get(step_path)
    jsonStr = data.decode("utf-8")
    print(jsonStr)
    j = json.loads(json.dumps(jsonStr))
    print(j)
    shell_script = j['script']
    print(shell_script)

So the first print(jsonStr) will print out something like this -

{"script":"#!/bin/bash\necho Hello world1\n"}

And the second print(j) will print out something like this -

{"script":"#!/bin/bash\necho Hello world1\n"}

The third print never runs; instead I get this error -

Traceback (most recent call last):
  File "test5.py", line 33, in <module>
    shell_script = j['script']
TypeError: string indices must be integers

So what am I doing wrong here?

I have used the same code above to parse other JSON strings and it works fine.

AKIWEB
  • what is the expected output for the third time? – tmj Nov 20 '13 at 00:01
  • It should extract the script portion from the JSON string.. so it should print out `#!/bin/bash\necho Hello world1\n`. Right? – AKIWEB Nov 20 '13 at 00:02

3 Answers


The problem is that jsonStr is a string that encodes some object in JSON, not the actual object.

You obviously knew it was a string, because you called it jsonStr. And it's proven by the fact that this line works:

jsonStr = data.decode("utf-8")

So, jsonStr is a string. Calling json.dumps on a string is perfectly legal. It doesn't matter whether that string was the JSON encoding of some object, or your last name; you can encode that string in JSON. And then you can decode that string, getting back the original string.

So, this:

j = json.loads(json.dumps(jsonStr))

… is going to leave j holding exactly the same string as jsonStr, which you still haven't decoded to the original object.

To do that, just don't do the extra encode:

j = json.loads(jsonStr)
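
As a minimal sketch of the corrected flow, assuming the bytes returned by zk.get look like the output printed in the question: the `data` literal below is only a stand-in for the real ZooKeeper call, and `extract_script` is a hypothetical helper name, not part of any library.

```python
import json

def extract_script(data: bytes) -> str:
    """Decode UTF-8 bytes holding a JSON object and return its "script" value."""
    json_str = data.decode("utf-8")   # bytes -> str (still JSON text)
    obj = json.loads(json_str)        # str -> dict (the decoded object)
    return obj["script"]

# Stand-in for `data, stat = zk.get(step_path)`; payload copied from the question.
data = b'{"script":"#!/bin/bash\\necho Hello world1\\n"}'
shell_script = extract_script(data)
print(shell_script)
```

Since Python 3.6, json.loads also accepts bytes directly, so the explicit decode step is optional there; it is kept here to mirror the original code.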

If that isn't clear, try playing with it in an interactive terminal:

>>> import json
>>> obj = ['abc', {'a': 1, 'b': 2}]
>>> type(obj)
<class 'list'>
>>> obj[1]['b']
2
>>> j = json.dumps(obj)
>>> type(j)
<class 'str'>
>>> j[1]['b']
TypeError: string indices must be integers
>>> jj = json.dumps(j)
>>> type(jj)
<class 'str'>
>>> j
'["abc", {"a": 1, "b": 2}]'
>>> jj
'"[\\"abc\\", {\\"a\\": 1, \\"b\\": 2}]"'
>>> json.loads(j)
['abc', {'a': 1, 'b': 2}]
>>> json.loads(j) == obj
True
>>> json.loads(jj)
'["abc", {"a": 1, "b": 2}]'
>>> json.loads(jj) == j
True
>>> json.loads(jj) == obj
False
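
One reason this bug is easy to miss is that print() shows a string and a dictionary in nearly the same way. The sketch below uses a made-up payload (not the asker's real data) and the hypothetical names `wrong` and `right` to show how type() and repr() expose the difference.

```python
import json

json_str = '{"script": "echo hi"}'        # made-up JSON text for illustration

wrong = json.loads(json.dumps(json_str))  # encode then decode: the same str again
right = json.loads(json_str)              # decode only: a dict

# print() output looks almost the same for both, which hides the bug.
print(wrong)
print(right)

# type() and repr() make the difference visible.
print(type(wrong).__name__, repr(wrong))
print(type(right).__name__, repr(right))
```

In the question, the second print still shows double quotes around the key and value, which is a hint that j is a string rather than a dictionary.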
mallet
abarnert
  • Up Voted: Because you did more than just say replace a with b... You explained a little as to why and tried to show examples on how to debug.. – Angry 84 Nov 09 '15 at 01:18

Try replacing j = json.loads(json.dumps(jsonStr)) with j = json.loads(jsonStr).

1st1
  • Probably `j` was the same as `jsonStr`, a string. Indexing that with a string key fails with the exception you describe. Leaving out the `dumps` call makes the `loads` call turn the original string into a dictionary, which then you were able to index by key. If you turned your `print(j)` call into `print(repr(j))` you'll be able to see the difference. – Blckknght Nov 20 '13 at 00:07
  • I don't understand. when I try to do `json.loads(string_value)` it does and IMHO it should raise a `ValueError Exception: No JSON object could be decoded` as string_value is just a string not a JSON object. – tmj Nov 20 '13 at 00:11
  • @tMJ: If `jsonStr` is a string representing an object in JSON, then `json.dumps(jsonStr)` is a string representing (a string representing an object in JSON) in JSON. That's perfectly valid, but when you `loads` it, you get back a string representing an object in JSON; you'd have to `loads` it twice to get back the original object. – abarnert Nov 20 '13 at 00:57
  • @abarnert So is the value returned from the call `zk.get`, is actually a JSON object? – tmj Nov 20 '13 at 00:59
  • @tMJ: There's no such thing as "a JSON object". When people say that, half the time they mean "a Python (or JavaScript or whatever) object that can be encoded in JSON", and half the time they mean "a string which is the JSON representation of an object". I have no idea which one you mean. I suspect you don't either. Just don't use that term. Obviously the value returned by `zk.get` is a `bytes` object, because you can call `.decode('utf-8')` on it. Presumably it's the UTF-8 encoding of a string that's the JSON representation of an object. – abarnert Nov 20 '13 at 01:04
  • @tMJ: Meanwhile, if you don't understand why `json.loads(json.dumps(jsonStr))` is wrong, see [this pastebin](http://pastebin.com/6PwsyHwQ), and work through some similar examples yourself in the interactive shell and/or log out some reprs and types from your actual code to see what's what. – abarnert Nov 20 '13 at 01:05
  • @abarnert Yeah, I meant the second. Ok! Thanks. `Presumably it's the UTF-8 encoding of a string that's the JSON representation of an object.` – tmj Nov 20 '13 at 01:06

OK... So for people who are still lost because they are used to JS, this is what I understood after testing multiple use cases:

  • json.dumps does not make your string ready to be loaded with json.loads. It only encodes it as a JSON string literal (wrapping it in double quotes and escaping the quotes and backslashes inside)!

  • json.loads will transform a correctly formatted JSON string into the matching Python object (a dictionary when the JSON text is an object). It only works if the text follows the JSON specs (double quotes only, lowercase true and false, etc.).
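
To illustrate the second point, here is a small sketch showing which Python type json.loads returns for each kind of top-level JSON value. The sample strings and the `decoded_types` name are made up for the example.

```python
import json

# One sample per kind of top-level JSON value.
samples = ['{"a": 1}', '[1, 2]', '"text"', '3.5', 'true', 'null']

# Map each JSON text to the name of the Python type it decodes to.
decoded_types = {s: type(json.loads(s)).__name__ for s in samples}

for text, name in decoded_types.items():
    print(f"{text!r:12} -> {name}")
```

A dictionary only comes back when the JSON text itself is an object, which is why decoding a JSON string literal gives you a plain str.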

Dumping JSON - An encoding story

Let's take an example!

>>> obj = {"foobar": True}

This is NOT JSON! This is a Python dictionary that uses Python types (like booleans).

True is not valid under the JSON specs, so in order to send this to an API you would have to serialize it to REAL JSON. That's where json.dumps comes in!

>>> import json
>>> json.dumps({"foobar": True})
'{"foobar": true}'

See? True became true, which is real JSON. You now have a string that you can send to the real world. Good job.

Loading JSON - A decoding story

So now let's talk about json.loads.

You have a string that looks like JSON, but it's only a string, and what you want is a Python dictionary. Let's walk through the following examples:

>>> string = '{"foobar": true}'
>>> d = json.loads(string)
>>> d
{'foobar': True}

Here we have a string that is valid JSON. You can use json.loads to transform this string into a dictionary and then do d["foobar"], which returns True.

So, why so many errors?

Well, if your JSON looks like JSON but is not really valid JSON (spec-wise), for instance:

>>> string = "{'foobar': true}"
>>> json.loads(string)
json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes

BAM! This does not work because the JSON specs only allow double quotes, not single ones... If you swap the quotes to '{"foobar": true}' then it will work.
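
If you would rather report the problem than crash, one option is to catch the exception. This is only a sketch: `try_parse` is a hypothetical helper name, and returning None on failure is an arbitrary choice for the example (it cannot be told apart from a valid JSON null).

```python
import json

def try_parse(text):
    """Return the decoded object, or None if `text` is not valid JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as err:
        # err.msg, err.lineno and err.colno locate the problem in the input.
        print(f"invalid JSON: {err.msg} (line {err.lineno}, column {err.colno})")
        return None

bad = try_parse("{'foobar': true}")    # single quotes: rejected
good = try_parse('{"foobar": true}')   # double quotes: decoded to a dict
print(bad, good)
```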

What you have probably tried is:

>>> string = json.loads(json.dumps("{'foobar': true}"))

This JSON is invalid (check the quotes) and, moreover, you'll get a string as a result. Disappointed? I know...

  • json.dumps will NOT fix your JSON string; it only encodes it as a JSON string literal. json.loads then undoes that encoding and hands the same text back, single quotes and all.

You have to understand that json.dumps encodes and json.loads decodes!

So what you did here is encode a string and then decode the string. But it's still a string! You haven't done anything to change that fact! If you want to get from a string to a dictionary, you need an extra step: a second json.loads!

Let's try that with valid JSON (no nasty single quotes):

>>> obj = json.loads(json.loads(json.dumps('{"foobar": true}')))
>>> obj["foobar"]
True

The JSON string went through json.dumps and got encoded. Then it went through json.loads, where it got decoded (useless... YAY). Finally, it went through json.loads AGAIN and got transformed from a string into a dictionary. As you can see, using json.dumps only adds a useless step in that situation.
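
The same chain can be spelled out one step at a time to see what each call returns. This is just a sketch; the intermediate names `encoded`, `once` and `twice` are made up for the example.

```python
import json

text = '{"foobar": true}'      # valid JSON text (a str)

encoded = json.dumps(text)     # str -> JSON string literal (still a str)
once = json.loads(encoded)     # undoes the dumps: the original str again
twice = json.loads(once)       # finally decodes the JSON text into a dict

steps = [("text", text), ("encoded", encoded), ("once", once), ("twice", twice)]
for name, value in steps:
    print(f"{name:8} {type(value).__name__:5} {value!r}")
```

Only the last step produces a dictionary; the first two cancel each other out.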

One last thing. If we do the same thing again but with bad JSON:

>>> string = json.loads(json.loads(json.dumps("{'foobar': true}")))
json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes

The quotes are wrong here (aren't you getting used to this by now?). What happened is that json.dumps wrapped your text in a JSON string literal without fixing anything, the first json.loads unwrapped it again (lol), and finally the second json.loads received the same bad JSON, since the first two steps cancelled each other out.

TL;DR

In conclusion: fix your JSON yourself! Don't give json.loads badly formatted JSON, and don't try to mix json.loads with json.dumps to fix what only you can fix.
Hope this helped someone ;-)

Disclaimer: I'm no Python expert.
Feel free to challenge this answer in the comment section.

Doctor