
I have a JSON file which happens to contain a multitude of Chinese and Japanese (and other language) characters. I'm loading it into my Python 2.7 script using io.open as follows:

with io.open('multiIdName.json', encoding="utf-8") as json_data:
    cards = json.load(json_data)

I add a new property to the json, all good. Then I attempt to write it back out to another file:

with io.open("testJson.json",'w',encoding="utf-8") as outfile:
        json.dump(cards, outfile, ensure_ascii=False)

That's when I get the error TypeError: must be unicode, not str

I tried writing the outfile as binary (with io.open("testJson.json",'wb') as outfile:), but I end up with stuff like this:

{"multiverseid": 262906, "name": "\u00e6\u00b8\u00b8\u00e9\u009a\u00bc\u00e7\u008b\u00ae\u00e9\u00b9\u00ab", "language": "Chinese Simplified"}

I thought opening and writing it in the same encoding would be enough, as well as the ensure_ascii flag, but clearly not. I just want to preserve the characters that existed in the file before I run my script, without them turning into \u's.

– IronWaffleMan

4 Answers


Can you try the following?

with io.open("testJson.json", 'w', encoding="utf-8") as outfile:
    outfile.write(unicode(json.dumps(cards, ensure_ascii=False)))
– Yaron
  • That seems to have done the trick, thanks. I presume outfile.write takes the output from json.dumps and then writes it to the file? – IronWaffleMan Mar 15 '16 at 06:33
  • Great :) Yes, outfile.write(content) writes the content to outfile, which refers to the "testJson.json" file. See more at https://docs.python.org/2/tutorial/inputoutput.html – Yaron Mar 15 '16 at 06:36
  • Danger! You've got an implied str->Unicode conversion, without an encoding defined. In Python 2.x, the default encoding is ASCII, so you will get a `UnicodeDecodeError` exception if your JSON contains non-ASCII chars – Alastair McCormack Mar 15 '16 at 10:28
  • You can have 8-bit strings given to json, and the output would still break. – Antti Haapala -- Слава Україні Mar 15 '16 at 10:29

The reason for this error is the completely stupid behaviour of json.dumps in Python 2:

>>> json.dumps({'a': 'a'}, ensure_ascii=False)
'{"a": "a"}'
>>> json.dumps({'a': u'a'}, ensure_ascii=False)
u'{"a": "a"}'
>>> json.dumps({'a': 'ä'}, ensure_ascii=False)
'{"a": "\xc3\xa4"}'
>>> json.dumps({u'a': 'ä'}, ensure_ascii=False)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/json/__init__.py", line 250, in dumps
    sort_keys=sort_keys, **kw).encode(obj)
  File "/usr/lib/python2.7/json/encoder.py", line 210, in encode
    return ''.join(chunks)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1: ordinal not in range(128)

This, coupled with the fact that io.open with an encoding set only accepts unicode objects (which by itself is right), leads to problems.

The return type depends entirely on the types of the keys and values in the dictionary when ensure_ascii=False, whereas str is always returned if ensure_ascii=True. Since 8-bit strings can accidentally end up in dictionaries, you cannot blindly convert the return value to unicode; you need to specify the encoding, presumably UTF-8:

>>> x = json.dumps(obj, ensure_ascii=False)
>>> if isinstance(x, str):
...     x = unicode(x, 'UTF-8')

In this case I believe you can use json.dump to write to a file opened in binary mode; however, if you need to do something more complicated with the resulting object, you probably need the above code.


One solution is to end all this encoding/decoding madness by switching to Python 3.
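
For comparison, a minimal sketch of the same round trip in Python 3, where open() handles the encoding and json.dump always produces str, so there is no unicode/str juggling:

import json

# Python 3: open() encodes/decodes for you; json.dump writes str
with open('multiIdName.json', encoding="utf-8") as json_data:
    cards = json.load(json_data)

with open("testJson.json", 'w', encoding="utf-8") as outfile:
    json.dump(cards, outfile, ensure_ascii=False)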


The JSON module handles encoding and decoding for you, so you can simply open the input and output files in binary mode. The JSON module assumes UTF-8, but this can be changed via the encoding argument of the load() and dump() methods.

with open('multiIdName.json', 'rb') as json_data:
    cards = json.load(json_data)

then:

with open("testJson.json", 'wb') as outfile:
    json.dump(cards, outfile, ensure_ascii=False)

Thanks to @Antti Haapala for pointing out that the Python 2.x JSON module returns either unicode or str depending on the contents of the object.

You will have to add a sanity check to ensure the result is unicode before writing through io:

with io.open("testJson.json", 'w', encoding="utf-8") as outfile:
    my_json_str = json.dumps(my_obj, ensure_ascii=False)
    if isinstance(my_json_str, str):
        my_json_str = my_json_str.decode("utf-8")

    outfile.write(my_json_str)
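
If you serialise in more than one place, you can wrap that check in a small helper (a sketch; the name dump_unicode is mine, not part of any answer or the stdlib):

import io
import json

def dump_unicode(obj, path):
    # Serialise without ASCII escapes, then make sure we hand io.open's
    # file object a unicode string rather than an 8-bit str
    text = json.dumps(obj, ensure_ascii=False)
    if isinstance(text, str):
        text = text.decode("utf-8")  # assumes any byte strings are UTF-8
    with io.open(path, 'w', encoding="utf-8") as outfile:
        outfile.write(text)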
– Alastair McCormack

Can you try the following?

# -*- coding:utf-8 -*-
import codecs
import json

# An explicit encoding is needed so the file object accepts the
# unicode output that ensure_ascii=False produces
with codecs.open("test.json", "w", encoding="utf-8") as file:
    json.dump(my_list, file, indent=4, ensure_ascii=False)
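
To check that the characters were preserved, you can read the file back and compare (a quick sanity check, reusing my_list from the snippet above):

with codecs.open("test.json", encoding="utf-8") as f:
    # round trip preserves the data (assuming my_list holds unicode strings)
    assert json.load(f) == my_list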