2

I have a problem with a .json file that contains Cyrillic symbols. How do I convert CP1251 to UTF-8? (temp_data.decode('utf-8') has no effect, and neither does ensure_ascii=False in .dumps.)

import json

def load_data(filepath):   
    with open(filepath, 'r') as f:
        temp_data = json.load(f)
    return temp_data 


def pretty_print_json(d):
    out_json = json.dumps(d, sort_keys=True, indent=4, separators = (',', ': '))
    print(out_json)

if __name__ == '__main__':
    print("Enter the path to .json file: ") 
    in_path = input()
    print("There are pretty printed json format: ")
    pretty_print_json(load_data(in_path))
Double_Mind
  • What's your issue? Show a sample data file, desired output, and actual output. – Mark Tolonen Apr 09 '17 at 18:28
  • The data file contains Russian words like "ВОДКА" and "БАЛАЛАЙКА", but in the result these words are displayed as "\u0439\u0440" etc. – Double_Mind Apr 09 '17 at 18:36
  • What is the encoding of the data file? Update your question with the details. Add a **small** sample of your data that reproduces the problem. – Mark Tolonen Apr 09 '17 at 18:38
  • @Double_Mind: `'\u0439\u0440' == 'йр'`, so it works fine. Can you post the contents of your file? `print(repr(open(your_filename, 'rb').read()))` – Blender Apr 09 '17 at 18:50
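
The point made in the last comment can be checked directly: the \uXXXX escapes are just another spelling of the same characters, so no data is lost. A minimal sketch, assuming Python 3; the sample word is borrowed from the comments above:

```python
import json

word = "ВОДКА"

# By default json.dumps escapes every non-ASCII character as \uXXXX.
escaped = json.dumps(word)

# Parsing the escaped text gives back the original Cyrillic string.
restored = json.loads(escaped)

print(escaped)           # ASCII-only JSON text
print(restored == word)  # True
```

So the escaped output is valid JSON for the same data; it is only harder for a human to read.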

2 Answers

0

You can pass ensure_ascii=False. If ensure_ascii is true (the default), all non-ASCII characters in the output are escaped with \uXXXX sequences, and the result is a str instance consisting of ASCII characters only. If ensure_ascii is false, the result may be a unicode instance (on Python 2). This usually happens if the input contains Unicode strings or the encoding parameter is used.
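
The difference between the two settings can be seen side by side. This is a small sketch, assuming Python 3 (where json.dumps always returns str); the `data` dictionary is a made-up sample, not the asker's file:

```python
import json

data = {"drink": "ВОДКА"}  # hypothetical sample data

# ensure_ascii=True is the default: non-ASCII characters become \uXXXX.
ascii_only = json.dumps(data)

# ensure_ascii=False keeps the Cyrillic characters as they are.
readable = json.dumps(data, ensure_ascii=False)

print(ascii_only)
print(readable)  # {"drink": "ВОДКА"}
```

Both strings parse back to the same dictionary, so the choice only affects how the text looks.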

Change your code to this:

out_json = json.dumps(d, sort_keys=True, indent=4, separators = (',', ': '), ensure_ascii=False)

Here is the full code:

import json

def load_data(filepath):   
    with open(filepath, 'r') as f:
        temp_data = json.load(f)
    return temp_data 


def pretty_print_json(d):
    out_json = json.dumps(d, sort_keys=True, indent=4, separators = (',', ': '), ensure_ascii=False)
    print(out_json)

if __name__ == '__main__':
    print("Enter the path to .json file: ") 
    in_path = raw_input()  # Python 2; use input() on Python 3
    print("There are pretty printed json format: ")
    pretty_print_json(load_data(in_path))

I tested this code with this JSON file.

You can see the result in asciinema.

RaminNietzsche
0

The code below works. If it doesn't work with your data, provide a sample of your data file and specify its encoding:

#coding:utf8
import json

datafile_encoding = 'cp1251'  # Any encoding that supports Cyrillic works.

# Create a test file with Cyrillic symbols.
with open('test.json','w',encoding=datafile_encoding) as f:
    D = {'key':'АБВГДЕЖЗИЙКЛМНОПРСТ', 'key2':'АБВГДЕЖЗИЙКЛМНОПРСТ'}
    json.dump(D,f,ensure_ascii=False)

# specify the encoding of the data file
def load_data(filepath):   
    with open(filepath, 'r', encoding=datafile_encoding) as f:
        temp_data = json.load(f)
    return temp_data 

# Use ensure_ascii=False
def pretty_print_json(d):
    out_json = json.dumps(d, sort_keys=True, ensure_ascii=False, indent=4, separators = (',', ': '))
    print(out_json)

if __name__ == '__main__':
    in_path = 'test.json'
    pretty_print_json(load_data(in_path))

Output:

{
    "key": "АБВГДЕЖЗИЙКЛМНОПРСТ",
    "key2": "АБВГДЕЖЗИЙКЛМНОПРСТ"
}
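
If the goal is to actually convert a CP1251 file to UTF-8 on disk, the same idea can be applied in both directions: read with the source encoding, write with UTF-8. The following is only a sketch, assuming Python 3 and assuming the file really is CP1251; `convert_json_encoding` and the file names are hypothetical helpers for illustration:

```python
import json
import os
import tempfile

def convert_json_encoding(src_path, dst_path, src_encoding="cp1251"):
    """Read JSON stored in src_encoding and rewrite it as UTF-8."""
    with open(src_path, "r", encoding=src_encoding) as f:
        data = json.load(f)
    with open(dst_path, "w", encoding="utf-8") as f:
        json.dump(data, f, ensure_ascii=False, indent=4)
    return data

# Demo with throwaway files in a temporary directory.
tmp_dir = tempfile.mkdtemp()
src = os.path.join(tmp_dir, "cp1251.json")
dst = os.path.join(tmp_dir, "utf8.json")

# Create a CP1251-encoded input file.
with open(src, "w", encoding="cp1251") as f:
    json.dump({"key": "БАЛАЛАЙКА"}, f, ensure_ascii=False)

convert_json_encoding(src, dst)

# Read the result as raw bytes and decode it explicitly as UTF-8.
with open(dst, "rb") as f:
    raw = f.read()
print(raw.decode("utf-8"))
```

If the source encoding is guessed wrong, the open/json.load step will either raise UnicodeDecodeError or produce garbled text, so it is worth confirming the encoding first.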
Mark Tolonen