Python: Cyrillic handling

Question

I got this data back from an API: b'\\u041a\\u0435\\u0439\\u0442\\u043b\\u0438\\u043d\\u043f\\u0440\\u043e. I know for sure that the data is in Russian. I am guessing these values are Unicode escape sequences for the Cyrillic letters?

The data returned was a byte array.

How can I convert that into a readable Cyrillic string? Basically, I need a way to convert this kind of data into human-readable text.

EDIT: Yes this is JSON data. Forgot to mention, sorry.

Most likely you have JSON data. — Martijn Pieters, May 27 '14 at 18:08
Oh yes, forgot to mention it is JSON data. — user1757703, May 27 '14 at 18:09

score 5 · Accepted Answer · answered May 27 '14 at 18:10

Chances are you have JSON data; JSON uses \uhhhh escape sequences to represent Unicode code points. Use the json.loads() function on decoded (Unicode) data to produce a Python string:

import json

string = json.loads(data.decode('utf8'))

UTF-8 is the default JSON encoding; check your response headers (if you are using an HTTP-based API) to see whether a different encoding was used.
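As a minimal sketch of the point about encodings: the helper below decodes the raw bytes with whatever charset the response declares and then parses the result. The name parse_json_bytes and its charset parameter are hypothetical, introduced only for illustration; how you obtain the charset depends on the HTTP library you use.

```python
import json

def parse_json_bytes(raw, charset="utf-8"):
    """Decode raw response bytes with the given charset, then parse them as JSON.

    `charset` is an assumption: take it from the response headers if the API
    declares one, otherwise fall back to UTF-8.
    """
    text = raw.decode(charset)
    return json.loads(text)

# The payload from the question, wrapped in double quotes so it is a valid JSON string.
raw = b'"\\u041a\\u0435\\u0439\\u0442\\u043b\\u0438\\u043d\\u043f\\u0440\\u043e"'
name = parse_json_bytes(raw)
print(name)
```

Because the \uhhhh escapes are plain ASCII, the choice of charset only matters for any non-escaped characters in the payload.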

Demo:

>>> import json
>>> json.loads(b'"\\u041a\\u0435\\u0439\\u0442\\u043b\\u0438\\u043d\\u043f\\u0440\\u043e"'.decode('utf8'))
'Кейтлинпро'
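To see where such escapes come from, here is a small round-trip sketch. It assumes the API serializes its JSON the way Python's json.dumps() does by default (ensure_ascii=True), which is a guess about the server, not something the question confirms.

```python
import json

original = "Кейтлинпро"

# With the default ensure_ascii=True, non-ASCII characters are written as \uXXXX escapes.
escaped = json.dumps(original)
print(escaped)

# ensure_ascii=False keeps the Cyrillic characters as-is in the JSON text.
readable = json.dumps(original, ensure_ascii=False)
print(readable)

# Both forms decode back to the same Python string.
assert json.loads(escaped) == json.loads(readable) == original
```

Either form is valid JSON, so there is nothing special to handle on the receiving side beyond calling json.loads().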

Ahh wonderful. I understand. I was getting a little freaked out, thinking there was some unique way to handle non-ASCII chars. — user1757703, May 27 '14 at 18:12
