I have a text file that is read in a script for another application. This is how it looks in Notepad:

Одинцовская РЭС: М.О., Одинцовский район
Климовская РЭС: М.О., г.о. Подольск
Кульшовская РЭС: М.О., г.о. Подольск

This file is needed for two things:

1. Creating a dict of key/value pairs, split on ':'. I use this dict in another part of the script.
2. Letting the user select the desired value.

When the user launches the script, they are shown these values and select one. I then have to look the selected value up in the dictionary. The problem is that the selection comes back as unicode (because of how scripts are built in ArcGIS), while the dictionary's keys are str. So I need a dictionary key that looks like '\xce\xe4\xe8\xed\xf6\xee\xe2\xf1\xea\xe0\xff \xd0\xdd\xd1' to be converted to unicode. But when I call .encode('utf-8') on it, it throws an error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xce in position 0: ordinal not in range(128)

Pavel Pereverzev

1 Answer

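This should work, assuming the file was saved in Windows-1251 (cp1251). Decode the bytes with that codec rather than encoding them:

>>> c = b'\xce\xe4\xe8\xed\xf6\xee\xe2\xf1\xea\xe0\xff \xd0\xdd\xd1'
>>> c
b'\xce\xe4\xe8\xed\xf6\xee\xe2\xf1\xea\xe0\xff \xd0\xdd\xd1'
>>> c.decode('cp1251')
'Одинцовская РЭС'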

The b'' prefix denotes a sequence of 8-bit bytes.
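To connect this to the error in the question: in Python 2, calling .encode('utf-8') on a str first decodes it implicitly with the ASCII codec, which is where the UnicodeDecodeError comes from. Below is a minimal sketch of the opposite direction, decoding the byte-string keys to text so that a unicode selection can be used for the lookup. It assumes the file is encoded in cp1251; the names `regions_raw`, `regions`, and `selection` are hypothetical and do not come from the original script.

```python
# -*- coding: utf-8 -*-
# Byte-string key exactly as shown in the question.
raw_key = b'\xce\xe4\xe8\xed\xf6\xee\xe2\xf1\xea\xe0\xff \xd0\xdd\xd1'

# Decoding as ASCII fails with the same message as the question's traceback.
try:
    raw_key.decode('ascii')
except UnicodeDecodeError as exc:
    ascii_error = str(exc)

# Assumption: the text file was saved in Windows-1251 (cp1251).
text_key = raw_key.decode('cp1251')

# Hypothetical dictionary with byte-string keys and values,
# standing in for the dict built from the file.
regions_raw = {raw_key: u'М.О., Одинцовский район'.encode('cp1251')}

# Re-key the dictionary with text so unicode selections match its keys.
regions = {
    key.decode('cp1251'): value.decode('cp1251')
    for key, value in regions_raw.items()
}

# Hypothetical value handed back by the selection dialog.
selection = u'Одинцовская РЭС'
district = regions[selection]
```

Converting the keys once, right after the dictionary is built, keeps the rest of the script working with text only.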

Take a look at the Stack Overflow question "read russian characters".
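An alternative is to decode while reading the file, so the dictionary never holds byte strings in the first place. The sketch below is only an illustration under stated assumptions: the file is encoded in cp1251, each line has the form 'key: value', and `load_regions` is a hypothetical helper name. The temporary file merely stands in for the real text file.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile


def load_regions(path, encoding='cp1251'):
    """Read 'key: value' lines into a dict of text keys and values."""
    regions = {}
    # io.open decodes each line on the fly (works on Python 2.7 and 3).
    with io.open(path, 'r', encoding=encoding) as handle:
        for line in handle:
            key, sep, value = line.partition(u':')
            if sep:  # skip blank lines and lines without a colon
                regions[key.strip()] = value.strip()
    return regions


# Demo: write a cp1251-encoded temporary file, then load it.
sample = (
    u'Одинцовская РЭС: М.О., Одинцовский район\n'
    u'Климовская РЭС: М.О., г.о. Подольск\n'
)
fd, demo_path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'wb') as out:
    out.write(sample.encode('cp1251'))

regions = load_regions(demo_path)
os.remove(demo_path)
```

With the keys already stored as text, the unicode value returned by the selection can be used directly, as in `regions[selection]`.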

Richard Rublev