
I read questions and answers for a quiz from a UTF-8 encoded file, but an answer can contain both 1-byte characters (English) and 2-byte characters (Russian) in the same text:

"best car тайота"

I need to show the answer with every letter replaced by "*", so it looks like "**** *** ******" and helps the player guess the answer. To determine the length I use

len(answer.decode('utf-8'))

But for the next hint I want to reveal some characters, like "b*s* ca* *а*от*". I can access the 1-byte characters via answer[index], but I can't read the 2-byte characters this way, so I get "b*s* ca*" without the 2-byte characters.

Is there a solution for this?

Martijn Pieters
ratojakuf

1 Answer


Decode the string to a Unicode value once, and do your replacements in that.

A unicode string object supports the same operations as a byte string, including len() and indexing, but each index refers to one character rather than one byte. Just be careful when mixing byte strings and Unicode strings, as that can trigger an automatic encode or decode (leading to UnicodeEncodeError or UnicodeDecodeError exceptions). Printing the string should automatically encode the value to match your terminal codec.
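As a rough illustration of decoding once and then working per character, here is a minimal sketch. The helper name mask and its reveal parameter are made up for this example and are not part of any library. It is written so that it runs on Python 3, where the file contents would be a bytes object; on Python 2 the same decode('utf-8') call applies to a str read from the file.

```python
# -*- coding: utf-8 -*-

def mask(answer, reveal=()):
    """Replace every non-space character of a Unicode string with '*',
    except the character positions listed in `reveal` (hypothetical helper)."""
    shown = set(reveal)
    return u"".join(
        ch if ch == u" " or i in shown else u"*"
        for i, ch in enumerate(answer)
    )

# Stand-in for the bytes read from the UTF-8 file (assumption for the demo).
raw = u"best car тайота".encode("utf-8")

# Decode once; from here on, indexes count characters, not bytes.
answer = raw.decode("utf-8")

full_mask = mask(answer)
hint = mask(answer, reveal=[0, 2, 5, 6, 10, 12, 13])

print(len(raw), len(answer))  # 21 15
print(full_mask)              # **** *** ******
print(hint)                   # b*s* ca* *а*от*
```

Because answer is already a Unicode string, answer[10] is the single character "а" rather than half of a 2-byte sequence, which is why the Russian letters show up in the hint.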

You may want to read up on Python and Unicode:

Martijn Pieters