How to access a string with mixed 1-byte and 2-byte symbols?

Question

I read questions and answers for a quiz from a UTF-8 encoded file, but an answer can consist of 1-byte symbols (English) and 2-byte symbols (Russian) in the same text:

"best car тайота"

I need to print the answer with every letter replaced by "*", so it looks like "**** *** ******" and helps the player guess what the answer is. To determine the length I use

len(answer.decode('utf-8'))

But for the next hint I want to reveal some symbols, like "b*s* ca* *а*от*". I can access the 1-byte symbols via answer[index], but I can't read the 2-byte symbols this way, so I get "b*s* ca*" with the 2-byte symbols missing.

Is there a solution for this?

score 3 · Accepted Answer · answered Apr 04 '15 at 14:11

Decode the string to a Unicode value once, and do your replacements in that.

A unicode string object supports the same operations as a byte string, but indexing and len() work on characters rather than bytes. Just be careful when mixing byte strings and Unicode strings, as that can trigger an automatic encode or decode (leading to a UnicodeEncodeError or UnicodeDecodeError). Printing the string should automatically encode the value to match your terminal codec.
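As a rough illustration of the approach above, here is a minimal sketch that decodes the bytes once and then builds both hints from the decoded text. The helper name mask_answer, its reveal parameter, and the chosen reveal positions are hypothetical and only serve the example; the sample answer is the one from the question. The u"" prefixes keep the sketch valid on both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
# Needed on Python 2 when this is the first line of the file; harmless on Python 3.


def mask_answer(text, reveal=(), mask=u"*"):
    """Return `text` with every non-space character replaced by `mask`.

    `text` must already be decoded (unicode on Python 2, str on Python 3).
    `reveal` is a collection of character positions to leave visible.
    """
    out = []
    for index, char in enumerate(text):
        if char.isspace() or index in reveal:
            out.append(char)   # keep spaces and revealed characters
        else:
            out.append(mask)   # hide everything else
    return u"".join(out)


# Stand-in for the bytes read from the UTF-8 file.
raw = u"best car тайота".encode("utf-8")

# Decode once; from here on every index refers to a character, not a byte.
answer = raw.decode("utf-8")

full_mask = mask_answer(answer)                                # **** *** ******
hint = mask_answer(answer, reveal={0, 2, 5, 6, 10, 12, 13})    # b*s* ca* *а*от*
```

Here len(raw) is 21 (bytes) while len(answer) is 15 (characters), which is why indexing the undecoded bytes cuts the Cyrillic letters in half. Passing full_mask or hint to print lets Python encode the text for the terminal.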

You may want to read up on Python and Unicode:

Pragmatic Unicode by Ned Batchelder
The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) by Joel Spolsky
The Python Unicode HOWTO
