Cyrillic chars in Python 2.7

Question

In my script I declared the cp1251 code page, but Python 2.7.13 prints some Cyrillic strings incorrectly:

Программа 'Game Over' 2.0
('\xd2\xee \xe6\xe5', '\xf1\xe0\xec\xee\xe5', '\xf1\xee\xee\xe1\xf9\xe5\xed\xe8\xe5')
('\xd2\xee\xeb\xfc\xea\xee', '\xf7\xf3\xf2\xfc-\xf7\xf3\xf2\xfc', '\xef\xee\xe1\xee\xeb\xfc\xf8\xe5')
оно...

       GAME OVER




Нажмите Enter для выхода...

I read this and this topic beforehand, but they didn't help. I tried these encoding declarations:

# -*- coding: utf-8 -*-
# -*- coding: cp1251 -*-

Why does it happen and how can I fix it?

At the same time, Python 3.6.0 prints all Cyrillic characters correctly, even without an encoding declaration:

Программа 'Game Over' 2.0
То же самое сообщение
Только чуть-чуть побольше
оно...

       GAME OVER




Нажмите Enter для выхода...

My code:

# coding: cp1251
# game_over.py
# © Andrey Bushman, 2017

print("Программа 'Game Over' " + "2.0")
print("То же", "самое", "сообщение")
print("Только", "чуть-чуть", "побольше")
#print("Вот", end=" ")
print("оно...")

print("""
       GAME OVER
      """)
print("\a")
input("\n\nНажмите Enter для выхода...")

Please post code snippets, not screenshots. – DYZ Jan 15 '17 at 07:54

RemcoGerlich · Answer 1 · 2017-01-15T09:06:18.370

1

print("То же", "самое", "сообщение")

Nothing to do with Cyrillic -- Python 2's print statement doesn't have parentheses.

So here you're printing the tuple ("То же", "самое", "сообщение"), not a string. This does the same thing:

tmp = ("То же", "самое", "сообщение")
print tmp

Either remove the parentheses, or add from __future__ import print_function at the top of your module.
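The tuple behaviour described above can be reproduced in isolation. The following is a minimal sketch, not the asker's script: it assumes the words are cp1251-encoded byte strings (as Python 2 `str` literals in a cp1251 source file would be), and the names `words`, `raw`, `as_tuple` and `joined` are made up for illustration. It runs on both 2.7 and 3.x; on 3.x the escaped items carry a `b` prefix.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

words = (u"То же", u"самое", u"сообщение")

# Byte strings, as Python 2 str literals in a cp1251-encoded file would be.
raw = tuple(w.encode("cp1251") for w in words)

# Converting a tuple to text uses repr() on each item, and repr() of a
# byte string escapes every non-ASCII byte as \xNN.
as_tuple = str(raw)
print(as_tuple)

# Joining the items first yields one string instead of a tuple.
joined = b" ".join(raw).decode("cp1251")
print(joined == u"То же самое сообщение")
```

The escape sequences in the first printed line correspond to the ones in the question's output, which suggests the bytes themselves were correct and only the tuple formatting made them look broken.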

edited Jan 15 '17 at 09:06

answered Jan 15 '17 at 08:58

RemcoGerlich


Your suggestion avoids printing spurious parentheses but for me does not solve the encoding problem for 2.7. See my second answer. – Terry Jan Reedy Jan 15 '17 at 10:03
@TerryJanReedy: No, the prints where he passes a single string work fine: there the parens mean nothing, `("abc")` is just the same as `"abc"`. With a comma they become tuples, and then the strings are printed as their `repr()`. _The lines without commas print fine, so there is no encoding problem_. The lines with commas are printed as tuples, because they are. Nothing to do with spaces. – RemcoGerlich Jan 15 '17 at 10:19

score 1 · Accepted Answer · answered Jan 15 '17 at 10:09

For 2.7, you should make the strings unicode strings by using the u prefix. The following works both in IDLE and the console (when the console codepage is set to 1251 with chcp 1251).

# coding: utf_8
# game_over.py
# Andrey Bushman, 2017
from __future__ import print_function

print(u"Программа 'Game Over' 2.0")
print(u"То же самое сообщение")
print(u"Только чуть-чуть побольше")
print(u"оно...")

print("""
       GAME OVER
      """)
print(u"\n\nНажмите Enter для выхода...", end='')
a = raw_input()

I separated the prompt and input because input(u'xxxx') was not working. raw_input is needed in 2.x to avoid evaluating the input.
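One way to keep the prompt/input split in one place is a small helper. This is only a sketch: `pause`, `read_line` and the `reader` parameter are hypothetical names, not part of the answer's code, and it assumes the console can already display the prompt text.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function
import sys

try:
    read_line = raw_input  # Python 2: returns the typed text without evaluating it
except NameError:
    read_line = input      # Python 3: input() already returns plain text


def pause(prompt, reader=read_line):
    """Print a unicode prompt with print(), then read one line of input."""
    print(prompt, end=u"")
    sys.stdout.flush()  # make sure the prompt appears before waiting
    return reader()
```

Calling `pause(u"\n\nНажмите Enter для выхода...")` at the end of the script would then behave like the last two lines of the code above on either version.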

score 0 · Answer 3 · answered Jan 15 '17 at 08:20


I spent quite some time figuring out how to use Python 2.7 properly with non-latin1 code pages. The easiest solution I found, by far, is to switch to Python 3. Nothing else comes remotely close to it.

answered Jan 15 '17 at 08:20

zmbq



Not directly an answer as to why that happens, although I'll have to agree. – frostblue Jan 15 '17 at 08:58

ooknosi · Answer 4 · 2017-01-15T08:40:57.177

0

In Python 2, print is a statement, so the comma-separated expressions inside the parentheses form a tuple. Printing a tuple shows the repr() of each item, and the repr() of a byte string escapes every non-ASCII byte as \xNN. That's why the Cyrillic text appears as escape sequences when you separate the values with commas.

What you can do is the following:

import codecs

text = ("То же", "самое", "сообщение")
for i in text:
    # Decode each UTF-8 byte string to unicode, then print it
    print(codecs.decode(i, 'utf-8'))

Or:

text = ("То же", "самое", "сообщение")
print(' '.join(text))

Make sure the following line is at the top of your script if you're using Python 2, and that the file is actually saved as UTF-8:

# -*- coding: utf-8 -*-

edited Jan 15 '17 at 08:40

answered Jan 15 '17 at 08:31

ooknosi


In this case my output is `РќР°Р¶РјРёС‚Рµ Enter РґР»СЏ РІС‹С…РѕРґР°...`. – Andrey Bushman Jan 15 '17 at 09:02
@Andrey, looks like UTF-8 is not properly detected there. – Paul Stelian Jan 15 '17 at 09:17
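Output of that shape is what UTF-8 bytes look like when they are interpreted as cp1251. A minimal sketch of the effect, assuming that is what happened here (the variable names are illustrative only):

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

original = u"Нажмите"

# Encode as UTF-8 (two bytes per Cyrillic letter), then decode with the
# wrong codec: every byte becomes its own cp1251 character.
utf8_bytes = original.encode("utf-8")
mojibake = utf8_bytes.decode("cp1251")

# Reversing the two steps recovers the text, which shows the bytes were intact.
recovered = mojibake.encode("cp1251").decode("utf-8")

print(len(original), len(utf8_bytes), len(mojibake))
print(recovered == original)
```

Each letter turns into two characters, the first of which is usually `Р` or `С`, matching the pattern in the comment above.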

score 0 · Answer 5 · answered Jan 15 '17 at 09:15

Short answer: If you want to print chars other than ascii or those in your default codepage on Windows, use 3.6+. Explanation below.

To properly read a file, the encoding declaration must match the actual encoding of the bytes in the file. If you use a limited (non-utf) encoding and want to print strings to Command Prompt, the limited encoding and the console encoding should also match. Or rather, the subset of unicode that you try to print must be included in the subset that the console will accept.

In this case, if you declare the encoding as cp1251 and save it with IDLE, then IDLE appears to save it with that encoding. By definition, the only chars in the file must be in the cp1251 subset. When you print those characters, the console must accept at least the same subset. You can make Command Prompt accept Russian by running chcp 1251 as a command. (chcp == CHange CodePage.) Warning: this command only affects the current Command Prompt window. Anyway, by matching the encoding declaration and the console codepage, I got your code to run on 2.7, 3.5, and 3.6 in the console (but not in IDLE 2.7). But of course, non-ascii, non-cyrillic chars generated by your code will not print.
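The subset rule can be checked before printing by trying to encode the text with the target encoding. A rough sketch, where `can_encode` is a hypothetical helper and the list of encodings is only an example; what `sys.stdout.encoding` reports depends on the environment:

```python
# -*- coding: utf-8 -*-
from __future__ import print_function
import sys


def can_encode(text, encoding):
    """Return True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# The encoding Python believes the output stream uses (may be None when piped).
print("stdout encoding:", sys.stdout.encoding)

sample = u"Нажмите Enter"
for name in ("cp1251", "cp866", "ascii", "utf-8"):
    print(name, can_encode(sample, name))
```

If the check fails for the console's encoding, printing that text there cannot work until the code page is changed, for example with chcp 1251 as described above.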

In 3.x, Python expects code to be utf_8 by default. For 3.6, Python's interface to Windows' consoles was re-written to put the console in utf_8 mode. So write code in an editor that saves it as utf_8 and, as you noticed, printing to the console in Windows works in 3.6. (In 3.x, printing to IDLE's shell has always worked for all the Basic Multilingual Plane (BMP subset) of unicode. Not working for higher codepoints is a current limitation of tk and hence tkinter, which IDLE uses.)

Has nothing to do with his problem, he can print Cyrillic just fine. — RemcoGerlich, Jan 15 '17 at 09:29
@RemcoGerlich One cannot print Cyrillic to a Windows console without having the console set to either a Cyrillic or a utf codepage. With 3.x, the code works because he is printing to IDLE's console, which in effect uses a 'BMP' codepage. With 3.6, the code works in the console because it is switched to utf_8. — Terry Jan Reedy, Jan 15 '17 at 09:48
Yes, but apparently he has that worked out as it's working fine. His problem is with using print as a function in Python 2, not anything to do with encoding. — RemcoGerlich, Jan 15 '17 at 09:53
@TerryJanReedy, Unfortunately I am forced to use Python 2.7. It doesn't depend on me. — Andrey Bushman, Jan 15 '17 at 10:51
@TerryJanReedy, you can see in my output that some Cyrillic text was written without problem, but some other Cyrillic text was written with problems. — Andrey Bushman, Jan 15 '17 at 10:54
@AndreyBushman: replace your text with only ASCII characters, and you still have the same problem. It's about the difference in the syntax of print only. — RemcoGerlich, Jan 15 '17 at 10:55
