Print Russian text from a file in Windows using Python

Question

I have a file with one word per line; Notepad++ shows its encoding as ANSI.

The file looks like this (Russian text):

вымышлять

тем|тема|то|тот

не

мало|менее|меней

The output looks something like this:

ЇхфюЁ|ЇхфюЁр
ьшїрщыютшў
фюёЄюхтёъшщ
чряшёър
шч
яюфяюы№х
яєсышўэ√щ
¤ыхъЄЁюээ√щ
сшсышюЄхър

My code:

import sys
print sys.stdout.encoding  #prints cp866

ins = open( "out.txt", "r" )
words = []
s=0
for line in ins:
    if (s<10):
        print line 
        s=s+1
    words.append( line )
ins.close()

But it prints garbled words.

Could you please add an example of what the file looks like and what the output is? Also, it looks like you're not increasing the s variable. — jazzpi, Nov 13 '13 at 06:52
Find out the encoding of the file (which might be different than the encoding of `sys.stdout`) and then use `codecs.open` to open it with the correct encoding instead of `open`. — Bakuriu, Nov 13 '13 at 07:10
Hi, this may help you: http://stackoverflow.com/questions/2668319/how-to-workaround-python-windowserror-messages-are-not-properly-encoded-proble — Anup, Nov 13 '13 at 07:16

Mark Tolonen · Accepted Answer · 2013-11-13T07:30:21.140

I'm assuming Windows.

ANSI on Russian Windows is Windows-1251, but the cmd.exe console window seems to use cp866. Using the codecs module, the file can be read in and translated to Unicode using one encoding, and then printing uses the console's encoding:

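import codecs

# Decode the file from Windows-1251 ("ANSI" on Russian Windows) into Unicode
with codecs.open('out.txt',encoding='cp1251') as ins:
    words = []
    s=0
    for line in ins:
        if (s<10):
            # Python 2 encodes the Unicode line to the console's encoding (cp866);
            # the trailing comma avoids printing a second newline
            print line, 
            s=s+1
        words.append(line)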
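The code above is Python 2. On Python 3 the built-in open() accepts an encoding argument directly, so codecs.open is not needed. A minimal sketch, assuming out.txt is saved in Windows-1251 as described; the function name read_words and its preview parameter are hypothetical, chosen only for illustration:

```python
def read_words(path, encoding='cp1251', preview=10):
    """Hypothetical helper: read a file with one word per line, print the
    first `preview` words, and return all of them with newlines stripped."""
    words = []
    # Text-mode open() decodes the file's bytes from `encoding` into str.
    with open(path, encoding=encoding) as ins:
        for i, line in enumerate(ins):
            word = line.rstrip('\r\n')
            if i < preview:
                print(word)  # print() handles encoding for the output stream
            words.append(word)
    return words
```

Calling read_words('out.txt') should print the first ten words and return the full list. Since Python 3.6, printing directly to the Windows console goes through its Unicode API (PEP 528), so the chcp setting should matter much less than it does on Python 2.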

Input file (saved in Windows-1251 via Notepad++):

федор|федора
михайлович
достоевский
записка
из
подполье
публичный
электронный
библиотека

Here is the output in my console window, which I switched to cp866 with the chcp 866 command since that is not my default:

федор|федора
михайлович
достоевский
записка
из
подполье
публичный
электронный
библиотека
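For comparison, the garbled output in the question is what you get when the Windows-1251 bytes in the file are displayed as if they were cp866. A small Python 3 sketch that reproduces the garbling and reverses it; the helper names as_seen_on_cp866_console and undo_cp866_garbling are hypothetical:

```python
def as_seen_on_cp866_console(text):
    """Hypothetical helper: encode text as Windows-1251 (how the file was
    saved), then decode the same bytes as cp866 (how the console reads them)."""
    return text.encode('cp1251').decode('cp866')


def undo_cp866_garbling(garbled):
    """Hypothetical helper: recover the original bytes from the garbled text,
    then decode them with the correct code page."""
    return garbled.encode('cp866').decode('cp1251')


garbled = as_seen_on_cp866_console('федор|федора')
# garbled == 'ЇхфюЁ|ЇхфюЁр', the first line of the question's output
```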

Note, though, that the two encodings do not support the same characters. The following characters are in Windows-1251, but not in cp866, and will cause Unicode encoding errors if printed to the console in cp866.

ЅІЃЂЌЏЉЈЋЊ—ґҐ–”“„’‘‚§…‡†•\xad€®©«µ¶±‰‹»›ѕ¦ѓђќџљјћњ™і¬
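The list above can be generated rather than typed out. A minimal Python 3 sketch (the function name chars_missing_from is hypothetical) that checks every byte of Windows-1251 against cp866. It also shows errors='replace', one standard-library way to print such text without an exception, at the cost of turning those characters into '?':

```python
def chars_missing_from(src_codec, dst_codec):
    """Hypothetical helper: return the characters that src_codec can decode
    from a single byte but that dst_codec cannot encode."""
    missing = []
    for b in range(256):
        try:
            ch = bytes([b]).decode(src_codec)
        except UnicodeDecodeError:
            continue  # this byte has no character in the source code page
        try:
            ch.encode(dst_codec)
        except UnicodeEncodeError:
            missing.append(ch)
    return ''.join(missing)


missing = chars_missing_from('cp1251', 'cp866')

# '\u0405' is the Cyrillic letter DZE, which cp866 lacks.
# errors='replace' substitutes '?' instead of raising UnicodeEncodeError.
safe = '\u0405'.encode('cp866', errors='replace')  # b'?'
```

The characters come out in byte order rather than in the order shown above, but they should be the same set.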

score -1 · Answer 2 · answered Nov 13 '13 at 06:59

The code has to be print(line), not print line.

Sounds silly, but it's how it works, haha.

Ceri Westcott

Not if the OP is using Python 2.X. – Mark Tolonen Nov 13 '13 at 07:01
