
I have a file with one word per line; Notepad++ shows its encoding as ANSI.

The file looks like this (Russian text):

вымышлять

тем|тема|то|тот

не

мало|менее|меней

The output looks something like this:

ЇхфюЁ|ЇхфюЁр
ьшїрщыютшў
фюёЄюхтёъшщ
чряшёър
шч
яюфяюы№х
яєсышўэ√щ
¤ыхъЄЁюээ√щ
сшсышюЄхър

My code:

import sys
print sys.stdout.encoding  # prints cp866

ins = open("out.txt", "r")  # no decoding: each line is a raw byte string
words = []
s = 0
for line in ins:
    if s < 10:
        print line  # echo the first 10 lines
        s = s + 1
    words.append(line)
ins.close()

But it prints garbled words.

beroe
mrgloom

2 Answers


I'm assuming Windows.

ANSI on Russian Windows is Windows-1251, but the cmd.exe console window uses cp866 (the OEM code page), which matches the cp866 that sys.stdout.encoding reports in the question. Using the codecs module, the file can be read in and decoded to Unicode with one encoding; print then encodes that Unicode text with the console's encoding:

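import codecs

# decode the file from Windows-1251 ("ANSI" on Russian Windows) into Unicode
with codecs.open('out.txt', encoding='cp1251') as ins:
    words = []
    s = 0
    for line in ins:
        if s < 10:
            print line,  # trailing comma: line already ends with a newline
            s = s + 1
        words.append(line)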
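The codecs.open() call above is Python 2 style. In Python 3 the built-in open() takes an encoding argument directly, and io.open() (available since Python 2.6) behaves the same way in both versions. A minimal sketch, assuming the file is Windows-1251 as described above; the helper name read_words is illustrative, not from the original:

```python
import io


def read_words(path, encoding="cp1251"):
    """Return the lines of a one-word-per-line file as Unicode strings.

    Assumes the file is Windows-1251 ("ANSI" on Russian Windows); pass a
    different encoding if the file was saved otherwise.
    """
    # io.open decodes the bytes with `encoding`; in text mode "\r\n" line
    # endings are translated to "\n", so stripping "\n" removes them.
    with io.open(path, encoding=encoding) as ins:
        return [line.rstrip("\n") for line in ins]
```

For example, words = read_words("out.txt") followed by printing each entry of words[:10] echoes the first ten words; print then encodes them with the console's encoding (cp866 here).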

Input file (saved in Windows-1251 via Notepad++):

федор|федора
михайлович
достоевский
записка
из
подполье
публичный
электронный
библиотека

Here is the output in my console window, which I switched to cp866 with the chcp 866 command since that is not my default code page:

федор|федора
михайлович
достоевский
записка
из
подполье
публичный
электронный
библиотека
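The garbled output in the question is what this mismatch produces: printing undecoded byte strings (as the question's code does) sends the Windows-1251 bytes to the console unchanged, and the console displays them as cp866. A small Python 3 check of that explanation; the function name as_seen_in_console is illustrative:

```python
def as_seen_in_console(text, file_encoding="cp1251", console_encoding="cp866"):
    """Reproduce the mojibake: encode text the way the file stores it,
    then decode those same bytes with the console's code page."""
    raw = text.encode(file_encoding)     # bytes as saved in the "ANSI" file
    return raw.decode(console_encoding)  # what a cp866 console shows for them


garbled = as_seen_in_console("федор|федора")        # 'ЇхфюЁ|ЇхфюЁр'
# Both code pages map every byte, so the garbling can be undone:
restored = garbled.encode("cp866").decode("cp1251")  # 'федор|федора'
```

The first result is the first line of the question's garbled output, which supports the cp1251-versus-cp866 diagnosis.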

Note, though, that the two encodings do not support the same characters. The following characters are in Windows-1251 but not in cp866, and will raise UnicodeEncodeError if printed to a cp866 console:

ЅІЃЂЌЏЉЈЋЊ—ґҐ–”“„’‘‚§…‡†•\xad€®©«µ¶±‰‹»›ѕ¦ѓђќџљјћњ™і¬
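The list above can be regenerated rather than typed by hand: decode each byte 0x80-0xFF as Windows-1251 and keep the characters that cp866 cannot encode. A Python 3 sketch under that approach (the function name unencodable is illustrative; Python's cp1251 codec leaves byte 0x98 unassigned, so it is skipped):

```python
def unencodable(source="cp1251", target="cp866"):
    """Return, in byte order, the characters from the upper half (0x80-0xFF)
    of the source code page that the target code page cannot encode."""
    missing = []
    for byte in range(0x80, 0x100):
        try:
            ch = bytes([byte]).decode(source)
        except UnicodeDecodeError:
            continue  # byte not assigned in the source code page
        try:
            ch.encode(target)
        except UnicodeEncodeError:
            missing.append(ch)
    return "".join(missing)


missing = unencodable()
# errors="replace" substitutes "?" instead of raising UnicodeEncodeError
print(repr("€".encode("cp866", errors="replace")))  # b'?'
```

If a lossy console display is acceptable, encoding with errors="replace" (or reconfiguring the console) avoids the exception at the cost of showing "?" for those characters.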
Mark Tolonen

The code has to be print(line), not print line.

Sounds silly, but it's how it works, haha.
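In Python 3, print is a function, so print(line) is the required syntax there. Each line read from a file keeps its trailing newline, so print(line) also emits a second one; a minimal Python 3 sketch that echoes the first lines passes end="" instead (the counterpart of the trailing comma in the Python 2 answer's print line,). It works on any text stream; for the real file, the assumption is that it is opened with the file's encoding, e.g. open("out.txt", encoding="cp1251"). The function name echo_first_lines is illustrative:

```python
import io


def echo_first_lines(stream, limit=10, out=None):
    """Print up to `limit` lines from an open text stream to `out`
    (sys.stdout when out is None) without doubling the newlines."""
    for count, line in enumerate(stream):
        if count >= limit:
            break
        # `line` already ends with "\n", so suppress print's own newline
        print(line, end="", file=out)


# io.StringIO stands in for a file opened with encoding="cp1251"
buf = io.StringIO()
echo_first_lines(io.StringIO("мало|менее|меней\nне\n"), out=buf)
# buf.getvalue() == "мало|менее|меней\nне\n"
```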

Ceri Westcott