python 3 japanese letters from genfromtxt

Question

I'm working on a program that reads data from .txt files and processes it. The data contains mostly Latin characters, but sometimes there are also Japanese characters. This is what I want to do:

# -*- coding: UTF-8 -*-
import numpy as np
test=open("test.txt", "r")
test2=open("list.txt", "w")

test2.write("# ")
for line in test:
    line2=line.replace('""', "(None)")
    line3=line2.replace('"', "")
    line4=line3.replace(" ", "_")
    line5=line4.replace(",", " ")
    test2.write(line5)

It works perfectly fine, but some Japanese characters cause problems. The funny thing is that characters like ゲ, ノ, セ, ト or ク are no big deal, but these characters are: い, が, か.
As soon as one of them appears anywhere in test.txt, the following error message occurs:

UnicodeDecodeError                        Traceback (most recent call last)
C:\Users\syhon\Documents\DV-Liste\ListeV2.0\ListeV2.py in <module>()
    196
    197 test2.write("# ")
--> 198 for line in test:
    199     line2=line.replace('""', "(None)")
    200     line3=line2.replace('"', "")

C:\Users\syhon\Anaconda3\lib\encodings\cp1252.py in decode(self, input, final)
     21 class IncrementalDecoder(codecs.IncrementalDecoder):
     22     def decode(self, input, final=False):
---> 23         return codecs.charmap_decode(input,self.errors,decoding_table)[0]
     24
     25 class StreamWriter(Codec,codecs.StreamWriter):

UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 6281: character maps to <undefined>

However, I found out that I can print these characters without any problem in Python 2, but not in Python 3. So, is it possible to get these characters decoded in Python 3?

score 0 · Accepted Answer · answered Apr 02 '18 at 20:20


How is test.txt encoded? I suspect it is encoded as UTF-8. If so, try this in Python 3:

test=open("test.txt", "r", encoding="utf-8")
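For reference, here is a minimal sketch of the question's loop with the encoding given explicitly on both the input and the output file. It assumes test.txt really is UTF-8; the function name `convert` is made up for illustration, and the `with` block is only there so that both files get closed.

```python
def convert(src_path, dst_path):
    # Hypothetical helper: same replacements as in the question.
    # Without encoding=..., Python 3 falls back to the locale's preferred
    # encoding (cp1252 in the traceback above).
    with open(src_path, "r", encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        dst.write("# ")
        for line in src:
            line = line.replace('""', "(None)")  # empty quoted field
            line = line.replace('"', "")         # drop remaining quotes
            line = line.replace(" ", "_")        # spaces inside fields
            line = line.replace(",", " ")        # comma-separated -> space-separated
            dst.write(line)
```

Usage would be `convert("test.txt", "list.txt")`. If the input turns out not to be UTF-8, the `encoding` argument has to be changed to match.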


Robᵩ


According to Atom, test.txt is UTF-8 encoded. Adding encoding="utf-8" to both open() calls (test and test2) solved the problem, thanks a lot! – LarsK Apr 02 '18 at 20:26
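As to why only some characters triggered the error: in UTF-8, い, が and か each contain the byte 0x81, which cp1252 leaves undefined (that is the byte named in the traceback). The katakana from the question happen to consist only of bytes that cp1252 does define, so they were silently decoded into the wrong characters instead of raising. Python 2 did not complain because open() there returns raw bytes and does not decode at all. A small sketch to check this; the helper name `cp1252_can_decode` is just for illustration:

```python
def cp1252_can_decode(ch):
    # Encode the character as UTF-8, then try to read those bytes as cp1252,
    # which is what open() did by default on the asker's machine.
    raw = ch.encode("utf-8")
    try:
        raw.decode("cp1252")
        return True   # no error, but the result is mojibake, not the original
    except UnicodeDecodeError:
        return False  # contains a byte cp1252 does not define, e.g. 0x81

for ch in "ゲノセトクいがか":
    print(ch, ch.encode("utf-8").hex(), cp1252_can_decode(ch))
```

So the "working" characters were not actually being read correctly either; they just did not crash.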
