genfromtxt issues with umlauts

Question

I was working on a program I made for fun and came across a problem that I was unable to find a solution to. The code I wrote looked something like this:

import numpy as np

data= np.genfromtxt('list.txt', unpack=True, dtype=("U12", "U12"))
print(data)

'list.txt' looked something like this:

# random random2
foo ßaar

When I try to run this code, the following error message appears:

UnicodeDecodeError                        Traceback (most recent call last)
C:\Users\syhon\Documents\Test\test.py in ()
      1 import numpy as np
      2
----> 3 data= np.genfromtxt('list.txt', unpack=True, dtype=("U12", "U12"))
      4 print(data)

C:\Users\syhon\Anaconda3\lib\site-packages\numpy\lib\npyio.py in genfromtxt(fname, dtype, comments, delimiter, skip_header, skip_footer, converters, missing_values, filling_values, usecols, names, excludelist, deletechars, replace_space, autostrip, case_sensitive, defaultfmt, unpack, usemask, loose, invalid_raise, max_rows)
   1927         dtype = np.dtype(ttype)
   1928     #
-> 1929     output = np.array(data, dtype)
   1930     if usemask:
   1931         if dtype.names:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)

However, as soon as I remove the ß, the code works just fine. Is there a way to keep the umlauts?

import random · Answer 1 · 2018-03-28T00:59:35.083

Can you try manually specifying the encoding?

>>> import numpy as np
>>> data= np.genfromtxt('list.txt', unpack=True, dtype=("U12", "U12"), encoding='ascii')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "L:\lib\site-packages\numpy\lib\npyio.py", line 1708, in genfromtxt
    first_line = _decode_line(next(fhd), encoding)
  File "L:\\lib\encodings\ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xdf in position 4: ordinal not in range(128)
>>> data= np.genfromtxt('list.txt', unpack=True, dtype=("U12", "U12"), encoding='bytes')
>>> print(data)
['foo' 'ßaar']

Note: for me bytes was already the default encoding, so I was initially unable to replicate your error.
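A possible reason the two tracebacks differ: the question's error names byte 0xc3, while the one above names byte 0xdf. That would be consistent with the question's file being saved as UTF-8 (where ß is the two bytes 0xc3 0x9f) and my test file being saved in a single-byte encoding such as Latin-1 (where ß is 0xdf). This is a guess based on the byte values only. The helper name try_ascii below is made up for illustration:

```python
# The same character, encoded two different ways.
utf8_bytes = "ß".encode("utf-8")      # two bytes: 0xc3 0x9f
latin1_bytes = "ß".encode("latin-1")  # one byte: 0xdf


def try_ascii(raw):
    """Hypothetical helper: decode as ASCII and return the error text on failure."""
    try:
        return raw.decode("ascii")
    except UnicodeDecodeError as exc:
        return str(exc)


utf8_error = try_ascii(utf8_bytes)
latin1_error = try_ascii(latin1_bytes)

print(utf8_error)    # mentions byte 0xc3, as in the question
print(latin1_error)  # mentions byte 0xdf, as in my traceback
```

Either way, the ASCII codec cannot represent ß, so the file's actual encoding has to be named somewhere.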

EDIT: To clarify, I mean adding the encoding keyword argument to the np.genfromtxt() function call. When I initially ran your code, there was no error. I could only reproduce your error when setting the encoding to ascii.
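If the file really is UTF-8, naming that encoding explicitly is probably more robust than relying on the default. A minimal sketch, assuming NumPy 1.14 or newer (the release that added the encoding parameter to genfromtxt). The temporary file path and the single "U12" dtype are simplifications for this example, not part of the original code:

```python
import os
import tempfile

import numpy as np

# Recreate the sample file from the question, saved as UTF-8.
path = os.path.join(tempfile.mkdtemp(), "list.txt")
with open(path, "w", encoding="utf-8") as handle:
    handle.write("# random random2\nfoo ßaar\n")

# Name the file's real encoding instead of relying on the default.
# Assumption: NumPy >= 1.14, where the encoding parameter exists.
data = np.genfromtxt(path, unpack=True, dtype="U12", encoding="utf-8")
print(data)
```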

Both files use UTF-8 encoding (I hope that's what you mean). If it's important, my Python version is 3.6.1 — LarsK, Mar 27 '18 at 22:42
I did not specify the encoding in my code. However, when I try to do so, the following error message appears: "TypeError: genfromtxt() got an unexpected keyword argument 'encoding'". — LarsK, Mar 28 '18 at 13:15
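
That TypeError is what a NumPy release older than 1.14 would be expected to raise, since the encoding parameter of genfromtxt was only added in 1.14. If upgrading is not an option, one possible workaround is to let Python decode the file and build the array from the resulting strings. This is only a sketch: read_text_columns is a hypothetical helper, it assumes whitespace-separated fields with the same number of fields on every row, and it does not replicate genfromtxt's other features:

```python
import os
import tempfile

import numpy as np


def read_text_columns(path, encoding="utf-8", comment="#"):
    """Hypothetical helper: decode the file ourselves, then build the array."""
    rows = []
    with open(path, encoding=encoding) as handle:
        for line in handle:
            # Drop comments and surrounding whitespace, skip empty lines.
            line = line.split(comment, 1)[0].strip()
            if line:
                rows.append(line.split())
    # Transpose so that the first axis indexes columns rather than rows.
    return np.array(rows, dtype="U12").T


# Recreate the sample file from the question, saved as UTF-8.
path = os.path.join(tempfile.mkdtemp(), "list.txt")
with open(path, "w", encoding="utf-8") as handle:
    handle.write("# random random2\nfoo ßaar\n")

columns = read_text_columns(path)
print(columns)
```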

score 0 · Accepted Answer · answered Mar 30 '18 at 22:45

Putting

# -*- coding: utf-8 -*-

at the top of the script seems to solve the problem.

answered Mar 30 '18 at 22:45

LarsK