Special Character in Header line during text import

Question

I'm trying to write a Python script to import a data file generated by data acquisition software (EC-lab). I would like to keep the column headers as they are in the file rather than define them manually, since they are not uniform across files (different techniques generate data in different orders and with a different number of headers). The problem is that the header text in the file contains forward slashes (e.g. "ox/red", "time/s").

I get an ASCII decoding error when I try to load the data with the header row:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xb5 in position 19: ordinal not in range(128)

Based on other answers, I tried adding an encoding keyword argument, but that didn't solve it:

data = np.genfromtxt("20180611_bB_GCE-G.mpt", dtype=None, delimiter='\t', names=True, skip_header=61, encoding='utf-8')

I'm currently using genfromtxt to import the data:

data = np.genfromtxt("filename.mpt", dtype=None, delimiter='\t', names=True, skip_header=61)

That error doesn't seem to have anything to do with forward slashes. — abarnert, Jun 19 '18 at 21:48
Does adding `encoding='utf-8'` really not change the error message, or does it mean you get a _different_ `UnicodeDecodeError`? — abarnert, Jun 19 '18 at 21:50
Header names are used to create structured array field names (or even `recarray` attributes). So they are handled differently than string data (which `encoding` now handles). Some parameters have to do with names. There have been SO questions about funny header names. — hpaulj, Jun 19 '18 at 22:41

score 1 · Accepted Answer · answered Jun 19 '18 at 21:55

First, forward slashes in headers are not a problem for ASCII, for CSV files, or for NumPy.
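Slashes won't break parsing, but hpaulj's point about names is worth seeing concretely: with `names=True`, `genfromtxt` turns the header row into structured-array field names and, by default, deletes any character in its `deletechars` set, which includes `/`. A minimal sketch, assuming a made-up, ASCII-only, tab-delimited stand-in for the file held in memory (the columns and values are illustrative only):

```python
import io

import numpy as np

# Hypothetical ASCII-only stand-in for an EC-lab export: one header row, tab-delimited.
SAMPLE = "time/s\tox/red\n0.0\t1\n1.0\t0\n"

# Default behaviour: '/' is in genfromtxt's default deletechars, so it is stripped.
default = np.genfromtxt(io.StringIO(SAMPLE), dtype=None, delimiter="\t", names=True)

# deletechars="" keeps the header text as-is in the field names.
verbatim = np.genfromtxt(io.StringIO(SAMPLE), dtype=None, delimiter="\t",
                         names=True, deletechars="")

print(default.dtype.names)   # ('times', 'oxred')
print(verbatim.dtype.names)  # ('time/s', 'ox/red')
print(verbatim["time/s"])    # [0. 1.]
```

Either form works; keeping the slashes just means indexing with strings such as `data["time/s"]` rather than using attribute access on a `recarray`.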

My guess is that the real problem is that your CSV is in Latin-1, or a Latin-1-compatible encoding like Windows-1252, and that one of the headers includes the micro sign µ, which is 0xB5 in those encodings. Or that the headers aren't actually a problem at all, and you have µ characters in some of the data.

Either way, with the default encoding of ASCII, you get an error about 0xb5 not being in range(128), exactly like the one in your question.

If you try to fix this by explicitly specifying encoding='utf-8', that's the wrong encoding, and you just get a different error, about 0xb5 being an invalid start byte.
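The difference between these errors is just which codec gets applied to the same byte. A small sketch, assuming a hypothetical header "I/µA" stored as Latin-1, where µ is the single byte 0xB5 (the header text is a guess; only the byte comes from the error message):

```python
def try_decodings(raw, candidates=("ascii", "utf-8", "latin-1")):
    """Decode raw bytes with each codec and report the text or the error reason."""
    results = {}
    for codec in candidates:
        try:
            results[codec] = raw.decode(codec)
        except UnicodeDecodeError as exc:
            results[codec] = "error: " + exc.reason
    return results


# Hypothetical header "I/µA" as a Latin-1 (or cp1252) program would write it.
header = "I/\u00b5A".encode("latin-1")  # b'I/\xb5A'

for codec, outcome in try_decodings(header).items():
    print(f"{codec:8} {outcome}")
# ascii    error: ordinal not in range(128)
# utf-8    error: invalid start byte
# latin-1  I/µA
```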

If you fix it by specifying encoding='latin-1', it should work.
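With `genfromtxt`, that means passing the `encoding` argument (available from NumPy 1.14). The sketch below writes a small, made-up Latin-1 file so it runs on its own; for the real file you would keep the original call, including `skip_header=61`, and just add `encoding='latin-1'`.

```python
import os
import tempfile

import numpy as np

# Hypothetical two-column export, saved as Latin-1 so "µ" becomes byte 0xB5.
CONTENT = "time/s\tI/\u00b5A\n0.0\t1.5\n1.0\t2.5\n"

with tempfile.NamedTemporaryFile("wb", suffix=".mpt", delete=False) as f:
    f.write(CONTENT.encode("latin-1"))
    path = f.name

try:
    # encoding="latin-1" decodes 0xB5 as "µ" instead of raising UnicodeDecodeError.
    data = np.genfromtxt(path, dtype=None, delimiter="\t", names=True,
                         encoding="latin-1")
finally:
    os.remove(path)

print(data.dtype.names)  # ('times', 'IµA')  (field names are sanitised by default)
print(data["times"])     # [0. 1.]
```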

More generally, you have to know what encoding your files are actually in, not just guess wildly. Especially if you're on Windows, where a lot of files are going to be in whatever encoding you have set as your OEM code page, while others will be in UTF-16-LE, while others will be in UTF-8 but with an illegal BOM, etc.

The program that generated them should document what encoding it uses, or have options to let you pick. If it doesn't, you need to try, e.g., viewing the file in a text editor that lets you select the encoding to try to figure out which one looks correct. Or you can use a tool like chardet to help you guess.
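Before reaching for chardet (its `chardet.detect()` returns a statistical guess plus a confidence), a cheap first check is possible with the standard library alone. A rough sketch, assuming only three outcomes matter here: a UTF-8 or UTF-16 byte-order mark, valid UTF-8, or otherwise a single-byte fallback such as Latin-1. The function name and the fallback are illustrative choices, not a real detection algorithm:

```python
import codecs


def rough_encoding_guess(sample, fallback="latin-1"):
    """Very rough guess from a byte sample; not a substitute for documentation."""
    if sample.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"  # this codec also strips the BOM when decoding
    if sample.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"  # this codec reads the BOM to pick the byte order (UTF-32 ignored)
    try:
        # Incremental decoding with final=False tolerates a multi-byte character
        # cut off at the end of the sample, but still rejects invalid bytes.
        codecs.getincrementaldecoder("utf-8")().decode(sample, final=False)
        return "utf-8"  # plain ASCII also lands here, since ASCII is a subset
    except UnicodeDecodeError:
        # Any byte sequence decodes as Latin-1, so this only means "not UTF-8",
        # not "definitely Latin-1".
        return fallback


# Usage with a real file (hypothetical path):
# with open("filename.mpt", "rb") as f:
#     print(rough_encoding_guess(f.read(65536)))
```

A byte like 0xB5 that is invalid as UTF-8 pushes the guess to the fallback, which is consistent with the Latin-1 theory, but only the generating program's documentation (or eyeballing the decoded text) can confirm it.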

Thanks for the info on chardet. It reported that the file is ISO-8859-1 (which is the same as Latin-1). My searches on SO and elsewhere had turned up the utf-8 suggestion. I still can't get `encoding='latin-1'` to resolve the issue, though, as I'm now getting `TypeError: genfromtxt() got an unexpected keyword argument 'encoding'` — Sophie Lee, Jun 20 '18 at 15:00
@SophieLee Your question claims that you used `encoding='utf-8'` and got a `UnicodeDecodeError`. If that's true, then changing it to `encoding='latin-1'` would fix the error, or give you a different `UnicodeDecodeError`; there's no way it could cause this `TypeError`. Is it possible you're running on a bunch of different machines, and some of them have NumPy 1.14.0 or later while others have earlier versions? Because, as [the docs](https://docs.scipy.org/doc/numpy/reference/generated/numpy.genfromtxt.html) show, `encoding` was added in 1.14. — abarnert, Jun 20 '18 at 23:53
@SophieLee If you need your code to work on older versions, there are workarounds, but they're more complicated than just upgrading numpy. — abarnert, Jun 20 '18 at 23:55
You got me. I'm using an Anaconda distribution with NumPy installed and assumed it was up to date, but I just ran the upgrade and it went from 1.13.3 to 1.14.5, so that assumption was wrong. Problem solved! This is my first time writing my own Python from scratch rather than working from a colleague's notebook. — Sophie Lee, Jun 21 '18 at 13:44
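Since the `TypeError` came from an older NumPy, a script that may run on several machines could check for the feature up front and fail with a clearer message. A minimal sketch: `np.lib.NumpyVersion` is NumPy's own version-comparison helper, while `load_mpt` and its defaults are hypothetical names taken from the question's call:

```python
import numpy as np

# genfromtxt gained the `encoding` keyword in NumPy 1.14.0.
HAS_ENCODING_KW = np.lib.NumpyVersion(np.__version__) >= "1.14.0"


def load_mpt(path, skip_header=61, encoding="latin-1"):
    """Hypothetical loader for an EC-lab .mpt export with a tab-delimited header row."""
    if not HAS_ENCODING_KW:
        raise RuntimeError(
            f"NumPy {np.__version__} is too old: genfromtxt() needs 1.14+ "
            "for encoding=; upgrade NumPy (e.g. conda update numpy)."
        )
    return np.genfromtxt(path, dtype=None, delimiter="\t", names=True,
                         skip_header=skip_header, encoding=encoding)
```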
