I am trying to import a data file using np.genfromtxt.
The data file contains a large commented header, with each line beginning with the comment character *.
Even when passing comments='*' to genfromtxt, I still get an error because of an unusual encoding. After some googling, I found that the encoding is CP932 (Japanese characters).
An example of such a line is: b'*HW_ATTACHMENT_NAME "\x95W\x8f\x80|Standard"\r\n'. This can be decoded with _.decode('cp932') to '*HW_ATTACHMENT_NAME "標準|Standard"\r\n'.
However, genfromtxt does not recognize cp932 as an encoding (passing encoding='cp932') and still raises a UnicodeDecodeError.
So, is there a way to force genfromtxt to not read these characters? If not, is there a way to remove all cp932 encoded characters?
Something like this:
with open(file) as f:
    # some code to remove cp932 encoded text
    data = np.genfromtxt(f, comments='*', dtype='float')
Edit: The following attempt does not work either; perhaps it is the wrong way to go about it.
with open(file) as f:
    lines = (line.decode('cp932') for line in f)
    data = np.genfromtxt(lines, comments='*', dtype='float')
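One likely reason the attempt above fails: open(file) defaults to text mode, so Python already tries to decode the bytes with the platform's default codec while iterating, and the resulting str objects have no .decode method in Python 3. A minimal sketch of a variant that may work is below. It assumes the whole file is CP932 and fits in memory; the function name load_numeric_rows and its parameters are made up for illustration and are not part of NumPy.

```python
import io

import numpy as np


def load_numeric_rows(path, encoding="cp932", comment="*"):
    """Decode the file manually, then hand plain text to genfromtxt.

    Hypothetical helper: assumes the file is CP932-encoded and that every
    non-comment line contains whitespace-separated numbers.
    """
    # Binary mode avoids any implicit decoding with the locale's default codec.
    with open(path, "rb") as f:
        raw = f.read()

    # errors="replace" substitutes U+FFFD for bytes that are not valid CP932
    # instead of raising UnicodeDecodeError.
    text = raw.decode(encoding, errors="replace")

    # genfromtxt accepts a file-like object that yields str lines;
    # lines starting with the comment character are skipped.
    return np.genfromtxt(io.StringIO(text), comments=comment, dtype=float)
```

Calling data = load_numeric_rows(file) would then replace the with block above. Because the text is decoded before genfromtxt sees it, the encoding keyword of genfromtxt is not involved at all.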