I have a file, "LMD.rh.arff", which I am trying to convert to a .csv file using the following code:

import pandas as pd
import matplotlib.pyplot as plt
from scipy.io import arff


# Read in the .arff file
data = arff.loadarff("LMD.rh.arff")

But the last line of code gives me this error:

---------------------------------------------------------------------------
UnicodeEncodeError                        Traceback (most recent call last)
----> 1 data = arff.loadarff("LMD.rh.arff")

~/.local/lib/python3.6/site-packages/scipy/io/arff/arffread.py in loadarff(f)
    539         ofile = open(f, 'rt')
    540     try:
--> 541         return _loadarff(ofile)
    542     finally:
    543         if ofile is not f:  # only close what we opened

~/.local/lib/python3.6/site-packages/scipy/io/arff/arffread.py in _loadarff(ofile)
    627     a = generator(ofile)
    628     # No error should happen here: it is a bug otherwise
--> 629     data = np.fromiter(a, descr)
    630     return data, meta
    631

UnicodeEncodeError: 'ascii' codec can't encode character '\xf3' in position 4: ordinal not in range(128)

You can download the file arff_file

Any ideas as to what's going wrong?

Thanks!

Arun

2 Answers

Try this

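import os

path_to_directory = "./"
# Collect every .arff file in the directory
files = [name for name in os.listdir(path_to_directory) if name.endswith(".arff")]

def toCsv(content):
    """Turn the lines of an ARFF file into CSV lines: a header row, then the data rows."""
    data = False
    header = ""
    newContent = []
    for line in content:
        if not data:
            # ARFF keywords are case-insensitive, so compare in lower case
            stripped = line.strip()
            if stripped.lower().startswith("@attribute"):
                # The attribute name is the token right after "@attribute"
                columnName = stripped.split()[1]
                header = header + columnName + ","
            elif stripped.lower().startswith("@data"):
                data = True
                header = header[:-1]  # drop the trailing comma
                header += '\n'
                newContent.append(header)
        else:
            newContent.append(line)
    return newContent

# Main loop for reading and writing files
for file in files:
    path = os.path.join(path_to_directory, file)
    # Assumption: the files are UTF-8; change the encoding if yours differ
    with open(path, "r", encoding="utf-8") as inFile:
        content = inFile.readlines()
    name, ext = os.path.splitext(path)
    new = toCsv(content)
    with open(name + ".csv", "w", encoding="utf-8") as outFile:
        outFile.writelines(new)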
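
To check the header/data split on a small in-memory sample before running it over a directory, something like the sketch below may help. It is only an illustration under stated assumptions: the function name `arff_lines_to_rows` and the sample lines are made up, attribute names are assumed to contain no spaces, and sparse `{...}` data rows are not handled. It uses the standard `csv` module so that values containing commas are quoted in the output.

```python
import csv
import io


def arff_lines_to_rows(lines):
    """Return [header_row, data_row, ...] for simple, dense ARFF content."""
    header, rows, in_data = [], [], False
    for raw in lines:
        line = raw.strip()
        # Skip blank lines and ARFF comments (lines starting with %)
        if not line or line.startswith("%"):
            continue
        if not in_data:
            keyword = line.split(None, 1)[0].lower()  # keywords are case-insensitive
            if keyword == "@attribute":
                header.append(line.split()[1])
            elif keyword == "@data":
                in_data = True
        else:
            # Single quotes are treated as the quote character for string values
            rows.append(next(csv.reader([line], quotechar="'")))
    return [header] + rows


# Made-up sample, not taken from LMD.rh.arff
sample = [
    "% demo file\n",
    "@RELATION demo\n",
    "@ATTRIBUTE length NUMERIC\n",
    "@attribute title STRING\n",
    "@DATA\n",
    "5.1,'Canci\xf3n, live'\n",
]

rows = arff_lines_to_rows(sample)

# Write the rows as CSV text; csv.writer quotes fields that contain commas
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

Because the rows are kept as Python strings throughout, the non-ASCII character survives the round trip; it only needs a suitable encoding when the text is finally written to disk.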
Shubham Mishra

Take a look at the error trace:

UnicodeEncodeError: 'ascii' codec can't encode character '\xf3' in position 4: ordinal not in range(128)
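
The message can be reproduced without any ARFF file, which may make it easier to read: `'\xf3'` is the escape for 'ó' (U+00F3), a character that ASCII cannot represent. A minimal sketch, using a made-up string rather than anything from the actual file:

```python
# '\xf3' is 'ó' (U+00F3), which lies outside ASCII's 0-127 range.
sample = "Canci\xf3n"  # made-up value, not taken from LMD.rh.arff

try:
    sample.encode("ascii")
    message = ""
except UnicodeEncodeError as exc:
    message = str(exc)

print(message)

# A codec that covers the character encodes it without complaint.
encoded = sample.encode("utf-8")
print(encoded)
```

Note that this is an *encode* error: some text is being converted to ASCII bytes somewhere along the way.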

Your error suggests an encoding problem with the file. Consider opening the file with the correct encoding first and then passing it to the ARFF loader:

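import codecs
import arff  # assumed to be a standalone ARFF package (e.g. liac-arff), not scipy.io.arff

# Use codecs.open; the codecs module has no load function
file_ = codecs.open('LMD.rh.arff', 'r', 'utf-8')  # or whatever encoding the file uses
data = arff.load(file_)  # now this should be fine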

For reference see here
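
If the encoding is not known in advance, one option is to try a few candidates in order. The sketch below is only an illustration: `read_text_with_fallback` is a made-up helper, the candidate list is an assumption, and the demo file is generated on the fly. Since `latin-1` accepts any byte sequence, it works only as a last resort and may decode some files incorrectly.

```python
import os
import tempfile


def read_text_with_fallback(path, encodings=("utf-8", "latin-1")):
    """Return (text, encoding) for the first candidate encoding that decodes the file."""
    for encoding in encodings:
        try:
            with open(path, "r", encoding=encoding) as handle:
                return handle.read(), encoding
        except UnicodeDecodeError:
            continue  # try the next candidate
    raise ValueError("none of the candidate encodings could decode " + path)


# Demo: a tiny made-up ARFF file saved as Latin-1, where 'ó' is the single byte 0xF3
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.arff")
    with open(demo_path, "wb") as handle:
        handle.write(b"@relation demo\n@attribute title string\n@data\n'Canci\xf3n'\n")
    text, used = read_text_with_fallback(demo_path)

print(used)
```

Here the UTF-8 attempt fails because the byte 0xF3 is not followed by valid continuation bytes, so the helper falls back to Latin-1.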

Saif Asif
  • When I try the line "codecs.load()", I get: AttributeError: module 'codecs' has no attribute 'load' – Arun Apr 12 '19 at 14:03
  • I tried the following code: f = codecs.open("LMD.rh.arff", "r", "utf-8"); data = arff.loadarff(f). However, the same error is raised. – Arun Apr 12 '19 at 14:27