Any ideas on the best way to get arff.loadarff to work from a URL? I am trying to read an ARFF file from the following URL (using Python 3.7): https://archive.ics.uci.edu/ml/machine-learning-databases/00327/Training%20Dataset.arff

I have tried a few methods, and the central problem is getting urllib.request to return a file or file-like object that arff.loadarff can recognize and read properly.

Here is some of what I have tried and the results:

from scipy.io import arff
import urllib.request

url = "https://archive.ics.uci.edu/ml/machine-learning-databases/00327/Training%20Dataset.arff"
response = urllib.request.urlopen(url)
data, meta = arff.loadarff(response)

This raises a TypeError because urlopen returns a binary HTTP response object rather than a text file.

I also tried to follow the solutions in the accepted answer here:

from scipy.io import arff
import urllib.request
import codecs

url = "https://archive.ics.uci.edu/ml/machine-learning-databases/00327/Training%20Dataset.arff"
ftpstream = urllib.request.urlopen(url)
data, meta = arff.loadarff(codecs.iterdecode(ftpstream, 'utf-8'))

but this also raises a TypeError because codecs.iterdecode returns a generator, not a file-like object. And this one:

from scipy.io import arff
import urllib.request

url = "https://archive.ics.uci.edu/ml/machine-learning-databases/00327/Training%20Dataset.arff"
ftpstream = urllib.request.urlopen(url)
data, meta = arff.loadarff(ftpstream.read().decode('utf-8'))

This reads the file into a string, but loadarff then treats the full contents of the ARFF file as a file name, and I get an error that the file name is too long.

Rory Daulton
rLevv

1 Answer

You're almost there. loadarff() needs a text file-like object, and neither the return value of urlopen() nor the result of decode() is one. The way to do it is to wrap the decoded string in a file-like object using io.StringIO():

from scipy.io import arff
import urllib.request
import io # for io.StringIO()

url = "https://archive.ics.uci.edu/ml/machine-learning-databases/00327/Training%20Dataset.arff"
ftpstream = urllib.request.urlopen(url)
data, meta = arff.loadarff(io.StringIO(ftpstream.read().decode('utf-8')))

A file-like object here means some object x for which x.read() returns a string, just like the file object returned by open(filename).
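
To make the bytes versus text distinction concrete, here is a minimal sketch that uses only the standard library. The sample ARFF bytes and the variable names are made up for illustration, and io.BytesIO is only a stand-in for the response returned by urlopen(), so nothing is downloaded:

```python
import io

# Hypothetical sample data standing in for what urlopen(url).read() returns.
raw = b"@relation demo\n@attribute x numeric\n@data\n1\n2\n"

# Stand-in for the HTTP response: a binary file-like object that yields bytes.
binary_stream = io.BytesIO(raw)
first_binary = binary_stream.readline()

# Decode the bytes and wrap the string: a text file-like object that yields str.
text_stream = io.StringIO(raw.decode("utf-8"))
first_text = text_stream.readline()

# Alternative that decodes lazily instead of building one big string first.
wrapped_stream = io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8")
first_wrapped = wrapped_stream.readline()

print(type(first_binary).__name__)   # bytes
print(type(first_text).__name__)     # str
print(type(first_wrapped).__name__)  # str
```

Both text_stream and wrapped_stream have a read() method that returns str, which is the kind of object described above. Whether io.TextIOWrapper is preferable to io.StringIO for a real download is a judgment call; for a file of this size, decoding everything up front as in the answer is likely simpler.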

adrtam