
After downloading the dataset as iris.data, I renamed it to iris.data.txt. I was trying to work around this error, which has been reported on SO:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd1 in position 8: invalid continuation byte

After reading up, I tried this:

dataset = pd.read_csv('iris.data.txt', header=None, names=names, encoding="ISO-8859-1")

This made the error go away, but some rows were still garbage.
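A likely reason the ISO-8859-1 workaround only partly helps: Latin-1 maps every one of the 256 byte values to some character, so decoding can never fail. Bytes that are not really text simply turn into junk characters instead of raising an error. A minimal illustration with made-up bytes (not the actual contents of the downloaded file; `try_decode` is a hypothetical helper):

```python
def try_decode(raw: bytes, encoding: str) -> str:
    """Decode raw bytes, returning an error message instead of raising."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError as exc:
        return f"error: {exc.reason}"

# Hypothetical bytes: 0xd1 starts a two-byte UTF-8 sequence, but 0x00 is not a
# valid continuation byte, so strict UTF-8 decoding fails at that point.
raw = b"5.1,3.5\xd1\x00"

utf8_result = try_decode(raw, "utf-8")
latin1_result = try_decode(raw, "ISO-8859-1")

print(utf8_result)          # error: invalid continuation byte
print(repr(latin1_result))  # '5.1,3.5Ñ\x00'
```

Latin-1 "succeeds" by producing one character per byte, which is exactly how garbage ends up in the DataFrame instead of an exception.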

Then I opened the file in Sublime, saved it with UTF-8 encoding, and ran:

dataset = pd.read_csv('iris.data.txt', header=None, names=names, encoding="utf-8")

That didn't solve the problem either. I'm running Python 3 on macOS. How can I make the data readable directly?

[EDIT]: The file type is listed as "Web archive", and in Spyder the file appears as iris.data.webarchive.
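A web archive is not a plain text file. As far as I know, Safari saves .webarchive files as binary property lists, which start with the 8-byte header bplist00; that would explain both the UnicodeDecodeError and the garbage rows. A small sketch that checks a file's first bytes before handing it to pandas (`looks_like_binary_plist` is a hypothetical helper, and the demo uses throwaway files rather than the real download):

```python
import plistlib
import tempfile
from pathlib import Path

BPLIST_MAGIC = b"bplist00"  # first 8 bytes of every binary property list

def looks_like_binary_plist(path) -> bool:
    """Return True if the file starts with the binary-plist header."""
    with open(path, "rb") as fh:
        return fh.read(len(BPLIST_MAGIC)) == BPLIST_MAGIC

# Demo with two throwaway files: a plain CSV line and a binary plist.
with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "plain.csv"
    csv_path.write_text("5.1,3.5,1.4,0.2,Iris-setosa\n")
    plist_path = Path(tmp) / "fake.webarchive"
    plist_path.write_bytes(plistlib.dumps({"k": "v"}, fmt=plistlib.FMT_BINARY))
    csv_is_plist = looks_like_binary_plist(csv_path)
    plist_is_plist = looks_like_binary_plist(plist_path)

print(csv_is_plist, plist_is_plist)  # False True
```

If the check returns True for the downloaded file, no `encoding=` setting in pd.read_csv will make it parse as CSV.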

If I try dataset = pd.read_csv('iris.data.webarchive', header=None), it gives this traceback:

ParserError: Error tokenizing data. C error: Expected 1 fields in line 2, saw 5

If I try dataset = pd.read_csv('iris.data', header=None), it gives:

FileNotFoundError: File b'iris.data' does not exist
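The FileNotFoundError suggests there is no file literally named iris.data in the working directory; Finder can hide extensions, so the file on disk may carry a suffix such as .webarchive or .txt. One quick way to see the real name is to glob for everything starting with the expected prefix. A sketch (`files_starting_with` is a hypothetical helper, demoed in a temporary folder):

```python
import tempfile
from pathlib import Path

def files_starting_with(folder, prefix: str) -> list:
    """List file names in `folder` that begin with `prefix`, sorted."""
    return sorted(p.name for p in Path(folder).glob(prefix + "*"))

# Demo: a folder holding a hidden-extension download and an unrelated file.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "iris.data.webarchive").write_bytes(b"bplist00")
    (Path(tmp) / "notes.txt").write_text("unrelated")
    found = files_starting_with(tmp, "iris.data")

print(found)  # ['iris.data.webarchive']
```

Against the real folder, files_starting_with(".", "iris.data") would list what pandas can actually see from the current working directory.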

DPeterK
srkdb

1 Answer


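I figured out my rookie mistake: I had to save the page as 'source' instead of 'webarchive' (the default format on Mac). A web archive wraps the page in a container format rather than saving the raw text, which is why pandas could not read it as CSV.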
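If re-downloading is not an option, it may be possible to recover the text from the existing .webarchive with Python's plistlib. This sketch assumes Safari's usual layout, where the page body is stored as bytes under WebMainResource → WebResourceData with an optional WebResourceTextEncodingName; I have not verified that every Safari version uses exactly these keys, and the demo builds a fake archive rather than reading a real one. `extract_main_resource` is a hypothetical helper:

```python
import plistlib

def extract_main_resource(archive_bytes: bytes) -> str:
    """Return the main page body stored inside a webarchive plist.

    Assumes Safari's WebMainResource/WebResourceData layout (not verified
    for every Safari version).
    """
    archive = plistlib.loads(archive_bytes)  # binary vs. XML plist is auto-detected
    main = archive["WebMainResource"]
    data = main["WebResourceData"]  # raw bytes of the saved page
    encoding = main.get("WebResourceTextEncodingName", "utf-8")
    return data.decode(encoding)

# Demo: a fake archive standing in for iris.data.webarchive.
fake_archive = plistlib.dumps(
    {
        "WebMainResource": {
            "WebResourceData": b"5.1,3.5,1.4,0.2,Iris-setosa\n4.9,3.0,1.4,0.2,Iris-setosa\n",
            "WebResourceMIMEType": "text/plain",
            "WebResourceTextEncodingName": "UTF-8",
        }
    },
    fmt=plistlib.FMT_BINARY,
)

text = extract_main_resource(fake_archive)
rows = [line.split(",") for line in text.splitlines()]
print(rows[0])  # ['5.1', '3.5', '1.4', '0.2', 'Iris-setosa']
```

With a real archive, the resulting string could then go to pandas via pd.read_csv(io.StringIO(text), header=None, names=names). If Safari stored the page as HTML rather than plain text (check WebResourceMIMEType), the text would need further cleanup first; saving the page as source, as described above, avoids all of this.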

srkdb