
Probably a silly issue, but I don't get it. I'm working in a Jupyter Notebook with Python 3.6 and Spark 2.4, hosted on IBM Watson Studio.

I have a simple csv file:

num,label
0,0
1,0
2,0
3,0

To read it I use the following command:

labels = spark.read.csv(url, sep=',', header=True)

But when I check whether `labels` is correct using `labels.head()`, I get `Row(PAR1Љ��L�Q�� ='\x08\x00]')`.

What am I missing?

Vincenzo Lavorini

1 Answer

This looks like it's due to an encoding issue.

Try providing an encoding via the reader option; also try UTF-8:

labels = spark.read.option("encoding", "ISO-8859-1").csv(url, sep=',', header=True)
dsk
  • Indeed, the ISO-8859-1 encoding did the job. I passed it as a keyword argument: `labels = spark.read.csv(url, sep=',', header=True, encoding="ISO-8859-1")` – Vincenzo Lavorini Jul 03 '20 at 07:59
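
For readers without a Spark session at hand, the same pitfall can be reproduced with plain Python's `csv` module (a minimal sketch; the sample data is hypothetical): bytes written in ISO-8859-1 fail to decode as UTF-8, but decode cleanly once the correct encoding is supplied.

```python
import csv
import io

# Bytes as they might appear in a file saved with ISO-8859-1 (Latin-1):
# the label column contains an accented character that is not valid UTF-8.
raw = "num,label\n0,caf\u00e9\n".encode("iso-8859-1")

# Decoding with the wrong encoding fails outright...
try:
    raw.decode("utf-8")
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False

# ...while decoding with the encoding the file was written in works.
rows = list(csv.reader(io.StringIO(raw.decode("iso-8859-1"))))
print(decoded_ok)  # False
print(rows[1])     # ['0', 'café']
```

Spark's `encoding` option plays the same role as the explicit `decode()` call here: it tells the CSV reader how to interpret the raw bytes before parsing.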