0

I am upgrading from Spark 1.6 to Spark 2 and am having an issue reading in CSV files. In Spark 1.6 I would have something like this to read in a CSV file:

val df = sqlContext.read.format("com.databricks.spark.csv")
.option("header", "true")
.load(fileName)

Now I use the following code, as given in the documentation:

val df = spark.read
.option("header", "true")
.csv(fileName)

This results in the following error when running:

"Exception in thread "main" java.lang.RuntimeException: Multiple sources found for csv (org.apache.spark.sql.execution.datasources.csv.CSVFileFormat, com.databricks.spark.csv.DefaultSource15), please specify the fully qualified class name."

I assume this is because I still had the spark-csv dependency; however, I removed that dependency and rebuilt the application, and I still get the same error. How is the Databricks dependency still being found once I have removed it?
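One way to confirm whether the old artifact is still being pulled in transitively is to inspect the resolved dependency tree. A quick check, assuming an sbt or Maven build (the `grep` pattern is just the artifact name from the error):

```shell
# sbt: dependencyTree is built into sbt 1.4+ (older versions need
# the sbt-dependency-graph plugin); look for the stale artifact.
sbt dependencyTree | grep spark-csv

# Maven equivalent:
mvn dependency:tree | grep spark-csv
```

If the artifact still shows up, some other dependency is dragging it in and it needs to be excluded there.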

st33l3rf4n
  • 11
  • 2
  • 5

2 Answers

3

The error message means you are passing the --packages com.databricks:spark-csv_2.11:1.5.0 option when you run spark-shell, or you still have those jars on your class path. Please check your class path and remove them.
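As the error message itself suggests, there is also a workaround that sidesteps the ambiguity without touching the class path: name the built-in Spark 2 CSV source by its fully qualified class. A sketch, reusing `spark` and `fileName` from the question:

```scala
// Disambiguate the "csv" source by giving its fully qualified class
// name; Spark then ignores the competing com.databricks spark-csv
// DefaultSource even if its jar is still on the class path.
val df = spark.read
  .format("org.apache.spark.sql.execution.datasources.csv.CSVFileFormat")
  .option("header", "true")
  .load(fileName)
```

This is only a stopgap; removing the stale spark-csv jar is still the proper fix, since Spark 2 ships CSV support natively.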

-1

I didn't add any jars to my class path. I use this to load a CSV file in the Spark shell (2.3.1): val df = spark.sqlContext.read.csv("path") (note that Scala requires double quotes here; single quotes denote a character literal).

Inasa Xia
  • 433
  • 5
  • 12