0

I am upgrading from Spark 1.6 to Spark 2 and am having an issue reading in CSV files. In Spark 1.6 I would have something like this to read in a CSV file:

val df = sqlContext.read.format("com.databricks.spark.csv")
.option("header", "true")
.load(fileName)

Now I use the following code, as given in the documentation:

val df = spark.read
.option("header", "true")
.csv(fileName)

This results in the following error when running:

"Exception in thread "main" java.lang.RuntimeException: Multiple sources found for csv (org.apache.spark.sql.execution.datasources.csv.CSVFileFormat, com.databricks.spark.csv.DefaultSource15), please specify the fully qualified class name."

I assume this is because I still had the spark-csv dependency; however, I removed that dependency and rebuilt the application, and I still get the same error. How is the Databricks dependency still being found once I have removed it?
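One way to confirm whether the old artifact is still being pulled in transitively is to inspect the resolved dependency tree. A quick check, assuming an sbt or Maven build (the `grep` pattern is just the artifact name from the error):

```shell
# sbt: dependencyTree is built into sbt 1.4+ (older versions need
# the sbt-dependency-graph plugin); look for the stale artifact.
sbt dependencyTree | grep spark-csv

# Maven equivalent:
mvn dependency:tree | grep spark-csv
```

If the artifact still shows up, some other dependency is dragging it in and it needs to be excluded there.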

st33l3rf4n
  • 11
  • 2
  • 5

2 Answers

3

The error message means you are passing the --packages com.databricks:spark-csv_2.11:1.5.0 option when you run spark-shell, or you still have those jars on your class path. Please check your class path and remove them.
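As the error message itself suggests, there is also a workaround that sidesteps the ambiguity without touching the class path: name the built-in Spark 2 CSV source by its fully qualified class. A sketch, reusing `spark` and `fileName` from the question:

```scala
// Disambiguate the "csv" source by giving its fully qualified class
// name; Spark then ignores the competing com.databricks spark-csv
// DefaultSource even if its jar is still on the class path.
val df = spark.read
  .format("org.apache.spark.sql.execution.datasources.csv.CSVFileFormat")
  .option("header", "true")
  .load(fileName)
```

This is only a stopgap; removing the stale spark-csv jar is still the proper fix, since Spark 2 ships CSV support natively.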

-1

I didn't add any jars to my class path. I use this to load a CSV file in the Spark shell (2.3.1): val df = spark.sqlContext.read.csv("path") (note that Scala requires double quotes here; single quotes denote a character literal).

Inasa Xia
  • 433
  • 5
  • 12