
I'm trying to run my Spark program using the spark-submit command (I'm working with Scala). I specified the master address, the class name, the jar file with all dependencies, the input file, and the output file, but I'm getting an error:

Exception in thread "main" org.apache.spark.sql.AnalysisException: Multiple sources found for csv (org.apache.spark.sql.execution.datasources.v2.csv.CSVDataSourceV2, org.apache.spark.sql.execution.datasources.csv.CSVFileFormat), please specify the fully qualified class name.;

What is this error about? How can I fix it?


Thank you

  • How did you run your job? Can you share the dependencies or the pom.xml file too? – koiralo Jan 03 '21 at 10:20
  • Are you running a fat jar file? Also mention whether you are running this in a Windows or Linux environment. If so, it will be like this: `./spark-submit your-fat-jarfile.jar`. Also check whether your folder has the appropriate permissions for file read or write. – Kaviranga Jan 03 '21 at 10:31
  • Yes, I'm in the right folder, and yes, I mentioned the jar file in the spark-submit command – amelie Jan 03 '21 at 10:39
  • My pom.xml is too long; I cannot share it in a comment – amelie Jan 03 '21 at 10:43
  • Check the list of jars; you might have different versions of spark-csv jars in the classpath – koiralo Jan 03 '21 at 10:44
  • No, it is only one jar file with all dependencies: target/sample-1.0-SNAPSHOT-jar-with-dependencies.jar. I created it using the mvn package command. I think it is a version problem too – amelie Jan 03 '21 at 10:46
  • See [this question](https://stackoverflow.com/questions/50884599/apache-spark-2-0-pyspark-dataframe-error-multiple-sources-found-for-csv). It's likely that you have multiple versions of Spark in the class path. – mck Jan 03 '21 at 11:02
  • Also try this solution: [DataFrame Error Multiple sources found for csv](https://stackoverflow.com/questions/50884599/apache-spark-2-0-pyspark-dataframe-error-multiple-sources-found-for-csv). This will be helpful – Kaviranga Jan 03 '21 at 11:10
  • Which Spark version do you use? Check the dependencies with `mvn dependency:tree` - as already mentioned, you have some dependency issue. Either you import another Spark lib that comes with its own CSV DataSource, or you have multiple Spark libs, which would be weird. Also, in the fat jar, set the dependency scope of all Spark libs to `provided` - obviously, you don't have to put those into the fat jar, given that your Spark cluster setup already has all of them. – UninformedUser Jan 03 '21 at 11:15
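
As the comments point out, the proper fix is to clean up the dependency conflict (one Spark version on the classpath, Spark libraries marked as `provided` in the fat jar). The error message itself also suggests a workaround: name the data source class explicitly instead of the ambiguous short name `csv`. A minimal Scala sketch of that workaround, assuming the input file has a header row and its path is passed as the first program argument:

```scala
import org.apache.spark.sql.SparkSession

object CsvReadWorkaround {
  def main(args: Array[String]): Unit = {
    // App name is arbitrary; the session configuration comes from spark-submit.
    val spark = SparkSession.builder().appName("csv-read-workaround").getOrCreate()

    // Instead of the ambiguous short name "csv", name one implementation explicitly
    // (the class is the one listed in the error message), so Spark does not have to
    // choose between the two CSV sources it found on the classpath.
    val df = spark.read
      .option("header", "true") // assumption: the input file has a header row
      .format("org.apache.spark.sql.execution.datasources.csv.CSVFileFormat")
      .load(args(0))            // input path passed as the first program argument

    df.show(5)
    spark.stop()
  }
}
```

This only sidesteps the ambiguity; removing the duplicate Spark jars from the fat jar is still the real fix.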

1 Answer


In your output there are also some warnings.

If you run your fat jar correctly, with the correct permissions, you should get output like this from ./spark-submit:

[Screenshot: output of a successful spark-submit run of a CSV-processing fat jar]

Check whether the environment variables for Spark are set correctly (in ~/.bashrc). Also check the source CSV file permissions; maybe that is the problem.
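
For reference, a minimal sketch of what the Spark entries in ~/.bashrc typically look like; the install path is an assumption and should match your machine:

```bash
# Assumed install location; adjust SPARK_HOME to wherever Spark is unpacked on your machine.
export SPARK_HOME=/opt/spark
export PATH="$PATH:$SPARK_HOME/bin"
```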

If you are running in a Linux environment, set the folder permissions for the source CSV folder as follows:

sudo chmod -R 777 /source_folder

After that, try to run ./spark-submit with your fat-jar file again.
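
For completeness, a sketch of such a spark-submit invocation; the master URL, main class, and input/output paths are placeholders, and the jar name is the one mentioned in the comments above:

```bash
# Placeholders: adjust the master URL, main class, and paths to your setup.
./spark-submit \
  --master spark://master-host:7077 \
  --class com.example.MainApp \
  target/sample-1.0-SNAPSHOT-jar-with-dependencies.jar \
  /path/to/input.csv /path/to/output
```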
