Multiple sources found exception after spark 3.3.0 upgrade

Question

I'm trying to read a CSV file, apply some transformations, and save the result as Parquet. This worked perfectly with Spark 3.1.3. After upgrading to 3.3.0, I started getting the error below:

Multiple sources found for csv (org.apache.spark.sql.execution.datasources.v2.csv.CSVDataSourceV2, org.apache.spark.sql.execution.datasources.csv.CSVFileFormat), please specify the fully qualified class name

I get a similar error for parquet. I tried specifying the format explicitly, but I still get the same error.

spark.read.format("org.apache.spark.sql.execution.datasources.v2.csv.CSVDataSourceV2").option("header","true").option("delimiter",CommaDelimiter).csv("")

I see two matching class names in Maven: org.apache.spark:spark-sql_2.12:3.3.0, one under datasources and one under datasources.v2 (for both csv and parquet). How can I remove one of them to avoid this error, or is there an alternative?

Could you post your `pom.xml`? You can exclude a transitive dependency using `.........`, [link](https://stackoverflow.com/questions/9119055/how-to-exclude-maven-dependencies) — WoAiNii, Jul 09 '23 at 18:46

score 0 · Answer 1 · answered Jul 10 '23 at 15:01

You can use the Maven goal purge-local-repository. It removes your project's dependencies from the local repository cache and, by default, re-resolves them afterwards.

If you only want to remove specific dependencies, you can run the goal this way:

mvn dependency:purge-local-repository -DmanualInclude="groupId:artifactId, ..."

It is also possible to manually delete the corresponding directory under .m2/repository/dependencyname/version/****
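Before purging anything, it may help to check which jars on the classpath actually register a data source. Spark resolves short names such as csv through Java's ServiceLoader, reading every META-INF/services/org.apache.spark.sql.sources.DataSourceRegister file it can see. The sketch below only lists those files and their entries. The class name DataSourceRegisterScan and its helper method are hypothetical, and the idea that more than one Spark jar (or a merged fat jar) is contributing entries is an assumption about this particular setup, not something the question confirms.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical diagnostic helper, not part of Spark or Maven.
public class DataSourceRegisterScan {
    // Service file that Spark's data source lookup reads through ServiceLoader.
    static final String SERVICE_FILE =
        "META-INF/services/org.apache.spark.sql.sources.DataSourceRegister";

    /** Returns one "location -> provider class" entry per provider line found for the resource. */
    static List<String> findRegistrations(ClassLoader loader, String resource) {
        List<String> found = new ArrayList<>();
        try {
            // getResources returns every copy of the file, one per jar or directory that has it.
            for (URL url : Collections.list(loader.getResources(resource))) {
                try (BufferedReader in = new BufferedReader(
                        new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
                    String line;
                    while ((line = in.readLine()) != null) {
                        // Drop comments, which start with '#' in service files.
                        int hash = line.indexOf('#');
                        if (hash >= 0) {
                            line = line.substring(0, hash);
                        }
                        line = line.trim();
                        if (!line.isEmpty()) {
                            found.add(url + " -> " + line);
                        }
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return found;
    }

    public static void main(String[] args) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        for (String entry : findRegistrations(loader, SERVICE_FILE)) {
            System.out.println(entry);
        }
    }
}
```

Run it with the same classpath as the failing job. If the output shows the csv or parquet providers coming from more than one jar, or from jars of different Spark versions, that jar is the dependency to exclude or purge.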
