I'm trying to read a CSV file, perform some transformations, and save the result as Parquet. This worked perfectly with Spark 3.1.3. After I recently upgraded to 3.3.0, I started getting the error below:

Multiple sources found for csv (org.apache.spark.sql.execution.datasources.v2.csv.CSVDataSourceV2, org.apache.spark.sql.execution.datasources.csv.CSVFileFormat), please specify the fully qualified class name

I get a similar error with Parquet. I tried specifying the format explicitly, but I still get the same error:

spark.read.format("org.apache.spark.sql.execution.datasources.v2.csv.CSVDataSourceV2").option("header","true").option("delimiter",CommaDelimiter).csv("")

I see that there are two class names, one under datasources and one under datasources.v2 (for both csv and parquet), in Maven: org.apache.spark:spark-sql_2.12:3.3.0. How can I remove one of them to avoid this error, or is there an alternative?

praneethh
  • Could you post your `pom.xml`? You can exclude a transitive dependency using `.........`, [link](https://stackoverflow.com/questions/9119055/how-to-exclude-maven-dependencies) – WoAiNii Jul 09 '23 at 18:46

1 Answer

You can use the Maven goal purge-local-repository. It removes your project's locally installed dependencies from the local repository cache.

If there are specific dependencies that you want to remove, you can run the goal this way:

mvn dependency:purge-local-repository -DmanualInclude="groupId:artifactId, ..."

It is also possible to delete the files manually under .m2/repository/dependencyname/version/****
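
For the manual delete, it can help to print the exact directory first. The sketch below assumes the default local repository location (~/.m2/repository) and the standard layout in which the groupId's dots become nested directories. `m2_artifact_dir` is a hypothetical helper name, not a Maven command, and the coordinates are simply the ones mentioned in the question.

```shell
#!/bin/sh
# Sketch: compute the local-repository directory for one Maven artifact.
# Assumes the default local repository (~/.m2/repository); adjust the base
# path if settings.xml points somewhere else.
# m2_artifact_dir is a hypothetical helper, not part of Maven.
m2_artifact_dir() {
    group_id=$1
    artifact_id=$2
    version=$3
    # Maven stores each groupId segment as a nested directory (dots become slashes).
    group_path=$(printf '%s' "$group_id" | tr '.' '/')
    printf '%s\n' "$HOME/.m2/repository/$group_path/$artifact_id/$version"
}

# Coordinates taken from the question; substitute your own.
m2_artifact_dir org.apache.spark spark-sql_2.12 3.3.0
```

Once the printed path looks right, that directory can be removed (for example with `rm -rf`), and Maven should download the artifact again on the next build.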

shalnarkftw