I am trying to learn Spark and Scala. When I try to write the DataFrame holding my result to a Parquet file by calling the parquet method, I get the error shown below.

Code that fails:

df2.write.mode(SaveMode.Overwrite).parquet(outputPath)

This fails too, even with the fully qualified class name passed to format:

df2.write.format("org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat").mode(SaveMode.Overwrite).parquet(outputPath)

Error log:

Exception in thread "main" org.apache.spark.sql.AnalysisException: Multiple sources found for parquet (org.apache.spark.sql.execution.datasources.v2.parquet.ParquetDataSourceV2, org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat), please specify the fully qualified class name.;
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:707)
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSourceV2(DataSource.scala:733)
at org.apache.spark.sql.DataFrameWriter.lookupV2Provider(DataFrameWriter.scala:967)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:304)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:288)
at org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:848)

However, if I call save instead of parquet, with the same fully qualified format name, the code works properly.

This works fine:

df2.write.format("org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat").mode(SaveMode.Overwrite).save(outputPath)

Although I have a workaround for the issue, I'd like to understand why the first approach does not work and how I can fix it.

The versions I am using are: Scala 2.12.9, Java 1.8, Spark 2.4.4.

P.S. This issue is only seen when running with spark-submit.
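The error message says that two classes are registered under the short name parquet, so one way to investigate is to list those providers and the jar each one is loaded from when the job runs under spark-submit. The following is only a diagnostic sketch, not a fix: the object name ParquetSourceCheck and the helper parquetProviders are made up for illustration, and it assumes Spark discovers these sources through java.util.ServiceLoader entries for DataSourceRegister, which is what the lookupDataSource frame in the stack trace suggests.

```scala
import java.util.ServiceLoader
import scala.collection.JavaConverters._
import org.apache.spark.sql.sources.DataSourceRegister

// Hypothetical helper object, used only for diagnosis.
object ParquetSourceCheck {

  // Returns (class name, jar location) for every data source that
  // registers itself under the short name "parquet" on the classpath.
  def parquetProviders(): Seq[(String, String)] = {
    val loader = Thread.currentThread().getContextClassLoader
    ServiceLoader.load(classOf[DataSourceRegister], loader).asScala
      .filter(_.shortName().equalsIgnoreCase("parquet"))
      .map { src =>
        val cls = src.getClass
        // The code source can be null for some class loaders, hence the Options.
        val jar = Option(cls.getProtectionDomain.getCodeSource)
          .flatMap(cs => Option(cs.getLocation))
          .map(_.toString)
          .getOrElse("unknown")
        (cls.getName, jar)
      }
      .toSeq
  }

  def main(args: Array[String]): Unit = {
    // Version of the Spark core jar that is actually loaded at runtime.
    println(s"Spark version on classpath: ${org.apache.spark.SPARK_VERSION}")
    parquetProviders().foreach { case (name, jar) =>
      println(s"$name loaded from $jar")
    }
  }
}
```

If this is submitted the same way as the failing job and prints two providers coming from jars of different Spark versions, that would point to a mismatch between the Spark version the application was built against and the one spark-submit runs with.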

thickGlass