The documentation (https://hudi.apache.org/docs/quick-start-guide.html) says Hudi works with Spark 2.x & Spark 3.x versions, but I have not been able to use hudi-spark-bundle_2.11 version 0.7.0 with Spark 2.3.0 and Scala 2.11.12. Is there a specific spark-avro package one has to use?
The job fails with java.lang.NoSuchMethodError: org.apache.spark.sql.types.Decimal$.minBytesForPrecision()[I (full stack trace below). Any inputs would be very helpful.
The cluster I am working with runs Spark 2.3.0, and there is no immediate upgrade planned. Is there any way to make Hudi 0.7.0 work with Spark 2.3.0?
Note: I am able to use Spark 2.3.0 with hudi-spark-bundle-0.5.0-incubating.jar
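For reference, this is roughly how I launch the shell. The spark-avro coordinates below are the ones the 0.7.0 quickstart lists for Spark 2.4; I am not sure whether a Spark 2.3-compatible equivalent exists, which is part of what I am asking:

spark-shell \
  --packages org.apache.hudi:hudi-spark-bundle_2.11:0.7.0,org.apache.spark:spark-avro_2.11:2.4.4 \
  --conf 'spark.serializer=org.apache.spark.sql.KryoSerializer'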
In spark-shell I then get the following error:
scala> transformedDF.write.format("org.apache.hudi").
| options(getQuickstartWriteConfigs).
| option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY, "col1").
| //option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY, "col2").
| option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY, "col3,col4,col5").
| option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY, "partitionpath").
| option(DataSourceWriteOptions.KEYGENERATOR_CLASS_OPT_KEY, "org.apache.hudi.ComplexKeyGenerator").
| option("hoodie.upsert.shuffle.parallelism","20").
| option("hoodie.insert.shuffle.parallelism","20").
| option(HoodieCompactionConfig.PARQUET_SMALL_FILE_LIMIT_BYTES, 128 * 1024 * 1024).
| option(HoodieStorageConfig.PARQUET_FILE_MAX_BYTES, 128 * 1024 * 1024).
| option(HoodieWriteConfig.TABLE_NAME, "targetTableHudi").
| mode(SaveMode.Append).
| save(targetPath)
21/02/22 07:14:03 WARN Utils: Truncated the string representation of a plan since it was too large. This behavior can be adjusted by setting 'spark.debug.maxToStringFields' in SparkEnv.conf.
java.lang.NoSuchMethodError: org.apache.spark.sql.types.Decimal$.minBytesForPrecision()[I
at org.apache.hudi.spark.org.apache.spark.sql.avro.SchemaConverters$.toAvroType(SchemaConverters.scala:156)
at org.apache.hudi.spark.org.apache.spark.sql.avro.SchemaConverters$$anonfun$5.apply(SchemaConverters.scala:176)
at org.apache.hudi.spark.org.apache.spark.sql.avro.SchemaConverters$$anonfun$5.apply(SchemaConverters.scala:174)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at org.apache.spark.sql.types.StructType.foreach(StructType.scala:99)
at org.apache.hudi.spark.org.apache.spark.sql.avro.SchemaConverters$.toAvroType(SchemaConverters.scala:174)
at org.apache.hudi.AvroConversionUtils$.convertStructTypeToAvroSchema(AvroConversionUtils.scala:52)
at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:139)
at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:134)
at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:654)
at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:654)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)
at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:654)
at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:273)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:267)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:225)
... 62 elided
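In case it helps with diagnosis: a quick reflection check in the same 2.3.0 shell (my own ad-hoc diagnostic, not anything from the Hudi docs) finds no method matching the name in the stack trace, which is consistent with the NoSuchMethodError:

// Ad-hoc check: list the Decimal companion object's public methods whose
// name contains "minBytesForPrecision". On my Spark 2.3.0 shell this
// prints nothing; my understanding is that Spark 2.4 added this method,
// and the spark-avro converters shaded into the Hudi 0.7.0 bundle expect it.
org.apache.spark.sql.types.Decimal.getClass
  .getMethods
  .map(_.getName)
  .filter(_.contains("minBytesForPrecision"))
  .foreach(println)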