I am encountering an error while attempting to read data from a Hudi table incrementally using Spark-shell. Below is the code I am using:
import org.apache.hudi.DataSourceReadOptions._
import org.apache.hudi.HoodieDataSourceHelpers
import org.apache.hadoop.fs.{FileSystem, Path}
val conf = spark.sparkContext.hadoopConfiguration
val fs = FileSystem.get(conf)
// Commit instants in Hudi's yyyyMMddHHmmss format
val beginTime = "20230614155000"
val endTime = "20230615103000"
// Base path of the Hudi table on HDFS
val srcPath = "/user/hdfs/test/testT/"
val incViewDF = spark.read.format("org.apache.hudi")
.option("hoodie.datasource.query.type", "incremental")
.option("hoodie.datasource.read.begin.instanttime", beginTime)
.option("hoodie.datasource.read.end.instanttime", endTime)
.load(srcPath)
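For reference, the string keys above are what the constants imported from DataSourceReadOptions resolve to in Hudi 0.6.0, and the fs handle would let the begin instant be read off the table rather than hard-coded. A sketch of both, assuming the 0.6.0 constant and helper names:

// Equivalent read via the imported DataSourceReadOptions constants.
val incViewDF2 = spark.read.format("org.apache.hudi")
  .option(QUERY_TYPE_OPT_KEY, QUERY_TYPE_INCREMENTAL_OPT_VAL)
  .option(BEGIN_INSTANTTIME_OPT_KEY, beginTime)
  .option(END_INSTANTTIME_OPT_KEY, endTime)
  .load(srcPath)

// HoodieDataSourceHelpers can fetch the latest commit instant on the table.
val latestCommit = HoodieDataSourceHelpers.latestCommit(fs, srcPath)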
However, the load fails with the following stack trace:
java.lang.NoSuchFieldError: NULL_VALUE
at org.apache.parquet.avro.AvroSchemaConverter.convertFields(AvroSchemaConverter.java:246)
at org.apache.parquet.avro.AvroSchemaConverter.convert(AvroSchemaConverter.java:231)
at org.apache.hudi.common.table.TableSchemaResolver.convertParquetSchemaToAvro(TableSchemaResolver.java:217)
at org.apache.hudi.common.table.TableSchemaResolver.getTableAvroSchemaFromDataFile(TableSchemaResolver.java:145)
at org.apache.hudi.common.table.TableSchemaResolver.getTableAvroSchemaWithoutMetadataFields(TableSchemaResolver.java:180)
at org.apache.hudi.IncrementalRelation.<init>(IncrementalRelation.scala:89)
at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:95)
at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:51)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:309)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:156)
... 50 elided
Spark-shell Command:
spark-shell --jars hudi-spark-bundle_2.11-0.6.0.jar,parquet-avro-1.10.0.jar,avro-1.10.2.jar \
--conf spark.sql.hive.convertMetastoreParquet=false \
--conf "spark.serializer=org.apache.spark.serializer.KryoSerializer"
Environment Details:
Spark version: 2.2.0
Scala version: 2.11.0
Hive version: 1.2.1000.2.6.3.0-235
Additional Information:
The JAR files listed above (hudi-spark-bundle_2.11-0.6.0.jar, parquet-avro-1.10.0.jar, avro-1.10.2.jar) were all passed via --jars when launching spark-shell, as shown in the command.
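In case it helps, the jars actually registered with the running session can be confirmed like this (note that, by default, classes from --jars do not take precedence over the jars Spark itself ships with):

// List every jar registered with this SparkContext via --jars / addJar.
spark.sparkContext.listJars().foreach(println)

I would appreciate any assistance in resolving this issue.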