I am trying an incremental write to a Hudi table with Hive sync enabled, but it is failing with the following error:
23/07/24 11:52:48 INFO org.apache.hudi.hive.HiveSyncTool: Schema difference found for table1
23/07/24 11:52:48 INFO org.apache.hudi.hive.ddl.HMSDDLExecutor: partition table,need cascade
Traceback (most recent call last):
java.lang.NoSuchMethodError: org.apache.hadoop.hive.metastore.IMetaStoreClient.alter_table_with_environmentContext(Ljava/lang/String;Ljava/lang/String;Lorg/apache/hadoop/hive/metastore/api/Table;Lorg/apache/hadoop/hive/metastore/api/EnvironmentContext;)V
I have examined the logs and found that some additional filter is being applied to the IncrementalRelation. I don't know what this means exactly.
INFO org.apache.hudi.IncrementalRelation: Additional Filters to be applied to incremental source are :[Ljava.lang.String;@5ec488f6
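For context, the incremental read that produces this IncrementalRelation is built roughly as below (a sketch; the source path and the begin instant time are placeholders for my actual values):

```
# Sketch of the incremental read; "20230724000000" stands in for the last
# checkpointed commit time, and the source path is a placeholder.
incremental_df = (
    spark.read.format("hudi")
    .option("hoodie.datasource.query.type", "incremental")
    .option("hoodie.datasource.read.begin.instanttime", "20230724000000")
    .load("/path/to/source_table")
)
```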
I have also observed that the incremental data is reflected in the underlying storage location: when I read that data back from the PySpark console, I get the updated versions of the records.
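That check looks roughly like this (a sketch; the base path is a placeholder for the table's actual storage location):

```
# Snapshot read from the PySpark console to verify the write landed; the
# base path is a placeholder for the table's actual storage location.
df = spark.read.format("hudi").load("/path/to/table1")
df.select("col1", "col2", "col3").show()  # shows the updated record versions
```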
A further observation: if I retry the same operation, it fails with a different error, shown below:
java.io.InvalidClassException: com.fasterxml.jackson.core.io.SerializedString; local class incompatible: stream classdesc serialVersionUID = 4312806453773505982, local class serialVersionUID = 1
The whole thing is very confusing to me. Any help would be appreciated here. Thanks in advance.
Spark version: 2.4.8
Scala version: 2.12
Hudi and other package versions: org.apache.hudi:hudi-spark-bundle_2.12:0.10.1, org.apache.spark:spark-avro_2.12:2.4.8, com.fasterxml.jackson.core:jackson-core:2.6.7, com.fasterxml.jackson.core:jackson-databind:2.6.7.3
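For reference, these packages are supplied while building the session, roughly as below (a sketch; the app name is a placeholder, and the same coordinates can equally be passed to spark-submit or pyspark via --packages):

```
from pyspark.sql import SparkSession

# Pin the Hudi, spark-avro and Jackson artifacts explicitly; the app name
# is a placeholder.
spark = (
    SparkSession.builder
    .appName("hudi-incremental-job")
    .config(
        "spark.jars.packages",
        "org.apache.hudi:hudi-spark-bundle_2.12:0.10.1,"
        "org.apache.spark:spark-avro_2.12:2.4.8,"
        "com.fasterxml.jackson.core:jackson-core:2.6.7,"
        "com.fasterxml.jackson.core:jackson-databind:2.6.7.3",
    )
    .getOrCreate()
)
```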
Hudi options used while writing (the write call itself is sketched after this list):
"hoodie.table.name": "table1",
"hoodie.datasource.write.keygenerator.class": "org.apache.hudi.keygen.ComplexKeyGenerator",
"hoodie.datasource.write.recordkey.field": "col1,col2",
"hoodie.datasource.write.precombine.field": "col3",
"hoodie.datasource.write.partitionpath.field": "col4",
"hoodie.datasource.write.hive_style_partitioning": "true",
"hoodie.datasource.write.table.name": "table1",
"hoodie.upsert.shuffle.parallelism": 1,
"hoodie.insert.shuffle.parallelism": 1,
"hoodie.consistency.check.enabled": True,
"hoodie.index.type": "BLOOM",
"hoodie.index.bloom.num_entries": 60000,
"hoodie.index.bloom.fpp": 0.000000001,
"hoodie.cleaner.commits.retained": 2,
"hoodie.datasource.hive_sync.enable": "true",
"hoodie.datasource.hive_sync.mode": "hms",
"hoodie.datasource.hive_sync.metastore.uris": "metastore_uris",
"hoodie.datasource.hive_sync.database": "db",
"hoodie.datasource.hive_sync.table": "table1_cow",
I was facing the same java.io.InvalidClassException: com.fasterxml.jackson.core.io.SerializedString error when I first tried the incremental write, and I resolved it by supplying the explicit Jackson libraries mentioned above.
I have also made sure that the column names and the order of columns are exactly the same for the Hive table and the incremental DataFrame. The only difference is the Hudi metadata columns such as _hoodie_commit_time, _hoodie_commit_seqno, _hoodie_record_key, _hoodie_partition_path.
If I disable Hive sync (i.e. set "hoodie.datasource.hive_sync.enable" to "false"), the incremental writes all go through fine.