@[toc]
0 Suspected cause
Every time we upsert into the table, Hudi writes a log file and then compacts it, which causes any incremental query that starts from before that point in time to fail.
1 Operations on the original table
All of the operations below were performed on the original table.
1.1 Operation 1 (Update)
An upsert (updating the 370 data) was first performed on the original table, followed by an incremental query, which succeeded.
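For reference, an incremental query like the one in this step is usually driven by a few Hudi datasource read options. The following is only a minimal sketch: the helper name `incremental_read_options` and the begin instant `20210730000000` are hypothetical placeholders, not values taken from this test.

```python
from typing import Dict, Optional


def incremental_read_options(begin_instant: str,
                             end_instant: Optional[str] = None) -> Dict[str, str]:
    """Build the read options for a Hudi incremental query (hypothetical helper)."""
    opts = {
        # Ask the Hudi datasource for an incremental view instead of a snapshot.
        "hoodie.datasource.query.type": "incremental",
        # Only commits after this instant are returned.
        "hoodie.datasource.read.begin.instanttime": begin_instant,
    }
    if end_instant is not None:
        # Optional upper bound on the commit range.
        opts["hoodie.datasource.read.end.instanttime"] = end_instant
    return opts


# Placeholder begin instant, chosen only for illustration.
opts = incremental_read_options("20210730000000")

# With a live SparkSession named `spark` and the Hudi bundle on the classpath,
# the options would be passed roughly like this (not executed here):
# df = spark.read.format("hudi").options(**opts).load(
#     "hdfs://hdp-jk-1:8020/user/hive/warehouse/test_increment_hudi9_mor")
```

The begin instant decides which commits the query has to resolve, so an older begin instant makes the query depend on older file versions still being present.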
Listing the HDFS files with the hadoop command shows that a log file has appeared. The data was written to the log file, not to the parquet file.
At this point no compaction had taken place, even though a compaction operation was executed.
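The check described here (is the data in a log file or in a parquet file?) can be made mechanical once the file names of a partition have been listed. This is a rough sketch: the function `split_base_and_log_files` and the sample names are hypothetical, and it assumes only that base files end in `.parquet` and that log file names contain `.log.`.

```python
from typing import Dict, Iterable, List


def split_base_and_log_files(names: Iterable[str]) -> Dict[str, List[str]]:
    """Group file names from one partition into base, log and other files."""
    groups: Dict[str, List[str]] = {"base": [], "log": [], "other": []}
    for name in names:
        if name.endswith(".parquet"):
            groups["base"].append(name)   # columnar base file
        elif ".log." in name:
            groups["log"].append(name)    # delta log file written by an upsert
        else:
            groups["other"].append(name)  # e.g. partition metadata
    return groups


# Hypothetical listing, shaped like the output of a directory listing on HDFS.
listing = [
    ".hoodie_partition_metadata",
    "f1-0_0-1-1_20210730000000.parquet",
    ".f1-0_20210730000000.log.1_0-2-2",
]
groups = split_base_and_log_files(listing)
print(groups["base"], groups["log"])
```

If `groups["log"]` is non-empty while the base file list has not changed, the upsert went to the log and has not been compacted yet.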
1.2 Operation 2 (Insert and Update)
Six records were inserted into the original table and the 380 data was updated. The HDFS data queried using Spark changed as follows:
The 370 data was committed a second time. The following error occurs when an incremental query is run with Spark:
21/07/30 14:25:45 ERROR executor.Executor: Exception in task 0.0 in stage 2.0 (TID 4)
java.io.FileNotFoundException: File does not exist: hdfs://hdp-jk-1:8020/user/hive/warehouse/test_increment_hudi9_mor/2021/07/30/4fe43850-4be8-447f-827e-edfdba44adb4-0_0-340-294_20210730142459.parquet
Using the hadoop command to look up the parquet file for instantTime '20210730142459', we found that the actual writeToken was '341-295', while the writeToken in the query was '340-294', which indicates that the file had been written again. Because the writeToken changed, the Spark incremental query tried to read a Parquet file that no longer exists.
We found that the data at instantTime '20210730142459' had been compacted once, which rewrote the data file and changed the writeToken.
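The writeToken comparison can be reproduced by splitting the base file name into its parts. The sketch below assumes the usual Hudi base file layout `<fileId>_<writeToken>_<instantTime>.parquet`. The first name comes from the error message. The second name is a hypothetical reconstruction from the '341-295' token reported above, and its leading `0-` is an assumption. The text quotes only the last two components of each token.

```python
from typing import NamedTuple


class BaseFileName(NamedTuple):
    file_id: str
    write_token: str
    instant_time: str


def parse_base_file_name(path: str) -> BaseFileName:
    """Split '<fileId>_<writeToken>_<instantTime>.parquet' into its parts."""
    name = path.rsplit("/", 1)[-1]
    if not name.endswith(".parquet"):
        raise ValueError(f"not a parquet base file: {name}")
    stem = name[: -len(".parquet")]
    # Raises ValueError if the name does not have exactly three parts.
    file_id, write_token, instant_time = stem.split("_")
    return BaseFileName(file_id, write_token, instant_time)


# File the incremental query asked for (taken from the error message).
requested = parse_base_file_name(
    "4fe43850-4be8-447f-827e-edfdba44adb4-0_0-340-294_20210730142459.parquet")
# File assumed to be on HDFS (hypothetical name built from the reported token).
on_disk = parse_base_file_name(
    "4fe43850-4be8-447f-827e-edfdba44adb4-0_0-341-295_20210730142459.parquet")

same_file_and_instant = (requested.file_id == on_disk.file_id
                         and requested.instant_time == on_disk.instant_time)
token_changed = requested.write_token != on_disk.write_token
print(same_file_and_instant, token_changed)
```

Same fileId and instantTime with a different writeToken is the pattern described above: the file for that instant was written again under a new name, so the old name can no longer be found.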
2 Ruling out other factors
A plain insert generates no log file and triggers no compaction. However, as soon as an upsert is performed on the original table, log files are generated and compacted.
3 Solution
3.1 Solution 1
An attempt to repair the data that had already gone wrong.
Query the compaction status:
Run the "compaction repair --instant 20210730112532" command in hudi-cli.sh on the server to repair the compaction.
3.2 Solution 2
Adjust Hudi's compaction configuration, such as hoodie.compaction.strategy. No useful setting has been found yet.
We tried this, but it did not fix the problem of upserts causing incremental queries to fail.
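For readers who want to repeat this experiment, the adjustment amounts to passing different compaction settings with the write. The sketch below only shows which keys are involved. The helper `compaction_write_options` and the chosen values are assumptions for illustration, not a configuration that was confirmed to fix the problem.

```python
from typing import Dict


def compaction_write_options(inline: bool,
                             max_delta_commits: int,
                             strategy_class: str) -> Dict[str, str]:
    """Build compaction-related Hudi write options (hypothetical helper)."""
    return {
        # Whether compaction runs inline, right after a write.
        "hoodie.compact.inline": str(inline).lower(),
        # Number of delta commits to wait for before a compaction is scheduled.
        "hoodie.compact.inline.max.delta.commits": str(max_delta_commits),
        # Strategy that picks which file groups get compacted.
        "hoodie.compaction.strategy": strategy_class,
    }


# Illustrative values only; none of them is a verified fix.
write_opts = compaction_write_options(
    inline=False,
    max_delta_commits=5,
    strategy_class=("org.apache.hudi.table.action.compact.strategy."
                    "LogFileSizeBasedCompactionStrategy"),
)

# With Spark these would be merged into the upsert's write options, roughly:
# df.write.format("hudi").options(**write_opts)...
```

Keeping the settings in one function makes it easier to vary one knob at a time and rerun the incremental query after each upsert.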
I hope someone can point out the right solution. Thank you.