ClickHouse log shows hash of uncompressed files doesn't match

Question

ClickHouse logs printed the error messages as below frequently:

2021.01.07 00:55:24.112567 [ 6418 ] {} <Error> vms.analysis_data (7056dab3-3677-455b-a07a-4d16904479b4): 
Code: 40, e.displayText() = DB::Exception: Checksums of parts don't match: 
hash of uncompressed files doesn't match (version 20.11.4.13 (official build)). 
Data after merge is not byte-identical to data on another replicas. There could be several reasons: 
1. Using newer version of compression library after server update. 
2. Using another compression method. 
3. Non-deterministic compression algorithm (highly unlikely). 
4. Non-deterministic merge algorithm due to logical error in code. 
5. Data corruption in memory due to bug in code. 
6. Data corruption in memory due to hardware issue. 
7. Manual modification of source data after server startup. 
8. Manual modification of checksums stored in ZooKeeper. 
9. Part format related settings like 'enable_mixed_granularity_parts' are different on different replicas. 
We will download merged part from replica to force byte-identical result.

We use the same version(20.11.4.13) and the same compression method (LZ4) for all data nodes in the production environment, we wouldn't modify the data files or the values stored in Zookeeper also.

So my questions are:

How was the error caused? Furtherly, in which cases will the CickHouse server throws those exceptions?
Is there a checksum-checking mechanism among the replicas during the merging parts?
I also found that in one of our data nodes, there are many folders named like "ignored_20201208_23116_23116_0" in the detached folder, were these files the corrupted data caused by the referred problem?

Thanks.

score 0 · Accepted Answer · answered Jan 08 '21 at 04:19

You need to upgrade all nodes to 20.11.6.6 ASAP.

The reason of these errors is a serious bug related to AIO.

ignored_ -- it's not related. You can remove them.

gtranslate: Inactive parts are not deleted immediately, because when writing a new part, fsync is not called, i.e. for some time, the new part is only in the server's RAM (OS cache). So when the server is rebooted spontaneously, a new (merged) part can be lost or damaged. In this case, ClickHouse, during the startup process is checking the integrity of the parts, if it detects a problem, it returns the inactive chunks to the active list, and later merge them again. In this case, the broken piece is renamed (the prefix broken_ is added) and moved to the detached folder. If the integrity check detects no problems in the merged part, then the original inactive chunks are renamed (prefix ignored_ is added) and moved to the detached folder.

With ReplicatedMergeTree, when one of the replica is preparing to merge new inserted parts, how to maintain the consistency of parts? To notify the other replicas to do the same merging action individually by Zookeeper? Thanks a lot. — bluelullaby, Jan 08 '21 at 05:48
Multiple replicas can be leaders at the same time. A replica can be prevented from becoming a leader using the merge_tree setting replicated_can_become_leader. The leaders are responsible for scheduling background merges. — Denny Crane, Jan 08 '21 at 14:28

ClickHouse log shows hash of uncompressed files doesn't match

1 Answers1