If so, what's the reason? I guess to keep it balanced? But is that so critical for Merkle trees?
It is not required, but an unbalanced tree is less efficient. A couple of issues can arise.
If the key range you assumed for the Merkle tree was too large, you might end up with, for example, a tree like this:
            a
          /   \
         b     c
          \   /
           d e
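Then you are sending more hashes than you need. b and c each have only one child, so their hashes change exactly when the hashes of d and e do, and they carry no extra information.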
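To make the redundancy concrete, here is a toy model, not taken from any real implementation. It assumes SHA-256 as the hash and that every node's hash is sent during a comparison; `Node`, `count_hashes` and `collapse` are hypothetical names. A node with a single child hashes only that child's hash, so it differs exactly when the child does. Skipping such nodes still detects the same differing leaves while sending fewer hashes.

```python
import hashlib


def h(*parts: bytes) -> bytes:
    """SHA-256 over the concatenated parts (assumed hash for this sketch)."""
    return hashlib.sha256(b"".join(parts)).digest()


class Node:
    def __init__(self, name, children=(), data=b""):
        self.name = name
        self.children = list(children)
        self.data = data  # for leaves: the serialized keys in this bucket

    def digest(self) -> bytes:
        if not self.children:
            return h(self.data)
        return h(*(c.digest() for c in self.children))


def count_hashes(node) -> int:
    """Number of hashes exchanged if every node's hash is sent."""
    return 1 + sum(count_hashes(c) for c in node.children)


def collapse(node):
    """Skip nodes with exactly one child: their hash adds no information."""
    while len(node.children) == 1:
        node = node.children[0]
    return Node(node.name, [collapse(c) for c in node.children], node.data)


# The over-sized tree from the answer: b and c each have a single child.
d = Node("d", data=b"keys in d")
e = Node("e", data=b"keys in e")
a = Node("a", [Node("b", [d]), Node("c", [e])])

print(count_hashes(a))            # 5 (a, b, c, d, e)
print(count_hashes(collapse(a)))  # 3 (a, d, e)
```

The node `b` is just `h(d.digest())`, so comparing `b` on two replicas tells you nothing that comparing `d` would not.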
Alternatively, if you got the range wrong so an end bucket contained a larger proportion of your key range, you would end up with a tree like this:
        a
         \
          b
        /   \
       c     d
        \   / \
         e f   g
Here, many more keys are hashed into g than into any other bucket, so g is more likely to differ between replicas. Repairing an inconsistency in g also involves copying much more data than repairing one in any other bucket.
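The skew can be illustrated with a small sketch. It assumes equal-width buckets over the assumed key range, with out-of-range keys clamped into the end buckets (a hypothetical policy chosen for illustration), and, purely for illustration, that each key independently differs between replicas with probability `p`. `bucket_of`, `bucket_sizes` and `prob_differs` are hypothetical names.

```python
from collections import Counter


def bucket_of(key: int, lo: int, hi: int, n_buckets: int) -> int:
    """Equal-width buckets over the assumed range [lo, hi).
    Keys outside that range are clamped into the first or last bucket."""
    width = (hi - lo) / n_buckets
    idx = int((key - lo) // width)
    return min(max(idx, 0), n_buckets - 1)


def bucket_sizes(keys, lo: int, hi: int, n_buckets: int) -> list:
    counts = Counter(bucket_of(k, lo, hi, n_buckets) for k in keys)
    return [counts.get(i, 0) for i in range(n_buckets)]


def prob_differs(n_keys: int, p: float) -> float:
    """Chance a bucket's hash differs if each key independently differs with probability p."""
    return 1 - (1 - p) ** n_keys


keys = range(0, 400)  # the actual keys span [0, 400)

print(bucket_sizes(keys, 0, 100, 4))  # range guessed too small: [25, 25, 25, 325]
print(bucket_sizes(keys, 0, 400, 4))  # correct range: [100, 100, 100, 100]

print(round(prob_differs(25, 0.001), 3))   # 0.025
print(round(prob_differs(325, 0.001), 3))  # 0.278
```

With the range underestimated, the last bucket holds 325 of the 400 keys. It is roughly ten times as likely to mismatch as the others under these assumptions, and repairing it means re-copying 13 times as many keys as repairing any other bucket.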

Richard
Thanks. That makes sense. I am reading the source code of Cassandra's MerkleTree. It says, "A MerkleTree is a full binary tree that represents a perfect binary tree of depth 'hashdepth'." It does not explain why it is implemented as a perfect tree. I am guessing it's just a simple way to achieve balance. (If so, it is safe to assume that a more general, n-ary balanced tree would achieve the same.) – neurite Nov 15 '13 at 19:49
In Cassandra you can always do this, since you know the token range in advance and can safely assume the hash function distributes keys fairly evenly. – Richard Nov 16 '13 at 12:01
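The comments describe a perfect binary tree of fixed depth over a token range known in advance. The sketch below is a toy model of that idea, not Cassandra's implementation. `TOKEN_RANGE`, `token`, `build` and `differing_leaves` are hypothetical names, SHA-256 stands in for the partitioner's hash, and the token space is shrunk to 2**16 for illustration. Because both replicas use the same depth and the same equal-width leaves, their trees line up node for node, and a mismatch can be traced from the root down to the leaf ranges that need repair.

```python
import hashlib

TOKEN_RANGE = 2 ** 16  # hypothetical token space [0, 2**16), known in advance


def token(key: str) -> int:
    """Hypothetical stand-in for a partitioner: hash a key into the token space."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % TOKEN_RANGE


def build(rows: dict, depth: int) -> list:
    """Perfect binary Merkle tree stored as a flat array: node i has children
    2i+1 and 2i+2, and the last 2**depth entries are the leaves."""
    n_leaves = 2 ** depth
    width = TOKEN_RANGE // n_leaves  # each leaf covers an equal slice of tokens
    leaves = [hashlib.sha256() for _ in range(n_leaves)]
    for key in sorted(rows):  # fixed order so both replicas hash identically
        leaves[token(key) // width].update(f"{key}={rows[key]};".encode())
    tree = [b""] * (n_leaves - 1) + [leaf.digest() for leaf in leaves]
    for i in range(n_leaves - 2, -1, -1):  # fill internal nodes bottom-up
        tree[i] = hashlib.sha256(tree[2 * i + 1] + tree[2 * i + 2]).digest()
    return tree


def differing_leaves(t1: list, t2: list, depth: int) -> list:
    """Descend from the root only where hashes differ; return leaf indices to repair."""
    first_leaf = 2 ** depth - 1
    out, stack = [], [0]
    while stack:
        i = stack.pop()
        if t1[i] == t2[i]:
            continue  # identical subtree: nothing below needs comparing
        if i >= first_leaf:
            out.append(i - first_leaf)
        else:
            stack.extend((2 * i + 2, 2 * i + 1))
    return sorted(out)


replica_a = {f"user{i}": "v1" for i in range(1000)}
replica_b = dict(replica_a, user42="v2")  # one row differs

depth = 4
ta, tb = build(replica_a, depth), build(replica_b, depth)
print(differing_leaves(ta, tb, depth))  # one leaf: the slice containing token("user42")

# With an even hash over a known range, leaf sizes come out roughly equal.
width = TOKEN_RANGE // 2 ** depth
sizes = [0] * 2 ** depth
for k in replica_a:
    sizes[token(k) // width] += 1
print(min(sizes), max(sizes))
```

Fixing the shape in advance is what lets the two trees be compared node for node; a balanced n-ary tree of fixed shape would line up the same way, as the comment above suggests.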