If so, what's the reason? I guess to keep it balanced? But is that so critical for Merkle trees?

neurite

1 Answer

It is not required, but an unbalanced tree is less efficient. A couple of issues can arise.

If the key range you assumed for the Merkle tree was too large, you could end up with a tree like this:

          a
        /   \
       b     c
        \   /
         d e

Then you are sending more hashes than you need: b and c each have only one child, so they are redundant.
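To make the redundancy concrete, here is a minimal sketch that counts one hash per node and then collapses single-child nodes. It assumes the tree above is read as b having only the child d and c having only the child e. The nested-list representation and the names `count_hashes` and `collapse` are made up for illustration; they are not from any library.

```python
# Hypothetical representation: a leaf is a bytes value,
# an interior node is a list of its children.

def count_hashes(node):
    """One hash per node: every leaf and every interior node carries one."""
    if isinstance(node, bytes):
        return 1
    return 1 + sum(count_hashes(child) for child in node)


def collapse(node):
    """Replace every interior node that has a single child by that child."""
    if isinstance(node, bytes):
        return node
    children = [collapse(child) for child in node]
    return children[0] if len(children) == 1 else children


# a -> b -> d and a -> c -> e, as read from the diagram above.
sparse_tree = [[b"d"], [b"e"]]
compact_tree = collapse(sparse_tree)

hashes_sparse = count_hashes(sparse_tree)    # a, b, c, d, e
hashes_compact = count_hashes(compact_tree)  # a, d, e
```

Under that reading, the collapsed tree covers the same leaves with two fewer hashes, which is exactly the b and c overhead.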

Alternatively, if you got the range wrong so an end bucket contained a larger proportion of your key range, you would end up with a tree like this:

       a
        \
         b
        / \
       /   \
      c     d
       \   / \
        e f   g

Here, many more keys are hashed to create g than any other bucket, so g is more likely to differ when two trees are compared. Fixing that inconsistency will also involve copying much more data than for any of the other buckets.
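The cost of a skewed end bucket can be sketched with a few lines of Python. This only models the leaf buckets, not the full tree, and everything in it is an assumption made for illustration: integer keys 0..99, four buckets, SHA-256 over each bucket's contents, and the names `bucket_hashes`, `keys_to_copy`, `copy_a` and `copy_b`.

```python
import hashlib


def bucket_hashes(keys, lo, hi, n_buckets):
    """Split [lo, hi) into equal ranges; keys at or beyond hi land in the last bucket."""
    width = (hi - lo) / n_buckets
    buckets = [[] for _ in range(n_buckets)]
    for key in sorted(keys):
        index = min(int((key - lo) / width), n_buckets - 1)
        buckets[index].append(key)
    hashes = [hashlib.sha256(repr(b).encode()).hexdigest() for b in buckets]
    return buckets, hashes


def keys_to_copy(keys_a, keys_b, lo, hi, n_buckets):
    """Count the keys in every bucket of A whose hash differs from B's."""
    buckets_a, hashes_a = bucket_hashes(keys_a, lo, hi, n_buckets)
    _, hashes_b = bucket_hashes(keys_b, lo, hi, n_buckets)
    return sum(
        len(bucket)
        for bucket, ha, hb in zip(buckets_a, hashes_a, hashes_b)
        if ha != hb
    )


copy_a = list(range(100))
copy_b = [k for k in copy_a if k != 97]  # one key is missing from the second copy

# Range guessed correctly: four buckets of 25 keys each.
cost_even = keys_to_copy(copy_a, copy_b, 0, 100, 4)

# Range guessed too small: the last bucket absorbs keys 30..99.
cost_skewed = keys_to_copy(copy_a, copy_b, 0, 40, 4)
```

With these made-up numbers, a single missing key costs 25 keys to re-copy when the range is right and 70 when the last bucket has absorbed most of the key space.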

Richard
  • Thanks. That makes sense. I am reading Cassandra's source code for MerkleTree. It says, "A MerkleTree is a full binary tree that represents a perfect binary tree of depth 'hashdepth'." It does not explain why it is implemented as a perfect tree. I am guessing it's just a simple way to achieve balance. (Then it is safe to assume a more generic, n-ary balanced tree would achieve the same.) – neurite Nov 15 '13 at 19:49
  • In Cassandra you can always do this since you know the token range in advance and can safely assume the hash function is pretty even. – Richard Nov 16 '13 at 12:01