Merkle tree for finding data inconsistencies - optimizing number of queries

Question

I understand the idea behind using a Merkle tree to identify inconsistencies in data, as suggested by articles like

Essentially, we use a recursive algorithm that starts at the root of the tree we want to verify and descends into every node whose stored hash differs from the trusted hash held by the server, all the way down to the inconsistent leaf (data block).

If only one block (leaf) is corrupted, we follow a single path down to that leaf, which takes O(log n) queries.

However, with multiple inconsistent data blocks (leaves), we need up to O(n) queries. In the extreme case, all data blocks are corrupted, and the algorithm has to send every single node to the server (the authenticator). In the real world this becomes costly because each query is a network round trip.
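To make the query counts concrete, here is a minimal sketch of the root-first comparison described above. It rests on some assumptions of mine: the number of blocks is a power of two, both trees are held in memory as lists of levels (leaves first), and one "query" means comparing one node's hash with the server's. The names `build_levels` and `find_bad_leaves` are hypothetical, not taken from any library.

```python
import hashlib

def build_levels(blocks):
    """Return Merkle levels, leaves first; assumes len(blocks) is a power of two."""
    level = [hashlib.sha256(b).digest() for b in blocks]
    levels = [level]
    while len(level) > 1:
        # Each parent hashes the concatenation of its two children.
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def find_bad_leaves(local, trusted):
    """Root-first comparison; returns (bad leaf indices, number of node queries)."""
    queries = 0
    bad = []
    stack = [(len(local) - 1, 0)]  # (level, index), starting at the root
    while stack:
        lvl, idx = stack.pop()
        queries += 1  # one query: compare this node with the trusted copy
        if local[lvl][idx] == trusted[lvl][idx]:
            continue  # the whole subtree below this node is consistent
        if lvl == 0:
            bad.append(idx)  # an inconsistent leaf
        else:
            stack.append((lvl - 1, 2 * idx))
            stack.append((lvl - 1, 2 * idx + 1))
    return sorted(bad), queries

blocks = [bytes([i]) for i in range(16)]
trusted = build_levels(blocks)

one_bad = list(blocks)
one_bad[5] = b"corrupted"
print(find_bad_leaves(build_levels(one_bad), trusted))

all_bad = [b + b"!" for b in blocks]
print(find_bad_leaves(build_levels(all_bad), trusted))
```

With 16 blocks, this sketch uses 1 query when nothing is corrupted, 9 queries (the root plus two children per level) for a single corrupted block, and 31 queries (every node, 2n - 1) when all blocks are corrupted.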

So my question is, is there any known improvement to the basic traverse-from-root algorithm? A possible improvement I could think of is to query the level of nodes in the middle. For example, in the tree below, we send the server the two nodes in the second level ('64' and '192'), and for any node that returns inconsistency, we recursively go to the middle level of that sub-tree - something like a binary search based on height.

This increases the best-case cost from O(1) to O(sqrt(n)) queries, since the middle level of a tree with n leaves holds about sqrt(n) nodes, and it probably reduces the worst-case cost to some extent (I have not calculated how much).
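One possible reading of this middle-level idea can be sketched as follows, which makes it easy to count queries for small trees. This is only my interpretation: for a subtree already known to be inconsistent, query all of its descendants at the level halfway down, then recurse into the ones that differ. It assumes a power-of-two number of blocks and in-memory trees stored as lists of levels (leaves first). `build_levels` and `find_bad_leaves_mid` are hypothetical names.

```python
import hashlib

def build_levels(blocks):
    """Return Merkle levels, leaves first; assumes len(blocks) is a power of two."""
    level = [hashlib.sha256(b).digest() for b in blocks]
    levels = [level]
    while len(level) > 1:
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def find_bad_leaves_mid(local, trusted):
    """Middle-level search; returns (bad leaf indices, number of node queries)."""
    queries = 0
    bad = []

    def differs(lvl, idx):
        nonlocal queries
        queries += 1  # one query: compare this node with the trusted copy
        return local[lvl][idx] != trusted[lvl][idx]

    def search(lvl, idx):
        # (lvl, idx) is a subtree root known (or, for the root, assumed) to differ.
        if lvl == 0:
            bad.append(idx)
            return
        mid = lvl // 2                # level halfway down this subtree
        width = 1 << (lvl - mid)      # number of descendants at that level
        for j in range(idx * width, (idx + 1) * width):
            if differs(mid, j):
                search(mid, j)

    top = len(local) - 1
    if top == 0:
        # Single-block tree: the root is the only leaf, so query it directly.
        if differs(0, 0):
            bad.append(0)
    else:
        search(top, 0)
    return sorted(bad), queries

blocks = [bytes([i]) for i in range(16)]
trusted = build_levels(blocks)
all_bad = [b + b"!" for b in blocks]
print(find_bad_leaves_mid(trusted, trusted))
print(find_bad_leaves_mid(build_levels(all_bad), trusted))
```

For 16 blocks this sketch uses 4 = sqrt(16) queries when nothing is corrupted, 8 for a single corrupted block, and 28 when every block is corrupted, compared with the 2n - 1 = 31 nodes in the whole tree. So under these assumptions the worst case shrinks only by a constant number of skipped upper-level nodes and remains O(n).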

Is there any better approach than this? I've searched for relevant articles on Google Scholar, but it looks like most of the algorithm-focused papers are concerned with the Merkle tree traversal problem, which is different from the problem above.

Thanks in advance!

Why would it decrease worst case time? In worst case, you are still going to need to traverse all leaves, leading you to `O(n)` nodes. Also, don't forget you are going to need to update the hash values of the upper levels after you update the hash value of the middle nodes. I am not familiar with what you are looking for, but generally, Merkle trees are best when the data is rarely changed, most of the checks will just verify the root node, and be done with it - but when you do need to update a few nodes - that is done quite easily as well. — amit, Jun 30 '20 at 08:47

0 Answers