What is the time complexity of comparing two Merkle trees?

Question

I have a simple recursive function that compares two merkle trees and accumulates the differences in the leaf nodes. However, I am unable to measure the time complexity of it. Specifically, I would like to see how does it compare against comparing two Hashtables or two BSTs. A group of leaves is a row in this case and they share a rowid. A group of rows form the entire merkle tree. In the code below I am just accumulating the differences at the leaf level.

def diff_helper(node1: MNode, node2: MNode, diff: List[Difference]):
    if not node1 and not node2:
        return
    elif node1.rowid==node2.rowid and node1.signature==node2.signature and node1.nodetype==NodeType.Row and node2.nodetype==NodeType.Row:
        return
    elif node1.rowid==node2.rowid and node1.signature!=node2.signature and node1.nodetype==NodeType.Row and node2.nodetype==NodeType.Row:
        diff_helper(node1.left, node2.left, diff)
        diff_helper(node1.right, node2.right, diff)
    elif node1.rowid==node2.rowid and node1.signature!=node2.signature and node1.nodetype==NodeType.Leaf and node2.nodetype==NodeType.Leaf:
        diff.append(Difference(node1.rowid, node1.column, node1.value, node2.value))
    else:
        diff_helper(node1.left, node2.left, diff)
        diff_helper(node1.right, node2.right, diff)

Time complexity:

On the best case, I see that this is a constant operation since the root hashes of both trees would be the same. On the worst case, the number of comparisons is the total number of all leaf nodes.

Question:

I can sense that merkle trees does better than plain hashtables because of the ability to prune the trees much faster. However, I am unable to represent this in Big O terms.

A comparable hashtable implementation would be to do a traversal of the rowids and constant time lookup on the second hashtable. Once you find the values of the rowid, you'll probably do a linear comparison of each leaf if the leaf level data is stored as a hashtable.

score 1 · Accepted Answer · answered Aug 07 '21 at 12:24

This is a good example of an output-sensitive algorithm, whose worst-case running time is most accurately specified by introducing a new quantity. For this problem, if h is the height of the trees and d is the number of leafs diffs, then the worst case is O(d + d (h − log d)). At level ℓ of the tree we can have up to min(2^ℓ, d) nodes with diffs, and we can charge the cost of matching nodes to their parents, so we just have to bound this sum. The crossover point is ℓ = log d, so ∑_{ℓ=0...log d} 2^ℓ is 2^log d + 1 − 1 = Θ(d). Then there are h − log d levels remaining with d diffs.

What is the time complexity of comparing two Merkle trees?

1 Answers1