I have a simple recursive function that compares two Merkle trees and accumulates the differences in the leaf nodes. However, I am unable to work out its time complexity. Specifically, I would like to see how it compares against comparing two hash tables or two BSTs. A group of leaves is a row in this case, and the leaves in a row share a rowid. A group of rows forms the entire Merkle tree. In the code below I am just accumulating the differences at the leaf level.
from typing import List

def diff_helper(node1: MNode, node2: MNode, diff: List[Difference]):
    if not node1 and not node2:
        # Both subtrees are exhausted.
        return
    elif node1.rowid == node2.rowid and node1.signature == node2.signature and node1.nodetype == NodeType.Row and node2.nodetype == NodeType.Row:
        # Matching row signatures: the whole row is identical, so prune it.
        return
    elif node1.rowid == node2.rowid and node1.signature != node2.signature and node1.nodetype == NodeType.Row and node2.nodetype == NodeType.Row:
        # Same row, different signatures: descend to find the differing leaves.
        diff_helper(node1.left, node2.left, diff)
        diff_helper(node1.right, node2.right, diff)
    elif node1.rowid == node2.rowid and node1.signature != node2.signature and node1.nodetype == NodeType.Leaf and node2.nodetype == NodeType.Leaf:
        # Differing leaf: record the column and both values.
        diff.append(Difference(node1.rowid, node1.column, node1.value, node2.value))
    else:
        # Any other combination of nodes: keep descending.
        diff_helper(node1.left, node2.left, diff)
        diff_helper(node1.right, node2.right, diff)
Time complexity:
In the best case, I see that this is a constant-time operation, since the root hashes of both trees would be the same. In the worst case, the number of comparisons is the total number of leaf nodes.
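To make these two cases concrete, here is a minimal sketch that counts how many node pairs get compared. It is not the code above: it assumes a simplified, perfectly balanced binary Merkle tree with no row/leaf distinction, and the names `Node`, `sig`, `build` and `count_visits` are made up for illustration.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    # Hypothetical, simplified node: only a signature and two children.
    signature: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def sig(text: str) -> str:
    """Hash a string; SHA-256 is an arbitrary choice for this sketch."""
    return hashlib.sha256(text.encode()).hexdigest()


def build(values: List[str]) -> Node:
    """Build a balanced Merkle tree; len(values) is assumed to be a power of two."""
    level = [Node(sig(v)) for v in values]
    while len(level) > 1:
        # Each parent's signature is the hash of its children's signatures.
        level = [
            Node(sig(a.signature + b.signature), a, b)
            for a, b in zip(level[0::2], level[1::2])
        ]
    return level[0]


def count_visits(a: Optional[Node], b: Optional[Node]) -> int:
    """Count the node pairs compared, pruning wherever the signatures match."""
    if a is None or b is None:
        return 0
    if a.signature == b.signature:
        return 1  # identical subtree: one comparison, no descent
    return 1 + count_visits(a.left, b.left) + count_visits(a.right, b.right)


base = [f"v{i}" for i in range(1024)]
one_change = list(base)
one_change[500] = "changed"
all_change = [v + "!" for v in base]

t = build(base)
print(count_visits(t, build(base)))        # 1
print(count_visits(t, build(one_change)))  # 21
print(count_visits(t, build(all_change)))  # 2047
```

With 1024 leaves, identical trees cost one comparison, a single changed leaf costs 21 (the root plus two children at each of the 10 levels below it), and changing every leaf visits all 2047 nodes. In this simplified setting the work appears to grow with the number of differing leaves times the tree height, capped by the total number of nodes.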
Question:
I can sense that Merkle trees do better than plain hashtables because identical subtrees can be pruned without visiting them. However, I am unable to express this in Big O terms.
A comparable hashtable implementation would traverse the rowids of the first hashtable and do a constant-time lookup of each one in the second hashtable. Once you find the row for a rowid, you would probably do a linear comparison of its leaves, if the leaf-level data is also stored as a hashtable.
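The hashtable approach described above could look roughly like this. It is only a sketch: it assumes each table is a dict mapping rowid to a dict of column values, and that both tables contain the same rowids and columns. The names `Row`, `Table` and `diff_tables` are hypothetical.

```python
from typing import Dict, List, Tuple

Row = Dict[str, str]    # column name -> value
Table = Dict[int, Row]  # rowid -> row


def diff_tables(t1: Table, t2: Table) -> List[Tuple[int, str, str, str]]:
    """Return (rowid, column, value1, value2) for every differing cell.

    Assumes both tables contain the same rowids and columns.
    """
    diff = []
    for rowid, row1 in t1.items():           # visits every row
        row2 = t2[rowid]                      # expected constant-time lookup
        for column, value1 in row1.items():  # linear scan of the row's cells
            value2 = row2[column]
            if value1 != value2:
                diff.append((rowid, column, value1, value2))
    return diff


t1 = {1: {"name": "a", "city": "x"}, 2: {"name": "b", "city": "y"}}
t2 = {1: {"name": "a", "city": "x"}, 2: {"name": "b", "city": "z"}}
print(diff_tables(t1, t2))  # [(2, 'city', 'y', 'z')]
```

Note that this version examines every cell of every row no matter how few of them differ, so its cost is proportional to the total number of cells even when the two tables are identical.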