Efficient equality function on trees

Question

A few days ago, I was given the following interview question. It was described with Standard ML code, but I was free to answer in the language of my choice (I picked Python):

I have a type:
datatype t 
  = Leaf of int
  | Node of (t * t)
and a function f with the signature
val f: int -> t
You need to write a function equals that checks whether two trees are equal. f is O(n), and it does "the worst possible thing" for the time complexity of your equals function. Write equals such that it is never exponential in n, the argument to f.

The example of f that was provided was:

fun f n = 
  if n = 0 then 
    Leaf(0)
  else 
    let 
      val subtree = f (n - 1) 
    in
      Node (subtree, subtree)
    end

which produces an exponentially large tree in O(n) time, so for a naive equals implementation that is linear in the number of nodes of the tree, equals(f(n), f(n)) is O(2^n).
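To make the sharing concrete, here is a rough Python model of the datatype and the example f. It is an illustrative sketch: distinct_objects and logical_size are hypothetical helper names, and f is written as a loop rather than recursion to stay clear of Python's recursion limit. It compares the number of distinct heap objects with the number of nodes a naive structural traversal would visit:

```python
class Leaf:
    def __init__(self, value):
        self.value = value

class Node:
    def __init__(self, left, right):
        self.left = left
        self.right = right

def f(n):
    # Port of the SML example: each level wraps ONE subtree object twice.
    subtree = Leaf(0)
    for _ in range(n):
        subtree = Node(subtree, subtree)
    return subtree

def distinct_objects(tree):
    # Count heap objects reachable from tree, visiting each object once.
    seen = set()
    stack = [tree]
    while stack:
        t = stack.pop()
        if id(t) in seen:
            continue
        seen.add(id(t))
        if isinstance(t, Node):
            stack.extend((t.left, t.right))
    return len(seen)

def logical_size(tree):
    # Count the nodes a naive structural traversal would visit,
    # memoized per object so the counting itself stays cheap.
    memo = {}
    def go(t):
        if id(t) not in memo:
            memo[id(t)] = 1 if isinstance(t, Leaf) else 1 + go(t.left) + go(t.right)
        return memo[id(t)]
    return go(tree)

t = f(30)
print(distinct_objects(t))  # 31
print(logical_size(t))      # 2147483647, i.e. 2**31 - 1
```

The gap between the two numbers is the whole problem: the input holds only n + 1 objects, but any equals that walks the tree structurally pays for 2^(n+1) - 1 nodes.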

I produced something like this:

class Node:
    def __init__(self, left, right):
        self.left = left
        self.right = right

class Leaf:
    def __init__(self, value):
        self.value = value

def equals(left, right):
    if left is right:
        return True
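    try:
        return left.value == right.value
    except AttributeError:  # raised if either argument is a Node (no .value)
        pass
    try:
        return equals(left.left, right.left) and equals(left.right, right.right)
    except AttributeError:  # raised if either argument is a Leaf (no .left/.right)
        return False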

which worked on the example of f that the interviewer provided, but failed in the general case of "f does the worst thing possible." He then gave an example (which I don't remember) that broke my first attempt. I flubbed around for a bit and eventually made something that looked like this:
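The interviewer's counterexample isn't recorded, but one hypothetical adversarial f of the kind described is easy to build: two structurally equal trees, each with only O(n) distinct objects, that never share an object at the same position, so the `left is right` shortcut never fires. The sketch below (adversarial_pair and run are illustrative names) reuses the first attempt, catching AttributeError, which is what Python raises for a missing attribute, in place of ValueError, and counts calls:

```python
class Leaf:
    def __init__(self, value):
        self.value = value

class Node:
    def __init__(self, left, right):
        self.left = left
        self.right = right

def adversarial_pair(n):
    # Hypothetical adversary: two structurally equal trees of depth n, built with
    # 2 new nodes per level (O(n) time), never sharing an object at the same position.
    a, b = Leaf(0), Leaf(0)
    for _ in range(n):
        a, b = Node(a, b), Node(b, a)
    return a, b

calls = 0

def first_attempt_equals(left, right):
    # The question's first attempt, instrumented with a call counter.
    global calls
    calls += 1
    if left is right:
        return True
    try:
        return left.value == right.value
    except AttributeError:
        pass
    try:
        return (first_attempt_equals(left.left, right.left)
                and first_attempt_equals(left.right, right.right))
    except AttributeError:
        return False

def run(a, b):
    # Reset the counter, compare, and report (result, number of calls).
    global calls
    calls = 0
    return first_attempt_equals(a, b), calls

print(run(*adversarial_pair(15)))  # (True, 65535), i.e. 2**(n+1) - 1 calls
```

Every comparison reaches a pair of distinct objects, so each call spawns two more until the leaves, which is exactly the exponential case the question rules out.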

cache = {}
def equals(left, right):
    try:
        return cache[(left, right)]
    except KeyError:
        pass

    result = False
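    try:
        result = left.value == right.value
    except AttributeError:  # raised if either argument is a Node (no .value)
        pass
    try:
        left_result = equals(left.left, right.left)
        right_result = equals(left.right, right.right)
        # redundant: the recursive calls already cached these pairs
        cache[(left.left, right.left)] = left_result
        cache[(left.right, right.right)] = right_result
        result = left_result and right_result
    except AttributeError:  # raised if either argument is a Leaf (no .left/.right)
        pass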

    cache[(left, right)] = result
    return result

but I felt like that was an awkward hack, and it clearly wasn't what the interviewer was looking for. I suspect there's an elegant way to avoid recomputing subtrees. What is it?
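For comparison, the pair-memoization idea can be written more compactly with functools.lru_cache and isinstance checks instead of exception handling. This is a sketch under the assumption that Leaf and Node keep Python's default identity-based __eq__/__hash__, so the cache compares arguments by object identity. Its work is bounded by the number of distinct (left, right) object pairs reached, which for two trees with O(n) distinct objects each can be up to O(n^2):

```python
from functools import lru_cache

class Leaf:
    def __init__(self, value):
        self.value = value

class Node:
    def __init__(self, left, right):
        self.left = left
        self.right = right

@lru_cache(maxsize=None)
def equals(left, right):
    # Memoized on the (left, right) argument pair, compared by object identity.
    if left is right:
        return True
    if isinstance(left, Leaf) and isinstance(right, Leaf):
        return left.value == right.value
    if isinstance(left, Node) and isinstance(right, Node):
        return equals(left.left, right.left) and equals(left.right, right.right)
    return False  # a Leaf never equals a Node

# Two separately built copies of the example tree share no objects with each other,
# so the `is` shortcut never fires; the pair cache still keeps the work linear here.
t1, t2 = Leaf(0), Leaf(0)
for _ in range(200):
    t1, t2 = Node(t1, t1), Node(t2, t2)
print(equals(t1, t2))  # True
```

Like the hand-rolled cache, this lru_cache is global and keeps every compared tree alive until equals.cache_clear() is called.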

Wait, you are supposed to get sublinear time for equals? I don't think that's possible in general. Is there a restriction that `f(n)` will use at most O(n) unique nodes so that memoization actually helps you in the worst case? E.g. this would be the case if `f` returns a strict, fully evaluated tree (due to the time constraint) — Niklas B., Oct 15 '14 at 06:58
@NiklasB. The restriction is that `f(n)` is `O(n)`, which I think implies that it makes at most `O(n)` unique nodes. — Patrick Collins, Oct 15 '14 at 07:01
@PatrickCollins If the returned tree is fully evaluated, that is correct — Niklas B., Oct 15 '14 at 07:01
By the way, I think your final solution is O(n^2) in the worst case, not O(n) — Niklas B., Oct 15 '14 at 07:08
@NiklasB. I was turned down for the position, so I assume I did something wrong. I'm having trouble thinking about what "`O(n)` unique nodes arranged somehow in the tree" looks like and the ways that a comparison function can fail. — Patrick Collins, Oct 15 '14 at 07:14
@Patrick Collins: “I was turned down for the position”. I guess they see that you are asking here. ;-) — beroal, Oct 15 '14 at 07:57
@Patrick Collins: “what "O(n) unique nodes arranged somehow in the tree" looks like.” Nodes are objects in the **heap** that contain addresses of objects. Here, a node contains two identical addresses. That's a detail of the language implementation. — beroal, Oct 15 '14 at 08:00

seanmcl · Answer 1 · 2014-10-15T12:27:10.033

You can use hash consing to create replicas of both trees in linear time and then compare them for equality in constant time.

Here is an example of hash consing in sml.

https://github.com/jhckragh/SMLDoc/tree/master/smlnj-lib/HashCons

Update:

See comments. I was too hasty in this answer. I don't think it's possible to create the replica in time linear in n, because converting an existing value of type t by pattern matching visits every logical node. You'd need to start with the hash-consed type and use only its constructors in f.
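A minimal Python sketch of what the update describes, assuming f is rewritten to build trees only through the smart constructors leaf and node (hypothetical names, not the API of the SML/NJ HashCons library linked above). Because every structurally equal subtree is interned to a single object, equality becomes an identity check:

```python
class Leaf:
    def __init__(self, value):
        self.value = value

class Node:
    def __init__(self, left, right):
        self.left = left
        self.right = right

# Hypothetical intern table: constructor arguments -> the unique object built from them.
# It never frees entries, so it grows with every distinct subtree ever built.
_interned = {}

def leaf(value):
    # Smart constructor: reuse the existing Leaf for this value if there is one.
    key = ("leaf", value)
    if key not in _interned:
        _interned[key] = Leaf(value)
    return _interned[key]

def node(left, right):
    # Children are already interned, so comparing them by identity (the default
    # hash/eq for plain objects) is the same as comparing them structurally.
    key = ("node", left, right)
    if key not in _interned:
        _interned[key] = Node(left, right)
    return _interned[key]

def equals(a, b):
    # For trees built only via leaf()/node(), structural equality is identity: O(1).
    return a is b

def f(n):
    # The example f, rewritten to use the smart constructors.
    subtree = leaf(0)
    for _ in range(n):
        subtree = node(subtree, subtree)
    return subtree

print(equals(f(100), f(100)))  # True, after O(n) work in each call to f
```

The cost moves into construction: each leaf or node call does one expected-O(1) table lookup, so an O(n) f stays O(n), and no f built this way can produce two equal trees that equals would have to traverse.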

edited Oct 15 '14 at 12:27

answered Oct 15 '14 at 10:36

seanmcl


Ah! The interview began with some questions about hash tables that I thought were unrelated. This sounds like the ticket. I'm going to wait a bit to accept an answer but I suspect you've gotten it. – Patrick Collins Oct 15 '14 at 10:41
Isn't this linear in the total number of nodes, which can be exponential in n? I don't quite understand how you could implement this efficiently on the given type without lower-level (referential transparency-breaking) hacks. – Niklas B. Oct 15 '14 at 11:44
SML isn't referentially transparent. It has state, e.g. refs and hashtables. – seanmcl Oct 15 '14 at 11:58
@Niklas: You're right though that it's not clear to me how to translate a given value of type t to a hash-consed variant without traversing the nodes using just pattern matching, which would yield exponential time if you tried equal(f(N), f(N)). I believe you'd need to change the type t. – seanmcl Oct 15 '14 at 12:20

Niklas B. · Accepted Answer · 2014-10-15T11:33:43.390

Your solution as such is O(n^2) by the looks of it. We can make it O(n) by using memoization on the identity of a single tree, rather than a pair of trees:

memoByVal = {}
memoByRef = {id(None): 0}
nextId = 1

# produce an integer that represents the tree's content
def getTreeId(tree):
  global nextId  # nextId is reassigned below, so it must be declared global
  if id(tree) in memoByRef:
    return memoByRef[id(tree)]
  # nodes are represented by the (left, right, value) combination
  # let's assume that leaves have left == right == None
  # and that inner nodes have value == None
  l, r = getTreeId(tree.left), getTreeId(tree.right)
  if (l, r, tree.value) not in memoByVal:
    memoByVal[l, r, tree.value] = nextId
    nextId += 1
  res = memoByVal[l, r, tree.value]
  memoByRef[id(tree)] = res
  return res

# this is now trivial
def equals(a, b):
  return getTreeId(a) == getTreeId(b)
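A self-contained variant of getTreeId, adapted (as an assumption) to the Leaf and Node classes from the question rather than requiring None children and a value on inner nodes. It keys the per-object memo on the node objects themselves rather than on id(): the dictionary then holds a reference to each node, so CPython cannot free a node and reuse its id for a different tree while the memo is alive. TreeIds is a hypothetical helper name:

```python
class Leaf:
    def __init__(self, value):
        self.value = value

class Node:
    def __init__(self, left, right):
        self.left = left
        self.right = right

class TreeIds:
    """Hypothetical helper: gives every distinct tree shape a small integer id."""

    def __init__(self):
        self.by_shape = {}   # shape tuple -> id (plays the role of memoByVal)
        self.by_object = {}  # node object -> id (memoByRef, keyed on the object, not id())

    def get(self, tree):
        # Each distinct object is resolved once; shared subtrees hit this cache.
        if tree in self.by_object:
            return self.by_object[tree]
        if isinstance(tree, Leaf):
            shape = ("leaf", tree.value)
        else:
            shape = ("node", self.get(tree.left), self.get(tree.right))
        # A shape seen for the first time gets the next unused integer.
        ident = self.by_shape.setdefault(shape, len(self.by_shape))
        self.by_object[tree] = ident
        return ident

def equals(a, b):
    # Structurally equal trees map to the same shape id within one TreeIds.
    ids = TreeIds()
    return ids.get(a) == ids.get(b)

# Usage: an adversarial pair with no shared objects at matching positions.
a, b = Leaf(0), Leaf(0)
for _ in range(200):
    a, b = Node(a, b), Node(b, a)
print(equals(a, b))  # True
```

Each distinct object is visited once, so a comparison costs expected time linear in the number of distinct objects, which is O(n) for any f that runs in O(n). Creating a fresh TreeIds per call gives up reuse across calls in exchange for not retaining trees indefinitely.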

edited Oct 15 '14 at 11:33

answered Oct 15 '14 at 07:14

Niklas B.


This is not possible in SML. There is no id function. You cannot just grab the memory location of an arbitrary value, which would break type abstraction. – seanmcl Oct 15 '14 at 10:34
@seanmcl I was allowed to use the language of my choice. But given that the interviewer was a Standard ML programmer, I assume that the solution he wanted was possible to implement in Standard ML. – Patrick Collins Oct 15 '14 at 10:36
@seanmcl In a referentially transparent language the task is impossible to solve. OP is allowed to use any language though – Niklas B. Oct 15 '14 at 10:37
@Niklas B.: You forgot to return `res`. – beroal Oct 15 '14 at 11:12
