Quickly detecting identical nodes which are siblings to an ancestor

Question

I am trying to find a fast algorithm to identify pairs of nodes (A, B) that contain the same data and are positioned in a tree such that either B is an ancestor of A, or B is a sibling of an ancestor of A.

Take for example the following tree, in which the colour identifies the content:

Example tree

n6 and n1 are a match as n1 is an ancestor of n6.
n5 and n3 are a match as n3 is the sibling to n2, which is an ancestor to n5.
n3 and n7 are a match for the same reason.
n5 and n7 are NOT a match as n7 is neither an ancestor of n5, nor a sibling to one of n5's ancestors.
n2 and n4 are NOT a match for the same reason.

The naïve implementation of a "rule checker" is trivial, but it requires traversing the tree multiple times (once for every node being checked). However, I have the feeling that I can leverage two special properties of my tree to implement a better solution. The two properties in question are:

I can get a flat list of all the nodes with the same content (in the example above I would get: (n5, n3, n7), (n1, n6), (n2, n4)).
Each of my nodes stores a reference to both its parent and all of its children (this property can be exploited recursively, like a linked list).

...but despite my conviction that there must be a quick way to find the matching nodes, I have so far failed to find it.

I am currently working in Python, but pseudocode or examples in other not-too-esoteric languages are welcome too.

For the person voting to close, the question is: "what is a fast algorithm for finding nodes A and B that contain the same data where A is either a descendant of B or a descendant of one of B's siblings?" — andrew cooke, Aug 16 '12 at 23:03

lavin · Accepted Answer · 2012-08-17T09:26:22.557

I believe this is the solution. After an O(n) pre-calculation of DFS visiting times, each query can be answered in O(1).

The DFS looks like:

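nowtime = 0  # global counter, incremented on every entry and every exit

def dfs(node):
    global nowtime
    nowtime += 1
    node.come_time = nowtime   # time at which the node is entered
    for i in node.sons:
        dfs(i)
    nowtime += 1
    node.leave_time = nowtime  # time at which the node's whole subtree is done

dfs(root)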
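
The recursive dfs() above can hit the interpreter's recursion limit (1000 frames by default in CPython) on deep trees. A possible iterative variant is sketched below, assuming nodes expose the same sons list; the Node class and the assign_times name are placeholders for illustration, not part of the original answer.

```python
class Node:
    """Hypothetical node type; only the `sons` attribute comes from the answer."""
    def __init__(self, name):
        self.name = name
        self.sons = []
        self.come_time = 0
        self.leave_time = 0


def assign_times(root):
    """Stamp come_time/leave_time on every node without recursion.

    Returns the final counter value (2 * number of nodes).
    """
    counter = 1
    root.come_time = counter
    # Each stack entry pairs a node with an iterator over its unvisited sons.
    stack = [(root, iter(root.sons))]
    while stack:
        node, remaining = stack[-1]
        son = next(remaining, None)
        counter += 1
        if son is None:
            # No sons left: the node leaves the stack.
            node.leave_time = counter
            stack.pop()
        else:
            # Enter the next son and keep the parent on the stack.
            son.come_time = counter
            stack.append((son, iter(son.sons)))
    return counter
```

The explicit stack plays the role of the call stack, so a node's [come_time, leave_time] interval still covers exactly the period during which it is on the stack.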

Then we have:

B is an ancestor of A if and only if

B.come_time < A.come_time and B.leave_time > A.leave_time

I think it's true that:

A is a descendant of one of B's siblings if and only if A is a descendant of B's direct father, A is not one of B's siblings (thanks to @mac), and A is neither B itself nor a descendant of B.

So we can check:

B.fa.come_time < A.come_time and B.fa.leave_time > A.leave_time

and

B.fa != A.fa

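To sum up, to answer a query we have the function below. The case where A is a descendant of B does not need to be excluded explicitly, because it is itself a match and is handled first: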

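def check(A, B):
    # Case 1: B is an ancestor of A.
    if B.come_time < A.come_time and B.leave_time > A.leave_time:
        return True
    # Case 2: B is a sibling of an ancestor of A.
    if B.has_father() and A.has_father():
        if A.fa == B.fa:  # A is B itself or one of B's siblings
            return False
        # A lies strictly inside the subtree rooted at B's father
        if B.fa.come_time < A.come_time and B.fa.leave_time > A.leave_time:
            return True
    return False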
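
As a minimal sketch of how check() could be combined with the flat lists of same-content nodes from the question, the snippet below tries both orderings of every pair within a group. The Node class and the stamp() and matches() helpers are assumptions made for illustration, and the tree is only one layout consistent with the relationships listed in the question, not necessarily the one in the original picture.

```python
from itertools import permutations


class Node:
    """Hypothetical node with the attributes the answer relies on."""
    def __init__(self, name, fa=None):
        self.name = name
        self.fa = fa            # parent, or None for the root
        self.sons = []
        self.come_time = 0
        self.leave_time = 0
        if fa is not None:
            fa.sons.append(self)

    def has_father(self):
        return self.fa is not None


def stamp(node, clock=0):
    """Same DFS as in the answer, passing the counter instead of a global."""
    clock += 1
    node.come_time = clock
    for son in node.sons:
        clock = stamp(son, clock)
    clock += 1
    node.leave_time = clock
    return clock


def check(A, B):
    """True if B is an ancestor of A or a sibling of an ancestor of A."""
    if B.come_time < A.come_time and B.leave_time > A.leave_time:
        return True
    if B.has_father() and A.has_father():
        if A.fa == B.fa:
            return False
        if B.fa.come_time < A.come_time and B.fa.leave_time > A.leave_time:
            return True
    return False


def matches(groups):
    """Return (A.name, B.name) for every ordered pair in a group that passes check."""
    found = []
    for group in groups:
        for A, B in permutations(group, 2):
            if check(A, B):
                found.append((A.name, B.name))
    return found


# Assumed layout: n1 is the root; n2, n3, n4 are its children;
# n5 and n6 sit under n2, n7 sits under n4.
n1 = Node("n1")
n2, n3, n4 = Node("n2", n1), Node("n3", n1), Node("n4", n1)
n5, n6, n7 = Node("n5", n2), Node("n6", n2), Node("n7", n4)
stamp(n1)

groups = [(n5, n3, n7), (n1, n6), (n2, n4)]
result = matches(groups)
print(result)
```

Because check(A, B) is not symmetric, both (A, B) and (B, A) are tried. On this assumed tree the pairs reported are (n5, n3), (n7, n3) and (n6, n1), while n5/n7 and n2/n4 are rejected. The pairing step is still quadratic in the size of each group; only the individual check is O(1).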

The key idea in this solution is to use the visiting times of a DFS to check whether a node B is an ancestor of another node A. The [come_time, leave_time] interval is exactly the time interval during which a node is kept on the stack. It's easy to verify that, in a DFS, an ancestor's interval contains the intervals of all its descendants, since the ancestor is always on the stack while the DFS is visiting its descendants.

Added:

We can prove that:

A is a descendant of one of B's siblings if and only if A is a descendant of B's direct father, A is not one of B's siblings (thanks to @mac), and A is neither B itself nor a descendant of B.

since:

If A is a descendant of B's direct father, then A is in the sub-tree rooted at B.fa. The sub-tree contains exactly:

1. B.fa
2. B
3. B's siblings
4. descendants of B
5. descendants of B's siblings

So, if A is not 1, not 2, not in 3, and not in 4, then A must be in 5.

And if A is not a descendant of B's direct father, then A is not in the sub-tree. It's clear that A can then never be a descendant of B's siblings, since all the siblings of B are in the sub-tree.
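
The equivalence can also be checked by brute force on small random trees. The sketch below does not use the DFS timestamps: it represents a tree as a hypothetical parent array (node 0 is the root) and compares the direct definition with the combined condition for every ordered pair (A, B). All function names are illustrative.

```python
import random


def random_parents(n, rng):
    """parent[i] for nodes 0..n-1; node 0 is the root and has parent None."""
    return [None] + [rng.randrange(i) for i in range(1, n)]


def ancestors(parent, a):
    """Proper ancestors of a, found by walking the parent links up to the root."""
    out = []
    while parent[a] is not None:
        a = parent[a]
        out.append(a)
    return out


def descends_from_sibling(parent, a, b):
    """Direct definition: some sibling of b is a proper ancestor of a."""
    if parent[b] is None:
        return False  # the root has no siblings
    return any(x != b and parent[x] == parent[b] for x in ancestors(parent, a))


def combined_condition(parent, a, b):
    """a is below b's father, is not b, is not b's sibling, and is not below b."""
    fa = parent[b]
    if fa is None:
        return False
    anc = ancestors(parent, a)
    return fa in anc and a != b and parent[a] != fa and b not in anc


def claim_holds(n=8, trials=200, seed=0):
    """Compare both predicates on every ordered pair of `trials` random trees."""
    rng = random.Random(seed)
    for _ in range(trials):
        parent = random_parents(n, rng)
        for a in range(n):
            for b in range(n):
                if descends_from_sibling(parent, a, b) != combined_condition(parent, a, b):
                    return False
    return True


print(claim_holds())
```

A True result is supporting evidence rather than a proof, but it is a cheap way to catch a missing clause such as the sibling exclusion.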

It seems a clever idea, smart and +1 (for now!). One thing that I can already spot (but easily fixable) is that `A is a descendant of B's siblings, if and only if A is a descendant of B's direct father` will return false positives for `A=n2`, `B=n4`. — mac, Aug 17 '12 at 07:52
and it's so difficult to get an answer accepted on stackoverflow .. :( — lavin, Aug 17 '12 at 09:22
it's not accepted only because I am still working on its implementation! Once code passes test you will get your glory! :) — mac, Aug 17 '12 at 10:05
The come/leave time appears to be a riff on nested sets http://en.wikipedia.org/wiki/Nested_set_model. Is there a reason to use time as opposed to an incrementing counter for that purpose? — orangepips, Aug 17 '12 at 10:45
@orangepips - `Time` is already a counter in the example. It's the variable name that is a misnomer... — mac, Aug 17 '12 at 11:27
