I am trying to find a fast algorithm to identify pairs of nodes (A, B) that contain the same data and that are positioned in a tree in such a way that node B is an ancestor of node A, OR B is a sibling of an ancestor of A.
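Written out formally, the rule is the first line below, with anc(v) as the set of proper ancestors of v and p(v) as its parent. The second line is an equivalent form that needs only ancestor tests. This restatement assumes that "ancestor" means proper ancestor and that any term involving p(·) of the root is false.

```latex
\begin{align*}
\operatorname{match}(A,B)
  &\iff B \in \operatorname{anc}(A) \;\lor\;
        \exists X \in \operatorname{anc}(A) :\; X \neq B \,\land\, p(X) = p(B) \\
  &\iff B \in \operatorname{anc}(A) \;\lor\;
        p(B) \in \operatorname{anc}\bigl(p(A)\bigr)
\end{align*}
```

The two lines agree for the following reasons. Suppose B is a sibling of some X in anc(A). Then p(B) = p(X) is a proper ancestor of X, and X is either p(A) or one of its ancestors, so p(B) is in anc(p(A)). Conversely, suppose p(B) is in anc(p(A)). The child X of p(B) on the path down to A is itself in anc(A), so either B = X or B is a sibling of X.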
Take for example the following tree, in which the colour identifies the content:
- n6 and n1 are a match, as n1 is an ancestor of n6.
- n5 and n3 are a match, as n3 is the sibling of n2, which is an ancestor of n5.
- n3 and n7 are a match for the same reason.
- n5 and n7 are NOT a match, as n7 is neither an ancestor of n5 nor a sibling of one of n5's ancestors.
- n2 and n4 are NOT a match for the same reason.
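A direct translation of the rule can serve as a brute-force baseline. The sketch below assumes a hypothetical `Node` class with `name`, `data`, `parent` and `children` attributes; these names are illustrative and are not taken from the question. `is_match(a, b)` walks up from `a` and, at each ancestor, checks whether `b` is that ancestor or one of its siblings. `naive_matches` then tries every pair of nodes with equal data, in both orders, because the rule is not symmetric. The example tree is made up and is not the tree from the figure.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from itertools import combinations


@dataclass(eq=False)  # identity-based ==/hash, so nodes behave like references
class Node:
    # Hypothetical node type; attribute names are illustrative assumptions.
    name: str
    data: object
    parent: Node | None = field(default=None, repr=False)
    children: list[Node] = field(default_factory=list, repr=False)

    def add(self, child: Node) -> Node:
        """Attach `child` under this node and return it."""
        child.parent = self
        self.children.append(child)
        return child


def is_match(a: Node, b: Node) -> bool:
    """Literal rule: b is an ancestor of a, or b is a sibling of an ancestor of a."""
    x = a.parent
    while x is not None:
        if x is b:
            return True  # b is an ancestor of a
        if x.parent is not None and b in x.parent.children:
            return True  # b shares a parent with ancestor x (and b is not x)
        x = x.parent
    return False


def all_nodes(root: Node) -> list[Node]:
    """Collect every node with an iterative depth-first traversal."""
    stack, out = [root], []
    while stack:
        node = stack.pop()
        out.append(node)
        stack.extend(node.children)
    return out


def naive_matches(root: Node) -> set[frozenset[str]]:
    """Try every pair of nodes with equal data, in both directions."""
    found = set()
    for a, b in combinations(all_nodes(root), 2):
        if a.data == b.data and (is_match(a, b) or is_match(b, a)):
            found.add(frozenset((a.name, b.name)))
    return found


# Made-up example tree (not the tree from the question):
# root(r) -> a(red), b(blue)
# a -> c(blue), d(green);  c -> e(green)
# b -> f(red);             f -> g(green)
root = Node("root", "r")
a, b = root.add(Node("a", "red")), root.add(Node("b", "blue"))
c, d = a.add(Node("c", "blue")), a.add(Node("d", "green"))
e = c.add(Node("e", "green"))
f = b.add(Node("f", "red"))
g = f.add(Node("g", "green"))

print(sorted(sorted(p) for p in naive_matches(root)))
# [['a', 'f'], ['b', 'c'], ['d', 'e']]
```

Every pair check walks up the tree and scans sibling lists, so this baseline is roughly quadratic in the number of nodes. Here g is the analogue of a non-match: it has the same data as d and e, but neither rule connects them.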
The naïve implementation of a "rule checker" is trivial, but it requires traversing the tree multiple times (once for every node being checked). However, I have the feeling that I can leverage two special properties of my tree to implement a better solution. The two properties in question are:
- I can get a flat list of all the nodes with the same content (in the example above I would get: (n5, n3, n7), (n1, n6), (n2, n4)).
- Each of my nodes stores a reference to both its parent and all of its children (this property can be exploited recursively, like a linked list).
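One possible way to exploit both properties is sketched below. It is a suggestion, not a known-best answer.

The rule can be rewritten so that it needs only ancestor tests: B matches A exactly when B is a proper ancestor of A, or parent(B) is a proper ancestor of parent(A). In short, a sibling of some ancestor X of A hangs off parent(X), which lies strictly above parent(A). Conversely, the child of parent(B) on the path to A is an ancestor of A that B either is or is a sibling of.

Ancestor tests take O(1) after a single depth-first traversal that numbers nodes on entry and records the last number inside each subtree. This is the standard Euler-tour (DFS interval) technique, substituted here; it is not something from the question. The `Node` class, `euler_times` and `matches` are hypothetical names. `groups` is a stand-in for the same-content lists from the first property.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from itertools import combinations


@dataclass(eq=False)  # identity-based ==/hash, so nodes can be dict keys
class Node:
    # Hypothetical node type; attribute names are illustrative assumptions.
    name: str
    data: object
    parent: Node | None = field(default=None, repr=False)
    children: list[Node] = field(default_factory=list, repr=False)

    def add(self, child: Node) -> Node:
        child.parent = self
        self.children.append(child)
        return child


def euler_times(root: Node) -> tuple[dict[Node, int], dict[Node, int]]:
    """Single DFS: tin[v] is v's preorder index, tout[v] the largest index in v's subtree."""
    tin: dict[Node, int] = {}
    tout: dict[Node, int] = {}
    counter = 0
    stack = [(root, False)]
    while stack:
        node, finished = stack.pop()
        if finished:
            tout[node] = counter - 1  # every descendant has been numbered by now
            continue
        tin[node] = counter
        counter += 1
        stack.append((node, True))  # popped again after all descendants
        stack.extend((child, False) for child in node.children)
    return tin, tout


def matches(root: Node, groups: list[list[Node]]) -> set[frozenset[str]]:
    """`groups` stands in for the same-content lists (first property)."""
    tin, tout = euler_times(root)

    def is_proper_ancestor(u: Node, v: Node) -> bool:
        return tin[u] < tin[v] <= tout[u]

    def is_match(a: Node, b: Node) -> bool:
        # b is an ancestor of a, or parent(b) is a proper ancestor of parent(a)
        if is_proper_ancestor(b, a):
            return True
        return (a.parent is not None and b.parent is not None
                and is_proper_ancestor(b.parent, a.parent))

    found = set()
    for group in groups:
        for a, b in combinations(group, 2):  # only nodes with equal content
            if is_match(a, b) or is_match(b, a):
                found.add(frozenset((a.name, b.name)))
    return found


# Made-up example tree (not the tree from the question):
# root(r) -> a(red), b(blue)
# a -> c(blue), d(green);  c -> e(green)
# b -> f(red);             f -> g(green)
root = Node("root", "r")
a, b = root.add(Node("a", "red")), root.add(Node("b", "blue"))
c, d = a.add(Node("c", "blue")), a.add(Node("d", "green"))
e = c.add(Node("e", "green"))
f = b.add(Node("f", "red"))
g = f.add(Node("g", "green"))

groups = [[a, f], [c, b], [e, d, g]]
print(sorted(sorted(p) for p in matches(root, groups)))
# [['a', 'f'], ['b', 'c'], ['d', 'e']]
```

After the one traversal, each pair check costs O(1), so the total is O(n) plus O(k²) for each group of size k. Nodes with different content are never compared. Both orders are tried for each pair because the rule is not symmetric.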
...but despite my conviction that there must be a quick way to find the matching nodes, I have so far failed to find it.
I am currently working in Python, but pseudocode or examples in other not-too-esoteric languages are welcome too.