Minimum subtree containing nodes from set

Question

There is a tree structure, e.g.

and a set of nodes (leaves) that must be in the subtree, e.g.

[5, 6]

How can I find the minimum subtree that contains all these nodes and starts at the root element? Like this:

Can a node value appear in the tree in more than one location? — j_random_hacker, Nov 28 '16 at 20:55

score 2 · Accepted Answer · answered Nov 28 '16 at 11:37

Basically, you can recurse down to the leaves, and find, for each leaf, whether it is needed or not. When the recursion goes back up again, you can see if any of the descendants was needed.

Here is pseudo-code that does this:

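def mark_needed_nodes(node, given_nodes):
    # A leaf (a node with no children) is needed only if it was requested.
    if not node.children:
        node.needed = node in given_nodes
        return node.needed

    # Not a leaf: it is needed if any of its descendants is needed.
    node.needed = False
    for child in node.children:
        # Recurse before combining, so that `or` cannot short-circuit
        # and leave later children unvisited and unmarked.
        child_needed = mark_needed_nodes(child, given_nodes)
        node.needed = node.needed or child_needed
    return node.needed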

You would call mark_needed_nodes(root, given_nodes).

Assuming given_nodes is a hash-based set (or dictionary), each membership test takes expected constant time, so the overall complexity is linear in the number of nodes in the tree.
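If a separate pruned tree is wanted rather than flags on the existing nodes, the same bottom-up idea can build it directly. The following is only a sketch: the Node class, the prune and to_nested helpers, and the example tree are all made up for illustration (the tree from the question is not reproduced here), and node values are assumed to be unique.

```python
class Node:
    """Hypothetical tree node; the question does not define one."""

    def __init__(self, value, children=()):
        self.value = value
        self.children = list(children)


def prune(node, given_values):
    """Return a pruned copy of `node`, or None if no wanted leaf lies below it."""
    if not node.children:
        # A leaf survives only if it was requested.
        return Node(node.value) if node.value in given_values else None
    # Prune every child, then keep only the branches that survived.
    pruned = (prune(child, given_values) for child in node.children)
    kept = [child for child in pruned if child is not None]
    return Node(node.value, kept) if kept else None


def to_nested(node):
    """Render a tree as nested (value, [children]) tuples for inspection."""
    return (node.value, [to_nested(child) for child in node.children])


# Made-up example tree: 1 -> (2 -> (4, 5), 3 -> (6, 7))
root = Node(1, [Node(2, [Node(4), Node(5)]), Node(3, [Node(6), Node(7)])])
subtree = prune(root, {5, 6})
```

Like the marking version, this visits every node once, so it is still linear in the size of the tree.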

score 1 · Answer 2 · answered Nov 28 '16 at 14:58

I think there is no need to traverse the whole tree. If each node has a reference to its parent, we can just "draw the lines" from each of the given leaf nodes up to the root.

Something like this:

1. Mark the root node as needed.
2. Take the first unprocessed given leaf node and make it the current node. If there are none, we are done.
3. Mark the current node as needed.
4. Go to the parent of the current node.
5. If the current node is already marked as needed, go to step 2; otherwise go to step 3.
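The steps above could be sketched in Python roughly as follows. This assumes every node stores a reference to its parent; the Node class, the mark_paths_to_root name, and the example tree are hypothetical and only serve the illustration.

```python
class Node:
    """Hypothetical node that stores a parent reference."""

    def __init__(self, value, parent=None):
        self.value = value
        self.parent = parent
        self.needed = False


def mark_paths_to_root(root, given_leaves):
    """Mark every node on the path from each given leaf up to the root."""
    root.needed = True
    for leaf in given_leaves:
        current = leaf
        # Climb until reaching a node that is already marked. The root is
        # always marked, so the loop stops before running off the tree.
        while not current.needed:
            current.needed = True
            current = current.parent


# Made-up example tree: 1 -> (2 -> (4, 5), 3 -> (6, 7))
n1 = Node(1)
n2, n3 = Node(2, n1), Node(3, n1)
n4, n5 = Node(4, n2), Node(5, n2)
n6, n7 = Node(6, n3), Node(7, n3)

mark_paths_to_root(n1, [n5, n6])
marked = sorted(n.value for n in (n1, n2, n3, n4, n5, n6, n7) if n.needed)
```

Each node is marked at most once, so the work is proportional to the size of the resulting subtree plus the number of given leaves, not to the size of the whole tree.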

score 0 · Answer 3 · answered Nov 28 '16 at 20:58

Suppose you have k nodes in your query set, and n nodes in the tree. If you need to perform many queries on the same tree, and the tree is much larger than a typical query set, then you might consider the following solution.

A complicated O(n)-preprocessing-time, O(k)-query-time solution

You can first preprocess your tree in linear time so that the lowest common ancestor of any pair of nodes can be determined in constant time. Then, for a given query, you can find the lowest common ancestor of the first two query nodes, then the lowest common ancestor of that node and the third query node, and so on, to determine the lowest common ancestor of all nodes in your query set in O(k) time overall. However, the preprocessing and querying are both complicated, and this is unlikely to be the fastest way unless your tree is huge compared to your query size and you have many separate queries on the same tree (so that the time spent preprocessing pays off).
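To make the folding step concrete, here is a rough sketch that swaps in a simpler technique than the linear-time preprocessing mentioned above: an Euler tour with a sparse table, which needs O(n log n) preprocessing but still answers each pairwise query in O(1). The LCA class, the lca_of_set helper, the dictionary-of-children input format, and the example tree are all assumptions made for illustration.

```python
from functools import reduce


class LCA:
    """Euler tour + sparse table: O(n log n) build, O(1) per query."""

    def __init__(self, children, root):
        # children: dict mapping a node to its list of children (assumed format)
        self.first = {root: 0}   # first position of each node in the tour
        tour = [(0, root)]       # (depth, node) pairs in Euler-tour order
        stack = [(root, 0, iter(children.get(root, ())))]
        while stack:             # iterative DFS, safe for deep trees
            node, depth, it = stack[-1]
            child = next(it, None)
            if child is None:
                stack.pop()
                if stack:        # record the return to the parent
                    parent, parent_depth, _ = stack[-1]
                    tour.append((parent_depth, parent))
            else:
                self.first[child] = len(tour)
                tour.append((depth + 1, child))
                stack.append((child, depth + 1, iter(children.get(child, ()))))
        # table[j][i] holds the minimum of tour[i : i + 2**j]
        self.table = [tour]
        j = 1
        while (1 << j) <= len(tour):
            prev, half = self.table[-1], 1 << (j - 1)
            self.table.append([min(prev[i], prev[i + half])
                               for i in range(len(tour) - (1 << j) + 1)])
            j += 1

    def query(self, u, v):
        """Return the lowest common ancestor of u and v."""
        lo, hi = sorted((self.first[u], self.first[v]))
        j = (hi - lo + 1).bit_length() - 1
        # Two overlapping power-of-two windows cover tour[lo : hi + 1].
        return min(self.table[j][lo], self.table[j][hi - (1 << j) + 1])[1]


def lca_of_set(lca, nodes):
    """Fold pairwise LCA queries over the query set: O(k) queries in total."""
    return reduce(lca.query, nodes)


# Made-up example tree: 1 -> (2 -> (4, 5), 3 -> (6, 7))
lca = LCA({1: [2, 3], 2: [4, 5], 3: [6, 7]}, root=1)
```

For example, lca_of_set(lca, [4, 5]) folds down to node 2, while a query set spanning both branches, such as [5, 6], folds down to the root.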
