Find maximum subtree in the given BST such that it has no duplicates

Question

Given the BST which allows duplicates as separate vertices, how do I find the highest subtree such that it has no duplicates.

This is the idea: (1) Check if the root value appears in its right subtree (inserting this way: left < root <= right). If not, tree has no duplicates. I look for it always on the left from the root's child. (2) Traversing and doing (1) I can find all subtrees without duplicates, storing their root pointer and height. (3) Comparing heights I can find largest seeked subtree.

I don't know how to store these information while traversing. I found programs for finding all duplicate subtrees of BST that use hash maps, but if possible I would prefer to avoid using hash maps, as I haven't had them on my course yet.

<!-- language: lang-c -->

typedef struct vertex {
    int data;
    struct vertex *left;
    struct vertex *right;
} vertex, *pvertex;

// Utility functions

int Height(pvertex t){
    if (t == NULL)
        return 0;
    if (Height(t->left) > Height(t->right))
        return Height(t->left) + 1;
    else
        return Height(t->right) + 1;
}

int DoesItOccur(pvertex t, int k){
    if(!t)
        return 0;
    if(t->data==k)
        return 1;
    if(t->data<k){
        return DoesItOccur(t->left,k);     
    }
}

// My function
pvertex MaxSeeked(pvertex t){
    if(!t)
        return NULL;
    if(DoesItOccur(t->right,t->data)==0)
        return t;
    else if{
        if(t->left && t->right){
            if(Height(MaxSeeked(t->left))>Height(MaxSeeked(t->right)))
                return t->left;
            else 
                return t->right;
        }
    }
    else if{
    ......
    } 
}

This seems too simple a question, so I must be missing something, but what if the root is duplicated in its *left* subtree? The tree then does have at least one duplicate, but you may not detect that by looking for the root only in its right subtree. — John Bollinger, May 30 '19 at 21:31

score 0 · Accepted Answer · answered May 30 '19 at 22:11

I don't know how to store these information while traversing. I found programs for finding all duplicate subtrees of BST that use hash maps, but if possible I would prefer to avoid using hash maps, as I haven't had them on my course yet.

Note in the first place that you only need to track all the subtrees of the maximal height discovered so far. Or maybe you can limit that to just one such, if that's all you need to discover. For efficiency, you should also track what that maximal height actually is.

I'll suppose that you must not add members to your node structure, but if you could do, you could add a member or two wherein to record whether the tree rooted at each node contains any dupes, and how high that tree is. You could populate those data as you go, and remember what the maximum height is, then make a second traversal to collect the nodes.

But without modifying any nodes themselves, you can still track the current candidates by other means, such as a linked list. And you can put whatever metadata you want into the tracking data structure. For example,

struct nondupe_subtree {
    struct vertex *root;
    int height;
    struct nondupe_subtree *next;
};

You can then, say, perform a selective traversal of your tree in breadth first order, carrying along a linked list of struct nondupe_subtree nodes:

Start by visiting the root node.
Test the subtree rooted at each visited node to see whether it contains any dupes, according to the procedure you have described.
If so then enqueue its children for traversal.
If not then measure the subtree height and update your linked list (or not) accordingly. Do not enqueue this node's children.

When no more nodes are enqueued for traversal, you linked list contains the roots of all the maximal height subtrees without dupes.

Note that that algorithm would in many cases be significantly sped if you could compute and store all the subtree heights in an initial DFS pass, for it is otherwise prone to performing duplicate tree-height computations. Many of them, in some cases.

Note also that although it does simplify this particular algorithm, your rule for always putting dupes to the right works against balanced trees, which may also yield reduced performance. In the worst case, where are vertices are duplicate, your "tree" will perforce be linear.

Thanks a lot for such a comprehensive answer! – Stanisław Maksicki May 30 '19 at 22:56 — Stanisław Maksicki, May 30 '19 at 22:56

Find maximum subtree in the given BST such that it has no duplicates

1 Answers1