
Given two unsorted arrays of N elements each, we need to determine whether the binary search trees constructed from them are identical.

The elements of each array are taken in order and inserted into a basic binary search tree (no balancing, no red-black, nothing). Given just the two arrays, can we determine whether they give rise to the same binary search tree?

There is an obvious solution with O(N^2) worst-case time complexity: construct both trees and compare them.
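A minimal C++ sketch of this naive approach, assuming distinct values; `Node`, `insertKey`, `sameTree` and `sameBSTNaive` are hypothetical names introduced for illustration. Each insertion walks down from the root, so an already sorted input degenerates into a linked list and the build costs O(N^2).

```cpp
#include <memory>
#include <vector>

// Plain, unbalanced BST node (hypothetical helper type for this sketch).
struct Node {
    int key;
    std::unique_ptr<Node> left, right;
    explicit Node(int k) : key(k) {}
};

// Ordinary BST insertion: walk down from the root to the empty slot, no rebalancing.
void insertKey(std::unique_ptr<Node>& root, int key) {
    std::unique_ptr<Node>* slot = &root;
    while (*slot)
        slot = key < (*slot)->key ? &(*slot)->left : &(*slot)->right;
    *slot = std::make_unique<Node>(key);
}

// Same shape and same key at every position.
bool sameTree(const Node* a, const Node* b) {
    if (!a || !b) return a == b;  // equal only if both are empty
    return a->key == b->key && sameTree(a->left.get(), b->left.get()) &&
           sameTree(a->right.get(), b->right.get());
}

bool sameBSTNaive(const std::vector<int>& x, const std::vector<int>& y) {
    std::unique_ptr<Node> tx, ty;
    for (int v : x) insertKey(tx, v);
    for (int v : y) insertKey(ty, v);
    return sameTree(tx.get(), ty.get());
}
```

For example, `sameBSTNaive({2, 1, 3}, {2, 3, 1})` is true, while `sameBSTNaive({1, 2, 3}, {2, 1, 3})` is false.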

Is there an O(N) or O(N log N) solution?

The property I am trying to exploit is that the shape of the BST depends on the relative positions of the elements. For example, if 51 immediately follows 20 in one array and becomes 20's right child, then for the trees to be equal, no element with a value between 20 and 51 may appear between 20 and 51 in the other array (otherwise 20's right child would be that number, not 51).

I did try a few approaches:

  1. The naive approach: construct the two trees and compare them.
  2. An interesting variant where I partition the array into two sub-arrays (elements smaller than the pivot and elements larger than it), and recursively pass the smaller one to the left child and the other to the right. In-place and cheeky, but still O(N^2).
  3. A friend tried applying a longest-subsequence computation to the arrays and comparing the results, but that is incorrect.
  4. I was making inroads into maybe solving it with a LinkedHashMap, but I am having a tough time proving its correctness.

Any help or hints towards solving this problem would be much appreciated.

Ilmari Karonen
Kanishk
  • When you mean the same tree, do you mean a tree with the same structure (ie the same amount of branches etc)? – fge Dec 30 '12 at 17:12
  • @fge: Same tree = same structure. The nodes at the same level are all identical. The left child of the node at depth 4 in one tree is the same as the left child of the node at depth 4 in the other tree. (and so on..) – Kanishk Dec 30 '12 at 17:22
  • Note that constructing a tree (which is unbalanced) is NOT O(nlogn), it is O(n^2) [worst case, I have no idea what's the average case, never thought about it] - so the naive solution is O(n^2) – amit Dec 30 '12 at 17:25
  • @amit The average case is O(n log n) – Henry Dec 30 '12 at 17:27
  • @amit: +1. You're correct. That makes it worse. The naive solution is O(n^2). – Kanishk Dec 30 '12 at 17:28
  • @Henry: Can you refer me to somewhere that does this analysis (one that actually computes the expected complexity of inserting a node into an unbalanced BST)? I'd like to read it in detail. – amit Dec 30 '12 at 17:29
  • @amit look for example in http://www.cs.bgu.ac.il/~ds122/wiki.files/Presentation05%5B1%5D.pdf, btw it makes a difference if average is taken over all possible sequences of numbers that are used to build the tree or over all possible trees. – Henry Dec 30 '12 at 17:40
  • Could you provide some sample data for the two lists (one set which should yield identical trees, one that gives different trees) ? – wildplasser Dec 30 '12 at 17:56
  • @wildplasser Simple example: [2,1,3],[2,3,1] will yield the same tree; [1,2,3],[2,1,3] will yield different trees. – amit Dec 30 '12 at 17:59
  • Thanks. These are nice small trees /BobRoss – wildplasser Dec 30 '12 at 18:02

4 Answers


Summary

I think you can improve the naive approach from O(N^2) to O(N log N) by using a range minimum query (RMQ) to construct the binary tree.

Fast binary tree construction

Suppose we want to construct the binary tree for an array A.

The idea is to first construct an array B where B[i] is the position in A of the i-th smallest element of A. This can be done by sorting in O(N log N).
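Building B amounts to sorting the positions 0..N-1 by the values stored there. A minimal C++ sketch, using 0-based positions and ranks (the answer's pseudocode is 1-based); `rankPositions` is a hypothetical name.

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// B[i] = position in A of the i-th smallest element (0-based).
std::vector<int> rankPositions(const std::vector<int>& A) {
    std::vector<int> B(A.size());
    std::iota(B.begin(), B.end(), 0);  // start with positions 0, 1, ..., N-1
    std::sort(B.begin(), B.end(),      // order positions by the value they hold
              [&A](int p, int q) { return A[p] < A[q]; });
    return B;
}
```

For A = {20, 51, 5, 30}, the values in sorted order are 5, 20, 30, 51, found at positions 2, 0, 3, 1, so B = {2, 0, 3, 1}.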

We can then build a range minimum query structure on B, which returns the minimum value of B[i] over a given range a <= i <= b. In other words, it gives the first position in A holding a value between the a-th and b-th smallest elements (inclusive).

With a suitable RMQ structure, preprocessing takes O(N) time and each query is then answered in O(1).
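One RMQ structure that is easy to write is a sparse table. Note that it is a simpler substitute, not the O(N)-preprocessing structure mentioned above: it takes O(N log N) time and space to build, but still answers each query in O(1), so the overall O(N log N) bound is unchanged. A minimal C++ sketch; `SparseTableMin` is a hypothetical name. Built over B, `query(a, b)` returns the first position in A of a value ranked a..b.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Sparse-table range-minimum query over a fixed array v.
struct SparseTableMin {
    std::vector<std::vector<int>> table;  // table[k][i] = min of v[i .. i + 2^k - 1]
    std::vector<int> log2floor;           // log2floor[n] = floor(log2(n))

    explicit SparseTableMin(const std::vector<int>& v) {
        const int n = static_cast<int>(v.size());
        log2floor.assign(n + 1, 0);
        for (int i = 2; i <= n; ++i) log2floor[i] = log2floor[i / 2] + 1;
        table.push_back(v);  // level 0: ranges of length 1
        for (int k = 1; (1 << k) <= n; ++k) {
            std::vector<int> level(n - (1 << k) + 1);
            // A range of length 2^k is two adjacent ranges of length 2^(k-1).
            for (int i = 0; i + (1 << k) <= n; ++i)
                level[i] = std::min(table[k - 1][i], table[k - 1][i + (1 << (k - 1))]);
            table.push_back(std::move(level));
        }
    }

    // Minimum of v[lo..hi] (inclusive, lo <= hi) in O(1): cover the range
    // with two possibly overlapping blocks of length 2^k.
    int query(int lo, int hi) const {
        const int k = log2floor[hi - lo + 1];
        return std::min(table[k][lo], table[k][hi - (1 << k) + 1]);
    }
};
```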

We can then recursively find for each element its left and right children (if any) and check that they match.

Pseudocode

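Suppose the two arrays are A and A2, containing the same set of values, and assume for simplicity that they have been preprocessed so that the i-th smallest element is equal to i. RMQ and RMQ2 are the range minimum queries over the B arrays of A and A2, so A[RMQ(low,high)] is the element with a value in low..high that appears first in A.

The trees are identical if and only if find_children(1,N) returns True: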

find_children(low,high)
   if high <= low
      return True
   node = A[RMQ(low,high)]
   return node == A2[RMQ2(low,high)]
          and find_children(low,node-1)
          and find_children(node+1,high)

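The base case covers both an empty range (for example find_children(low,node-1) when node equals low) and a single value, which necessarily sits in the same slot of both trees once its ancestors have matched. The function is called once for each node (and each empty child pointer) in the tree, so the recursion takes O(N) time in total.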

Overall, this is O(N log N), since the preprocessing sort takes O(N log N).
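Putting the pieces together, here is a minimal C++ sketch of find_children. Assumptions: values within each array are distinct, ranks are 0-based, and a sparse table (O(N log N) build, O(1) query) stands in for the O(N)-preprocessing RMQ, which keeps the total at O(N log N). `RangeMin`, `toRanks`, `positionsOfRanks`, `findChildren` and `sameBSTViaRMQ` are hypothetical names. The recursion depth equals the tree height, which can reach N for degenerate inputs.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

namespace {

// Sparse-table range minimum: O(N log N) build, O(1) query.
class RangeMin {
public:
    explicit RangeMin(const std::vector<int>& v) : lg_(v.size() + 1, 0) {
        const int n = static_cast<int>(v.size());
        for (int i = 2; i <= n; ++i) lg_[i] = lg_[i / 2] + 1;
        t_.push_back(v);
        for (int k = 1; (1 << k) <= n; ++k) {
            std::vector<int> cur(n - (1 << k) + 1);
            for (int i = 0; i + (1 << k) <= n; ++i)
                cur[i] = std::min(t_[k - 1][i], t_[k - 1][i + (1 << (k - 1))]);
            t_.push_back(std::move(cur));
        }
    }
    // Minimum of v[lo..hi], inclusive; requires lo <= hi.
    int query(int lo, int hi) const {
        const int k = lg_[hi - lo + 1];
        return std::min(t_[k][lo], t_[k][hi - (1 << k) + 1]);
    }

private:
    std::vector<std::vector<int>> t_;  // t_[k][i] = min of v[i .. i + 2^k - 1]
    std::vector<int> lg_;              // lg_[n] = floor(log2(n))
};

// Replace each value by its 0-based rank in `sorted`.
std::vector<int> toRanks(const std::vector<int>& a, const std::vector<int>& sorted) {
    std::vector<int> r(a.size());
    for (std::size_t i = 0; i < a.size(); ++i)
        r[i] = static_cast<int>(std::lower_bound(sorted.begin(), sorted.end(), a[i]) -
                                sorted.begin());
    return r;
}

// pos[r] = position of rank r in the array (the answer's array B).
std::vector<int> positionsOfRanks(const std::vector<int>& ranks) {
    std::vector<int> pos(ranks.size());
    for (std::size_t i = 0; i < ranks.size(); ++i) pos[ranks[i]] = static_cast<int>(i);
    return pos;
}

// The subtree holding ranks low..high is rooted at whichever of those ranks
// appears first in the array, i.e. at the minimum position in B[low..high].
bool findChildren(int low, int high,
                  const std::vector<int>& ra, const RangeMin& qa,
                  const std::vector<int>& rb, const RangeMin& qb) {
    if (high <= low) return true;  // empty range, or a single value in the same slot
    const int node = ra[qa.query(low, high)];
    return node == rb[qb.query(low, high)] &&
           findChildren(low, node - 1, ra, qa, rb, qb) &&
           findChildren(node + 1, high, ra, qa, rb, qb);
}

}  // namespace

// True if inserting a and b, in order, into empty unbalanced BSTs gives the same tree.
bool sameBSTViaRMQ(const std::vector<int>& a, const std::vector<int>& b) {
    std::vector<int> sa(a), sb(b);
    std::sort(sa.begin(), sa.end());
    std::sort(sb.begin(), sb.end());
    if (sa != sb) return false;  // different key sets can never give the same tree
    const std::vector<int> ra = toRanks(a, sa), rb = toRanks(b, sa);
    const RangeMin qa(positionsOfRanks(ra)), qb(positionsOfRanks(rb));
    return findChildren(0, static_cast<int>(a.size()) - 1, ra, qa, rb, qb);
}
```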

Explanation

Suppose we have inserted the elements 20 and 51, in that order, into a binary tree. Then 20 is the root and 51 is its right child. To find the left child of 51, we need the first element in the array whose value is greater than 20 and less than 51. This is exactly what our range minimum query returns for the range 20+1 .. 51-1.

We can therefore find the left and right children of all nodes faster than by inserting them into the binary tree in the natural way, though only in the theoretical worst case; the other methods may well be faster on typical inputs.
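The property the RMQ relies on can be checked directly with a linear scan: the child occupying the slot bounded by two values is the first array element strictly between them. This C++ sketch is for illustration only (O(N) per lookup, where the RMQ gives O(1)); `firstInInterval` is a hypothetical name and values are assumed distinct.

```cpp
#include <optional>
#include <vector>

// First element of `a` whose value lies strictly inside (lo, hi), if any.
// For the slot bounded by (lo, hi), this is the node that ends up there.
std::optional<int> firstInInterval(const std::vector<int>& a, int lo, int hi) {
    for (int v : a)
        if (lo < v && v < hi) return v;
    return std::nullopt;  // the slot stays empty
}
```

For the array {20, 51, 30, 25, 40}, the left child of 51 is `firstInInterval(a, 20, 51)`, which is 30, and the left child of 30 is `firstInInterval(a, 20, 30)`, which is 25.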

Peter de Rivaz

"Construct two trees and compare" does not have to be O(N^2). You can use an auxiliary data structure that finds the position of each new node in O(log N) instead of O(N), so that constructing the BST takes O(N log N) even though the BST being constructed is not balanced.

Each empty position pos in a BST (i.e. a free child slot in a node) has an associated interval (a_pos, b_pos), where either end may be -infinity or +infinity, such that a new node for value v will be created at pos if and only if v lies in that interval.

You can store the intervals in a balanced interval tree, so that the position for each new arriving value can be found in O(log N). The update of the interval tree is also O(log N), as you only replace one interval with two.

(Actually, the intervals never overlap, so the auxiliary structure can be a plain old (balanced) BST instead of an interval tree.)

Example:

Take the following non-balanced BST, constructed for an array prefix [1,10,2,9,3, ...]

  1
 /  \
a  10
   / \
  2   f
 / \
b   9
   / \
  3   e
 / \
c   d

The letters a-f denote the places where a new node can be attached (nil leaves). Each letter has an associated interval:

(-inf, 1) -> a
(1, 2) -> b
(2, 3) -> c
(3, 9) -> d
(9, 10) -> e
(10, +inf) -> f

A new node for a value v will be added into the BST at the place determined by the interval that v belongs to. Zero will end up at a, 5 at d and so on. The key idea is to store this information outside of the tree.

If you can represent the above table efficiently (with links to the actual tree nodes), adding a new node to the tree takes O(table access) + O(1). The O(1) is for attaching the node to the unbalanced BST once you already know where it goes. Adding 5 will not require comparisons with 1, 10, 2, 9 and 3; instead, its slot is looked up in the table and it is placed directly at d.

Once you place the new node, you obviously also have to update the table. The data structure to represent the table could be an interval tree ( http://en.wikipedia.org/wiki/Interval_tree ).
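A minimal C++ sketch of this idea, assuming distinct values. Since the intervals never overlap, a `std::map` keyed by each interval's lower bound can serve as the balanced auxiliary tree: the slot for v is the entry with the largest key below v, found via `upper_bound` in O(log N). Inserting v into slot (lo, hi) shrinks that entry to (lo, v) and adds a new entry for (v, hi). `TreeNode`, `buildWithSlotMap`, `sameTreeAt` and `sameBSTViaSlots` are hypothetical names; nodes live in a vector and are linked by index.

```cpp
#include <climits>
#include <iterator>
#include <map>
#include <vector>

// Plain BST node; children are indices into the node vector, -1 = empty.
struct TreeNode {
    int key;
    int left = -1;
    int right = -1;
};

// Builds the ordinary unbalanced BST for `a` (insertion order preserved), but
// finds each insertion slot through an ordered map of free-slot intervals
// instead of walking down the tree. Assumes distinct values.
std::vector<TreeNode> buildWithSlotMap(const std::vector<int>& a) {
    struct Slot {
        int parent;  // index of the parent node, or -1 for the empty root
        bool right;  // which child pointer of the parent this slot is
    };
    // Key = exclusive lower bound of a slot's interval; its upper bound is the
    // next key in the map (or +infinity). LLONG_MIN stands in for -infinity.
    std::map<long long, Slot> slots;
    slots.emplace(LLONG_MIN, Slot{-1, false});

    std::vector<TreeNode> nodes;
    nodes.reserve(a.size());
    for (int v : a) {
        auto it = std::prev(slots.upper_bound(v));  // slot whose interval contains v
        const Slot s = it->second;
        const int id = static_cast<int>(nodes.size());
        nodes.push_back(TreeNode{v});
        if (s.parent >= 0) {
            if (s.right) nodes[s.parent].right = id;
            else nodes[s.parent].left = id;
        }
        it->second = Slot{id, false};      // (lo, v) is now the new node's left slot
        slots.emplace(v, Slot{id, true});  // (v, hi) is its right slot
    }
    return nodes;  // nodes[0] is the root when `a` is non-empty
}

bool sameTreeAt(const std::vector<TreeNode>& x, int i,
                const std::vector<TreeNode>& y, int j) {
    if (i < 0 || j < 0) return i < 0 && j < 0;  // equal only if both are empty
    return x[i].key == y[j].key && sameTreeAt(x, x[i].left, y, y[j].left) &&
           sameTreeAt(x, x[i].right, y, y[j].right);
}

bool sameBSTViaSlots(const std::vector<int>& a, const std::vector<int>& b) {
    const std::vector<TreeNode> ta = buildWithSlotMap(a), tb = buildWithSlotMap(b);
    return sameTreeAt(ta, ta.empty() ? -1 : 0, tb, tb.empty() ? -1 : 0);
}
```

Building from the prefix [1, 10, 2, 9, 3] leaves the map holding exactly the six slots a-f from the table above, so inserting 5 next attaches it as the right child of 3 (slot d) without any comparisons against tree nodes.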

Rafał Dowgird
  • Hmm. Naturally, finding in O(log n) helps, but there's a problem. Balancing is not allowed. As I mentioned in the question, the trees constructed have to be "natural" unbalanced BSTs. Determining the equality of two balanced trees from an array might be an easier problem, because the median of the array will most likely be the root and so on, so there's a more foreseeable pattern forming there. What about an unbalanced tree, is the question. Or maybe I missed your logic. Do clarify, in that case. – Kanishk Dec 31 '12 at 17:22
  • You do not balance the tree being constructed. You balance the auxiliary tree that helps you build the non-balanced tree. I'll come up with an example when I have some more time. – Rafał Dowgird Dec 31 '12 at 20:24
  • @Kanishk Expanded as much as the time allowed, feel free to shoot more questions :) – Rafał Dowgird Jan 01 '13 at 13:11

Try this:

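#include <stdbool.h>
#include <stddef.h>

/* Assumed node layout; the answer does not show it. */
struct node {
    int data;
    struct node *left;
    struct node *right;
};

/* Returns true if both trees have the same shape and the same keys. */
bool identical(const struct node *a, const struct node *b)
{
    if (a == NULL && b == NULL)
        return true;        /* both subtrees empty */
    else if (a != NULL && b != NULL)
        return a->data == b->data
            && identical(a->left, b->left)
            && identical(a->right, b->right);
    else
        return false;       /* exactly one subtree is empty */
}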
Bridge

I came up with the following code. It works, though the partitioning is inefficient.

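#include <cstddef>
#include <vector>
using std::vector;

// Returns true if inserting vec1 and vec2 (in order) into empty BSTs
// produces the same tree. (Despite the name, it compares two sequences.)
bool isBST(const vector<int>& vec1, const vector<int>& vec2) {
    if (vec1.size() != vec2.size())
        return false;
    if (vec1.empty())
        return true;               // two empty subtrees
    if (vec1[0] != vec2[0])
        return false;              // the first element is the root

    // Split the rest around the root, keeping the original order.
    vector<int> temp1, temp2;      // vec1: smaller / not smaller than root
    vector<int> temp3, temp4;      // vec2: smaller / not smaller than root
    for (std::size_t k = 1; k < vec1.size(); k++) {
        if (vec1[k] < vec1[0])
            temp1.push_back(vec1[k]);
        else
            temp2.push_back(vec1[k]);

        if (vec2[k] < vec2[0])
            temp3.push_back(vec2[k]);
        else
            temp4.push_back(vec2[k]);
    }

    // Left subtrees and right subtrees must match independently.
    return isBST(temp1, temp3) && isBST(temp2, temp4);
}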
Skandh