Construct a binary tree from permutation in n log n time

Question

The numbers 1 to n are inserted into a binary search tree in a specified order p_1, p_2, ..., p_n. Describe an O(n log n) time algorithm to construct the resulting final binary search tree.

Note that:

I don't need average-case O(n log n) time, but worst-case.
I need the exact tree that results when insertion takes place with the usual rules. AVL or red-black trees are not allowed.

This is an assignment question. It is highly nontrivial; in fact, it seemed impossible at first glance. I have thought about it a lot. My observations:

The argument that we use to prove that sorting takes at least n log n time does not rule out the existence of such an algorithm here.
If it is always possible to find, in O(n) time, a subtree whose size is between two constant fractions of the size of the tree, the problem can be easily solved.
Choosing median or left child of root as root of subtree doesn't work.

Not optimal, but extremely short to describe: pick a BST with O(log n) inserts like a red-black tree and perform n inserts into it. O(n log n). Probably what’s being looked for: get the middle item (rightmost if tied) and call it the root. Repeat for items to its left and items to its right. O(n), which is ⊂ O(n log n); produces complete and balanced BST. — Ry-, Mar 19 '17 at 10:57
Is it an exercise? It sounds like one. What have you done so far? — guillaume girod-vitouchkina, Mar 19 '17 at 11:02
Hint: the root of a tree is always the first member inserted in it (the root never changes). — n. m. could be an AI, Mar 19 '17 at 12:08
@n.m. Does your algorithm work when the sequence is increasing? — Meet Taraviya, Mar 19 '17 at 12:33
@n.m. I have given too much time to this question. I thought on your hint, but I always got an O(n^2) algorithm in worst case. It would be really helpful if you post the solution. — Meet Taraviya, Mar 20 '17 at 08:15
@n.m. Essentially, I cannot figure out how to use the fact that this is a permutation. We can identify numbers in left and right subtrees in constant time, but cannot find their order. — Meet Taraviya, Mar 20 '17 at 08:17
It looks like my method is O(n log^2 n), I will think about it some more. — n. m. could be an AI, Mar 20 '17 at 09:52
@n.m. You can post it. This might help me give an idea of how to think. I haven't thought much, but I don't have an O(n log^2 n) algorithm as of now. — Meet Taraviya, Mar 20 '17 at 15:51
I think I have reduced it to the required O(n log n), just posted my version. The one by David Eisenstat would also work. — n. m. could be an AI, Mar 20 '17 at 17:06
Maybe there's problem on my side, but I don't get what the OP is asking for... It reads like how to construct a binary search tree from a permutation of N integers 1-N, which is trivial and all. — Guibao Wang, Jun 18 '17 at 06:01

David Eisenstat · Answer 1 · 2017-03-19T15:59:23.557

4

The trick is not to use the constructed BST for lookups. Instead, keep an additional, balanced BST for lookups. Link the leaves.

For example, we might have

Constructed    Balanced

       3           2
      / \         / \
     2   D       1   3
    / \         / | | \
   1   C       a  b c  d
  / \
 A   B

where a, b, c, d are pointers to A, B, C, D respectively, and A, B, C, D are what would normally be null pointers.

To insert, insert into the balanced BST first (O(log n)), follow the pointer to the constructed tree (O(1)), do the constructed insert (O(1)), and relink the new leaves (O(1)).

edited Mar 19 '17 at 15:59

answered Mar 19 '17 at 15:18


"insert into the balanced BST first, follow the pointer and do the constructed insert in time O(1)". This doesn't compute. – n. m. could be an AI Mar 19 '17 at 15:34
@n.m. The BST insertion is O(log n). Everything else is O(1). – David Eisenstat Mar 19 '17 at 15:59
OK I think I got your idea, but your explanation is still cryptic. – n. m. could be an AI Mar 20 '17 at 10:38
@MeetTaraviya I answered the revised question. – David Eisenstat Mar 20 '17 at 19:26
Your idea is sound, but many readers (which appears to include the OP @MeetTaraviya who has placed a bounty on the question for a "Detailed explanation of algorithm") are going to need a bit more explanation of what the algorithm is to identify the place to put the item in the unbalanced tree and a sketch of a proof of why exactly that is the correct place. – JimD. May 30 '17 at 12:51
@JimD. I know. I don't have time right now, unfortunately. – David Eisenstat May 30 '17 at 17:53

SergGr · Accepted Answer · 2017-05-31T20:16:52.600

As David Eisenstat doesn't have time to extend his answer, I'll try to put more details into a similar algorithm.

Intuition

The main intuition behind the algorithm is based on the following statements:

statement #1: if a BST contains values a and b (a < b) AND there are no values between them, then either A (the node for value a) is an ancestor of B (the node for value b), or B is an ancestor of A.

This statement is obviously true because if their lowest common ancestor C is some other node than A and B, its value c must be between a and b. Note that statement #1 is true for any BST (balanced or unbalanced).

statement #2: if a simple (unbalanced) BST contains values a and b (a < b) AND there are no values between them AND we are trying to add a value x such that a < x < b, then X (the node for value x) will become either the direct right (greater) child of A or the direct left (lesser) child of B, whichever node is lower (deeper) in the tree.

Let's assume that the lower of the two nodes is A (the other case is symmetrical). During insertion, the value x follows the same path that a followed during its own insertion, because the tree doesn't contain any values between a and x, i.e. at every comparison the values a and x are indistinguishable. It means that x will navigate the tree down to node A, passing node B at some earlier step (see statement #1). As x > a, it should become the right child of A. The direct right child of A must be empty at this point: A is in B's left subtree, so all values in A's subtree are less than b, and since the tree contains no values between a and b, no value can be in the right subtree of A.

Note that statement #2 might not hold for some balanced BSTs after re-balancing has been performed, although this should be an unusual case.

statement #3: in a balanced BST for any value x not in the tree yet, you can find closest greater and closest less values in O(log(N)) time.

This follows directly from statements #1 and #2: all you need is to find the potential insertion point for the value x in the BST (takes O(log(N))). One of the two values will be the direct parent of the insertion point, and to find the other you need to travel the tree back towards the root (again takes O(log(N))).
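As an illustration of statement #3, here is a minimal Python sketch of the lookup. The names Node, insert and neighbours are made up for this example and do not come from the answer. Instead of walking back up towards the root, the sketch remembers the last left turn and the last right turn during a single descent, which gives the same two values. The cost is proportional to the height of the tree, so it is O(log(N)) only when the tree is balanced; the demo tree below is built with plain insertion just to keep the example short.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Plain (unbalanced) BST insertion, used only to build a demo tree."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root


def neighbours(root, x):
    """Return (closest smaller key, closest greater key) for x not in the tree."""
    lower = higher = None
    node = root
    while node is not None:
        if x < node.key:
            higher = node.key  # last node where we turned left
            node = node.left
        else:
            lower = node.key   # last node where we turned right
            node = node.right
    return lower, higher


demo_root = None
for k in [8, 4, 12, 2, 6, 10, 14]:
    demo_root = insert(demo_root, k)

print(neighbours(demo_root, 5))   # (4, 6)
print(neighbours(demo_root, 1))   # (None, 2)
```

A missing neighbour is reported as None, which corresponds to the "adding minimum" and "adding maximum" cases of the algorithm.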

So now the idea behind the algorithm becomes clear: for fast insertion into an unbalanced BST we need to find nodes with closest less and greater values. We can easily do it if we additionally maintain a balanced BST with the same keys as our target (unbalanced) BST and with corresponding nodes from that BST as values. Using that additional data structure we can find insertion point for each new value in O(log(N)) time and update this data structure with new value in O(log(N)) time as well.

Algorithm

1. Init "main" root and balancedRoot with null.
2. for each value x in the list do:
3. if this is the first value just add it as the root nodes to both trees and go to #2
4. in the tree specified by balancedRoot find nodes that correspond to the closest less (BalancedA, points to node A in the main BST) and closest greater (BalancedB, points to node B in the main BST) values.
   - If there is no closest lower value i.e. we are adding minimum element, add it as the left child to the node B
   - If there is no closest greater value i.e. we are adding maximum element, add it as the right child to the node A
   - Find whichever of nodes A or B is lower in the tree. You can use explicit level stored in the node. If the lower node is A (less node), add x as the direct right child of A else add x as the direct left child of B (greater node). Alternatively (and more cleverly) you may notice that from the statements #1 and #2 follows that exactly one of the two candidate insert positions (A's right child or B's left child) will be empty and this is where you want to insert your value x.
5. Add value x to the balanced tree (might re-use from step #4).
6. Go to step #2

As no inner step of the loop takes more than O(log(N)), total complexity is O(N*log(N))

Java implementation

I'm too lazy to implement balanced BST myself so I used standard Java TreeMap that implements Red-Black tree and has useful lowerEntry and higherEntry methods that correspond to step #4 of the algorithm (you may look at the source code to ensure that both are actually O(log(N))).

import java.util.Map;
import java.util.TreeMap;

public class BSTTest {

    static class Node {
        public final int value;
        public Node left;
        public Node right;

        public Node(int value) {
            this.value = value;
        }

        public boolean compareTree(Node other) {
            return compareTrees(this, other);
        }

        public static boolean compareTrees(Node n1, Node n2) {

            if ((n1 == null) && (n2 == null))
                return true;
            if ((n1 == null) || (n2 == null))
                return false;
            if (n1.value != n2.value)
                return false;
            return compareTrees(n1.left, n2.left) &&
                    compareTrees(n1.right, n2.right);
        }


        public void assignLeftSafe(Node child) {
            if (this.left != null)
                throw new IllegalStateException("left child is already set");
            this.left = child;
        }

        public void assignRightSafe(Node child) {
            if (this.right != null)
                throw new IllegalStateException("right child is already set");
            this.right = child;
        }

        @Override
        public String toString() {
            return "Node{" +
                    "value=" + value +
                    '}';
        }
    }


    static Node insertToBst(Node root, int value) {
        if (root == null)
            root = new Node(value);
        else if (value < root.value)
            root.left = insertToBst(root.left, value);
        else  
            root.right = insertToBst(root.right, value);
        return root;
    }


    static Node buildBstDirect(int[] values) {
        Node root = null;
        for (int v : values) {
            root = insertToBst(root, v);
        }
        return root;
    }

    static Node buildBstSmart(int[] values) {
        Node root = null;
        TreeMap<Integer, Node> balancedTree = new TreeMap<Integer, Node>();
        for (int v : values) {
            Node node = new Node(v);
            if (balancedTree.isEmpty()) {
                root = node;
            } else {
                Map.Entry<Integer, Node> lowerEntry = balancedTree.lowerEntry(v);
                Map.Entry<Integer, Node> higherEntry = balancedTree.higherEntry(v);
                if (lowerEntry == null) {
                    // adding minimum value
                    higherEntry.getValue().assignLeftSafe(node);
                } else if (higherEntry == null) {
                    // adding max value
                    lowerEntry.getValue().assignRightSafe(node);
                } else {
                    // adding some middle value
                    Node lowerNode = lowerEntry.getValue();
                    Node higherNode = higherEntry.getValue();
                    if (lowerNode.right == null)
                        lowerNode.assignRightSafe(node);
                    else
                        higherNode.assignLeftSafe(node);
                }
            }
            // update balancedTree
            balancedTree.put(v, node);
        }
        return root;
    }

    public static void main(String[] args) {
        int[] input = new int[]{7, 6, 9, 4, 1, 8, 2, 5, 3};

        Node directRoot = buildBstDirect(input);
        Node smartRoot = buildBstSmart(input);
        System.out.println(directRoot.compareTree(smartRoot));
    }
}

Not quite what I was proposing, but it's as close as one can get in Java, and it runs in O(n log n) time. — David Eisenstat, May 31 '17 at 03:04
Great answer! But there's an extra constraint: Do it without using balanced trees (I am sorry I did not specify it earlier). Anyway you get the bounty. — Meet Taraviya, May 31 '17 at 04:31
One tiny detail is doubtful : Find whichever of nodes A or B is lower in the tree. Storing explicit level works, but travelling up through the tree takes O(n) time in the worst case. — Meet Taraviya, May 31 '17 at 04:43
I did mention "AVL or red black trees not allowed." . Perhaps you can use the fact that the array is already known to some extent(a permutation of 1 to n). — Meet Taraviya, May 31 '17 at 05:02
@MeetTaraviya **1)** I agree that navigation up through tree is not `O(log(N))` so I removed that bit but as you may see in the code I actually use a more clever trick described in the second part of that item that doesn't rely on navigation or explicit `level` stored in the node. **2)** I understand that you want to produce the result of insertion into a naive BST rather than some balanced version and this is exactly what my algorithm does (and what `compareTree` in the Java code verifies). However restriction that such an algorithm shouldn't use balanced trees inside seems ridiculous to me. — SergGr, May 31 '17 at 20:21
@SergGr I told you it is (was) an assignment question. The no-balanced-BST restriction was a part of the question. I could not complete it, but it obviously had a solution. As I said, you have not made the use of the fact that it's a permutation of 1 to n. — Meet Taraviya, Jun 01 '17 at 04:47

score 2 · Answer 3 · answered Jun 01 '17 at 00:59

Here's a linear-time algorithm. (I said that I wasn't going to work on this question, so if you like this answer, please award the bounty to SergGr.)

Create a doubly linked list with nodes 1..n and compute the inverse of p. For i from n down to 2, let q be the left neighbor of p_i in the list, and let r be the right neighbor. If r does not exist, or if both exist and p^-1(q) > p^-1(r), then make p_i the right child of q. Otherwise (q does not exist, or p^-1(q) < p^-1(r)), make p_i the left child of r. Delete p_i from the list. The one element left at the end, p_1, is the root.

In Python:

class Node(object):
    __slots__ = ('left', 'key', 'right')

    def __init__(self, key):
        self.left = None
        self.key = key
        self.right = None


def construct(p):
    # Validate the input.
    p = list(p)
    n = len(p)
    assert set(p) == set(range(n))  # keys are 0 .. n-1 here, not 1 .. n

    # Compute p^-1.
    p_inv = [None] * n
    for i in range(n):
        p_inv[p[i]] = i

    # Set up the list.
    nodes = [Node(i) for i in range(n)]
    for i in range(n):
        if i >= 1:
            nodes[i].left = nodes[i - 1]
        if i < n - 1:
            nodes[i].right = nodes[i + 1]

    # Process p.
    for i in range(n - 1, 0, -1):  # n-1, n-2 .. 1
        q = nodes[p[i]].left
        r = nodes[p[i]].right
        if r is None or (q is not None and p_inv[q.key] > p_inv[r.key]):
            print(p[i], 'is the right child of', q.key)
        else:
            print(p[i], 'is the left child of', r.key)
        if q is not None:
            q.right = r
        if r is not None:
            r.left = q


construct([1, 3, 2, 0])
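
The function above only prints the parent of each key. As a hedged variant for checking the idea, the sketch below returns a parent array instead and compares it with plain BST insertion. The names build_parents and naive_parents are invented for this example, the linked list is stored in two index arrays rather than in node objects, and keys are assumed to be 0 .. n-1 as in the code above.

```python
def build_parents(p):
    """p: permutation of 0..n-1 in insertion order. Returns parent[key] (None for the root)."""
    n = len(p)
    when = [0] * n                       # when[key] = insertion time (inverse of p)
    for t, key in enumerate(p):
        when[key] = t
    left = [k - 1 for k in range(n)]     # left neighbour in the list, -1 = none
    right = [k + 1 for k in range(n)]    # right neighbour in the list, n = none
    parent = [None] * n
    for t in range(n - 1, 0, -1):        # last inserted first; p[0] is the root
        x = p[t]
        q, r = left[x], right[x]
        if r == n or (q >= 0 and when[q] > when[r]):
            parent[x] = q                # x is the right child of q
        else:
            parent[x] = r                # x is the left child of r
        if q >= 0:                       # unlink x from the list
            right[q] = r
        if r < n:
            left[r] = q
    return parent


def naive_parents(p):
    """Reference result: simulate ordinary BST insertion (quadratic in the worst case)."""
    n = len(p)
    parent, lo, hi = [None] * n, [None] * n, [None] * n
    if not p:
        return parent
    for x in p[1:]:
        cur = p[0]
        while True:
            child = lo if x < cur else hi
            if child[cur] is None:
                child[cur] = x
                parent[x] = cur
                break
            cur = child[cur]
    return parent


print(build_parents([1, 3, 2, 0]))   # [1, None, 3, 1]
```

Each key is unlinked exactly once and every step does constant work, so the loop is linear in n; the comparison with naive_parents is there only as a sanity check on small inputs.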

Nice solution, but I'm not sure this will be clear to OP either :) — JimD., Jun 04 '17 at 15:20

n. m. could be an AI · Answer 4 · 2017-03-20T18:42:28.173

Here's my O(n log^2 n) attempt that doesn't require building a balanced tree.

Put nodes in an array in their natural order (1 to n). Also link them into a linked list in the order of insertion. Each node stores its order of insertion along with the key.

The algorithm goes like this.

The input is a node in the linked list, and a range (low, high) of indices in the node array

Call the input node root; its key is rootkey. Unlink it from the list.
Determine which subtree of the input node is smaller.
Traverse the corresponding array range, unlink each node from the linked list, then link them in a separate linked list and sort the list again in the insertion order.
Heads of the two resulting lists are children of the input node.
Perform the algorithm recursively on children of the input node, passing ranges (low, rootkey-1) and (rootkey+1, high) as index ranges.

The sorting operation at each level gives the algorithm the extra log n complexity factor.

You need to update the link list for the recursion to work. Moreover I think " This would be the other child of the input node." is incorrect. Consider 5,3,6,4,2,1. It would be easier to understand if you give an example. — Meet Taraviya, Mar 20 '17 at 17:25
It looks like you are right, this variant isn't quite working. I will revert the post back to the O(n log^2 n) version. Perhaps someone else will reduce it to O(n log n) correctly. — n. m. could be an AI, Mar 20 '17 at 18:42

גלעד ברקן · Answer 5 · 2017-03-22T10:40:47.347

Here's an O(n log n) algorithm that can also be adapted to O(n log log m) time, where m is the range, by using a Y-fast trie rather than a balanced binary tree.

In a binary search tree, lower values are left of higher values. The order of insertion corresponds with the right-or-left node choices when traveling along the final tree. The parent of any node, x, is either the least higher number previously inserted or the greatest lower number previously inserted, whichever was inserted later.

We can identify and connect the listed nodes with their correct parents using the logic above in O(n log n) worst-case time by maintaining a balanced binary tree of the nodes visited so far as we traverse the order of insertion.

Explanation:

Let's imagine a proposed lower parent, p. Now imagine there's a number, l > p but still lower than x, inserted before p. Either (1) p passed l during insertion, in which case x would have had to pass l to get to p but that contradicts that x must have gone right if it reached l; or (2) p did not pass l, in which case p is in a subtree left of l but that would mean a number was inserted that's smaller than l but greater than x, a contradiction.

Clearly, a number, l < x, greater than p that was inserted after p would also contradict p as x's parent since either (1) l passed p during insertion, which means p's right child would have already been assigned when x was inserted; or (2) l is in a subtree to the right of p, which again would mean a number was inserted that's smaller than l but greater than x, a contradiction.

Therefore, for any node, x, with a lower parent, that parent must be the greatest number lower than and inserted before x. Similar logic covers the scenario of a higher proposed parent.

Now let's imagine x's parent, p < x, was inserted before h, the lowest number greater than and inserted before x. Then either (1) h passed p, in which case p's right node would have been already assigned when x was inserted; or (2) h is in a subtree right of p, which means a number lower than h and greater than x was previously inserted but that would contradict our assertion that h is the lowest number inserted so far that's greater than x.
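
The parent rule argued above can be sketched in Python as follows. This is only an illustration of the rule, not the structure proposed in the answer: it keeps the keys seen so far in a sorted Python list and uses the standard bisect module for the lookup, so each list insertion costs O(n) and the sketch as a whole is quadratic in the worst case. Replacing the sorted list with a balanced tree (or a Y-fast trie) is what would give the stated bounds. The function name parents_by_insertion_time is made up for this example.

```python
import bisect


def parents_by_insertion_time(p):
    """p: distinct keys in insertion order. Returns {key: parent key, or None for the root}."""
    seen = []     # keys inserted so far, kept sorted
    when = {}     # key -> insertion time
    parent = {}
    for t, x in enumerate(p):
        i = bisect.bisect_left(seen, x)
        lower = seen[i - 1] if i > 0 else None        # greatest earlier key below x
        higher = seen[i] if i < len(seen) else None   # least earlier key above x
        if lower is None and higher is None:
            parent[x] = None                          # first key: the root
        elif higher is None or (lower is not None and when[lower] > when[higher]):
            parent[x] = lower                         # lower neighbour was inserted later
        else:
            parent[x] = higher                        # higher neighbour was inserted later
        seen.insert(i, x)                             # O(n) here; O(log n) in a balanced tree
        when[x] = t
    return parent


print(parents_by_insertion_time([7, 6, 9, 4, 1, 8, 2, 5, 3]))
```

For the sequence 7, 6, 9, 4, 1, 8, 2, 5, 3 this reports, for instance, that 8 hangs under 9 (inserted after 7) and that 3 hangs under 2 (inserted after 4).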

Yeah, I knew this. But how to find these? Direct approach gives O(n^2) algorithm. — Meet Taraviya, Mar 21 '17 at 13:25
@MeetTaraviya if the tree is balanced, which is why I suggest maintaining a balanced tree of the previously inserted nodes, there are standard algorithms to look up the next highest and next lowest value in the tree in `O(log n)` time - you could easily search SO for that information. All we need is to lookup and compare the insertion order of those two values to identify the correct parent for the current node as we traverse the insertion order. — גלעד ברקן, Mar 21 '17 at 13:39
@MeetTaraviya by the way (see the note I added to the answer), this solution can also be adapted to `O(n log log m)` time where `m` is the range. — גלעד ברקן, Mar 21 '17 at 13:50

maniek · Answer 6 · 2017-05-31T11:36:14.497

0

Since this is an assignment, I'm posting a hint instead of an answer.

Sort the numbers, while keeping the insertion order. Say you have input: [1,7,3,5,8,2,4]. Then after sorting you will have [[1,0], [2,5], [3,2], [4, 6], [5,3], [7,1], [8,4]] . This is actually the in-order traversal of the resulting tree. Think hard about how to reconstruct the tree given the in-order traversal and the insertion order (this part will be linear time).

More hints coming if you really need them.
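
One possible reading of this hint, which may or may not be what was intended, is that the target tree is the Cartesian tree of the sorted pairs: keys are in BST order and insertion times are in min-heap order, since every node is inserted after all of its ancestors. Under that assumption, a stack holding the current right spine rebuilds the tree in one left-to-right pass. The function name cartesian_parents is made up for this sketch, and the sort is done with sorted() for brevity; for a permutation of 1..n the pairs could instead be placed directly by key in linear time.

```python
def cartesian_parents(p):
    """p: distinct keys in insertion order. Returns {key: parent key, or None for the root}."""
    # Insertion times listed by increasing key, i.e. the in-order sequence.
    order = sorted(range(len(p)), key=lambda t: p[t])
    parent = {}
    stack = []  # insertion times on the right spine, increasing from bottom to top
    for t in order:
        last = None
        while stack and stack[-1] > t:
            last = stack.pop()           # inserted later than t, so it belongs below t
        if last is not None:
            parent[p[last]] = p[t]       # the popped chain becomes the left subtree of t
        parent[p[t]] = p[stack[-1]] if stack else None  # t is the right child of the spine top
        stack.append(t)
    return parent


print(cartesian_parents([1, 7, 3, 5, 8, 2, 4]))
```

Every insertion time is pushed once and popped at most once, so the pass after sorting is linear.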

edited May 31 '17 at 11:36

answered May 31 '17 at 11:31


Where's 6 in the input? I said it's a permutation – Meet Taraviya May 31 '17 at 11:43
The algorithm will work regardless if it's a permutation, or just some distinct integers. – maniek May 31 '17 at 11:43
As the keys are the integers 1..n, sorting can be done in linear time and so can the whole process !? – Jun 01 '17 at 14:27
@YvesDaoust Good observation! And since for the problem as stated the input is a permutation, this is not so much sorting, as just putting elements in the right place. – maniek Jun 01 '17 at 17:42
@MeetTaraviya you figured this out? – maniek Jun 04 '17 at 17:23
