Finding elements in a BST that sum up to a provided value

Question

I'm trying to find an approach to the following problem:

Find two elements in a balanced BST that sum to a given value.

Constraints: time O(n) and space O(log n).

I was wondering whether the following algorithm would be valid.

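int[] findSum(Node root, int sum) {
   int[] sumArray = new int[2];      // the result must be allocated before it is written
   for (int i = 0; i < sum; i++) {
      // assumes Node.contains(int) is a BST lookup taking O(log n) time
      if (root.contains(i) && root.contains(sum - i)) {
         sumArray[0] = i;
         sumArray[1] = sum - i;
         return sumArray;            // stop at the first pair found
      }
   }
   return null;                      // no pair sums to `sum`
}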

I understand that my approach could be wrong. I would appreciate any feedback, corrections to my pseudocode, or better algorithms.

Your if-statement will never work because you're looking in the same node for sum[0] and sum[1]. You'll have to search for the indexes separately. And how are you checking other nodes? There should be a recursive call to the left and right nodes. — MosesA, Apr 24 '13 at 00:10
Also should it not be `if(root.contains(sum[i]) &&root.contains(sum[-i])){`? Because sum is an array. — MosesA, Apr 24 '13 at 00:12
Make sure to initialize `sum` to be a two-element array! Otherwise, you'll get `ArrayIndexOutOfBoundsException`s as soon as you try to write the result! — templatetypedef, Apr 24 '13 at 00:26

templatetypedef · Accepted Answer · 2013-04-24T00:24:56.027

I believe that the approach you have will work, but does not have the appropriate time constraints.

Let's start off by analyzing the complexity of this algorithm. Note that there are two different parameters to take into consideration here. First, there's the total number of elements in the BST. If you make the BST larger or smaller, it will take more or less time for the algorithm to complete. Let's call this number n. Second, there's the total number you want the values to sum up to. Let's call that value U.

So let's see what your current algorithm does. Right now, it iterates a loop O(U) times, on each iteration checking whether two values exist within the BST. Each BST lookup takes time O(log n), so the total amount of work your algorithm does is O(U log n). However, your algorithm only uses constant space (that is, space O(1)).

Unfortunately, this runtime is not at all good. If the target number is very large (say, 1,000,000,000), then your algorithm is going to take a very long time to finish because U is going to be very large.

So the question now is how you can solve this problem more efficiently. Fortunately, there's a very cute algorithm we can use to solve the problem, and it will leverage the fact that the elements of the BST are stored in sorted order.

Let's begin by solving a similar problem that is a bit different from the one you're posing. Suppose that instead of giving you a BST and a target number, I give you a sorted array and a target number and then ask the same question: are there two numbers in this sorted array that sum up to the target? For example, I might give you this array:

 0  1  3  6  8  9  10  14  19

Let's suppose you wanted to know if two numbers in this array sum up to 16. How might you do this?

You could try the same approach you had before: check whether the array contains 0 and 16, 1 and 15, 2 and 14, etc., until you find a pair or run out of values to check. Checking whether an element exists in a sorted array takes time O(log n) using binary search, so this algorithm still takes O(U log n) time. (You could conceivably speed this up using interpolation search if you knew that the values were nicely distributed, which would give an expected runtime of O(U log log n), but that large U term is still a problem.)
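A minimal Java sketch of this candidate-by-candidate scan on a sorted array, assuming non-negative values; `CandidateScan` and `findPair` are hypothetical names, and the lookups use the standard `java.util.Arrays.binarySearch`. The loop runs about U/2 times no matter how small the array is, which is where the O(U log n) bound comes from.

```java
import java.util.Arrays;

// Hypothetical helper illustrating the O(U log n) approach: try every split
// (i, target - i) and binary-search the sorted array for both halves.
// Assumes non-negative values and a non-negative target.
class CandidateScan {
    static int[] findPair(int[] sorted, int target) {
        // Only i <= target / 2 is needed; larger i repeats a pair already tried.
        for (int i = 0; i <= target / 2; i++) {
            int j = target - i;
            int a = Arrays.binarySearch(sorted, i);
            if (a < 0 || Arrays.binarySearch(sorted, j) < 0) {
                continue; // at least one half of the pair is missing
            }
            if (i == j) {
                // i + i == target needs two copies of i; duplicates are adjacent.
                boolean twice = (a > 0 && sorted[a - 1] == i)
                        || (a + 1 < sorted.length && sorted[a + 1] == i);
                if (!twice) {
                    continue;
                }
            }
            return new int[] {i, j};
        }
        return null; // every candidate split was tried
    }
}
```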

So let's consider a different approach. Fundamentally, what you're doing requires you to explicitly enumerate all pairs that sum up to U. However, most of them aren't going to be there, and, in fact, most of the time neither element in the pair will be there. We could make things a lot faster by using the following algorithm:

For each element x of the array, check whether U - x is in the array.
If so, report success.
Otherwise, if no such pair exists, report failure.

This algorithm will require you to look at a total of O(n) elements in the array, each time doing O(log n) work to find the matching pair. In the worst case, this will take O(n log n) time and use O(1) space. This is much better than before if U is a huge number, because there's no longer any dependence on U at all!
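The per-element version can be sketched the same way (again with hypothetical names). One design choice here is an assumption rather than something stated above: searching only to the right of the current index keeps an element from being paired with itself while still finding every pair, since each pair is discovered from its smaller index.

```java
import java.util.Arrays;

// Hypothetical helper illustrating the O(n log n) approach: for each element x,
// binary-search the rest of the sorted array for target - x.
class ComplementSearch {
    static int[] findPair(int[] sorted, int target) {
        for (int i = 0; i < sorted.length; i++) {
            int x = sorted[i];
            // Search only indices i+1 .. length-1 so x is never paired with itself;
            // any valid pair is still found when i reaches its smaller index.
            int k = Arrays.binarySearch(sorted, i + 1, sorted.length, target - x);
            if (k >= 0) {
                return new int[] {x, sorted[k]};
            }
        }
        return null; // no element has its complement to its right
    }
}
```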

However, we can speed things up even further by making a clever observation about the structure of the problem. Let's suppose that we are looking at the number x in the array and do a binary search to look for U - x. If we find it, we're done. If not, we'll find the first number in the array that's greater than U - x. Let's call that number z.

So now suppose that we want to see whether a number y could be part of the pair that sums up to U, and moreover, suppose that y is bigger than x. Since y > x and z > U - x, we know that

$$y + z > x + z > x + (U - x) = U$$

What this says is that the sum of y and z is greater than U, so it can't possibly be U. What does this mean? Well, we found z by trying to do a binary search for the element that would pair with x to sum up to U. What we've just shown is that if you try to add z to any number in the array that's bigger than x, the total has to exceed U. In other words, z can't pair with anything bigger than x. Similarly, anything bigger than z can't pair with anything bigger than x, because it would have to sum up to something larger than U.

Based on this observation, we can try using a different algorithm. Let's take our array, as before, and see if we can find a pair that sums to 16:

 0  1  3  6  8  9  10  14  19

Let's maintain two pointers - a "left-hand side" pointer lhs and a "right-hand side" pointer rhs:

 0  1  3  6  8  9  10  14  19
 ^                          ^
lhs                        rhs

When we sum up these two numbers, we get back 19, which exceeds U. Now, any pair of numbers that we add up has to have its lower number be at least as large as the lhs number, which is 0. Therefore, if we tried summing up any pair in here that uses the number 19, we know that the sum would be too large. Therefore, we can eliminate from consideration any pair containing 19. This leaves

 0  1  3  6  8  9  10  14  19
 ^                      ^
lhs                    rhs

Now, look at this sum (14), which is too small. Using similar logic as before, we can safely say that any sum in the remaining array that uses 0 must end up giving a sum smaller than 16, because the second number in the sum would be at most 14. Therefore, we can rule out the 0:

 0  1  3  6  8  9  10  14  19
    ^                   ^
   lhs                 rhs

We're beginning to see a pattern:

If the sum is too small, advance lhs.
If the sum is too big, decrement rhs.

Eventually, we will either find a pair that sums up to 16, or lhs and rhs will bump into one another, at which point we're guaranteed that no pair sums up to 16.
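The two rules above translate almost directly into code. Here is a minimal Java sketch for a sorted array (class and method names are illustrative), returning the pair, or `null` if the pointers meet first:

```java
// Two-pointer scan over a sorted array: O(n) time, O(1) extra space.
class TwoPointerScan {
    static int[] findPair(int[] sorted, int target) {
        int lhs = 0;
        int rhs = sorted.length - 1;
        while (lhs < rhs) { // stop once the pointers meet
            int sum = sorted[lhs] + sorted[rhs];
            if (sum == target) {
                return new int[] {sorted[lhs], sorted[rhs]};
            } else if (sum < target) {
                lhs++; // too small: no remaining pair can use sorted[lhs]
            } else {
                rhs--; // too big: no remaining pair can use sorted[rhs]
            }
        }
        return null;
    }
}
```

With the example array and a target of 16 this returns `{6, 10}`, after the same sequence of moves as the trace that follows.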

Tracing through this algorithm, we get

 0  1  3  6  8  9  10  14  19
    ^                   ^
   lhs                 rhs    (sum is 15, too small)

 0  1  3  6  8  9  10  14  19
       ^                ^
      lhs              rhs    (sum is 17, too big)

 0  1  3  6  8  9  10  14  19
       ^            ^
      lhs          rhs        (sum is 13, too small)

 0  1  3  6  8  9  10  14  19
          ^         ^
         lhs       rhs        (sum is 16, we're done!)

Et voila! We've got our answer.

So how fast is this? Well, on each iteration, either lhs moves one step to the right or rhs moves one step to the left, and the algorithm stops when they meet. This means we do at most n - 1 iterations, which is O(n). Each iteration does a constant amount of work, so the total time complexity is O(n).
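To see the iteration bound concretely, here is a hypothetical instrumented variant of the same loop that counts iterations instead of returning the pair:

```java
// Instrumented two-pointer loop: returns how many iterations ran.
// Each iteration moves lhs right or rhs left by one, so for n elements
// the count never exceeds n - 1.
class TwoPointerCount {
    static int countIterations(int[] sorted, int target) {
        int lhs = 0;
        int rhs = sorted.length - 1;
        int iterations = 0;
        while (lhs < rhs) {
            iterations++;
            int sum = sorted[lhs] + sorted[rhs];
            if (sum == target) {
                break; // pair found
            } else if (sum < target) {
                lhs++;
            } else {
                rhs--;
            }
        }
        return iterations;
    }
}
```

On the example array (n = 9) with target 16 it reports 6 iterations; for an unreachable target such as 100 it reports 8, which is exactly n - 1.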

How about space? Well, we need to store two pointers, each of which takes up O(1) space, so the total space usage is O(1). Great!

But what does this have to do with your problem? The connection is this: at each point in time, this algorithm only needs to remember two numbers in the array. It then needs to be able to advance from one element to the next or from one element to the previous. That's all that it has to do.

So suppose that instead of using a sorted array, you use a binary search tree. Start off with two pointers, one to the smallest node and one to the largest. Once you have this setup, you can simulate the above algorithm, replacing "advance lhs" and "decrement rhs" with "move to the inorder successor of lhs" and "move to the inorder predecessor of rhs." These operations can be implemented so that they use a total of O(log n) space (assuming the tree is balanced) and such that any sequence of n operations of each type takes O(n) time in total (regardless of whether the tree is balanced). Consequently, if you were to use the above algorithm modified to work on a BST, you would get an algorithm that takes time O(n) and uses only O(log n) space!
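One possible Java sketch of that BST version, assuming a tree without parent pointers: each direction is an inorder iterator that keeps only the current root-to-node path on an explicit stack, so the space used is O(h), i.e. O(log n) for a balanced tree. The `Node` layout and all names here are assumptions, not taken from the answer.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Two-pointer scan on a BST using two stack-based inorder iterators.
class BstTwoPointer {
    static final class Node {
        final int val;
        final Node left, right;
        Node(int val, Node left, Node right) {
            this.val = val;
            this.left = left;
            this.right = right;
        }
    }

    // Inorder iterator: ascending, or descending when reverse is true.
    // The stack holds at most one root-to-leaf path: O(h) space.
    static final class InorderIterator {
        private final Deque<Node> stack = new ArrayDeque<>();
        private final boolean reverse;

        InorderIterator(Node root, boolean reverse) {
            this.reverse = reverse;
            pushSpine(root);
        }

        // Push n, then keep walking toward the smallest (or largest) key.
        private void pushSpine(Node n) {
            while (n != null) {
                stack.push(n);
                n = reverse ? n.right : n.left;
            }
        }

        // Next node in (reverse) inorder; amortized O(1) per call.
        Node next() {
            Node n = stack.pop();
            pushSpine(reverse ? n.left : n.right);
            return n;
        }
    }

    static int[] findPair(Node root, int target) {
        if (root == null) {
            return null;
        }
        InorderIterator asc = new InorderIterator(root, false);
        InorderIterator desc = new InorderIterator(root, true);
        Node lhs = asc.next();  // smallest node
        Node rhs = desc.next(); // largest node
        // Both pointers move one node at a time along the same inorder
        // sequence, so they meet on the same node before they can cross.
        while (lhs != rhs) {
            int sum = lhs.val + rhs.val;
            if (sum == target) {
                return new int[] {lhs.val, rhs.val};
            } else if (sum < target) {
                lhs = asc.next();  // inorder successor
            } else {
                rhs = desc.next(); // inorder predecessor
            }
        }
        return null;
    }
}
```

On the seven-node tree used in the C answer further down (root 15; children 10 and 20; leaves 8, 12, 16, 25), a target of 28 yields `{8, 20}` and a target of 31 yields `{15, 16}`.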

The implementation details are a bit tricky, and I'll leave that part as an exercise to you. If you aren't familiar with doing inorder successor and predecessor, there are many good resources available online. They're standard algorithms on BSTs and are super useful to know about. I'll also leave the formal proof of correctness of the array algorithm as an exercise, though it's not too hard.
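For reference, the standard way to find an inorder successor or predecessor without parent pointers or a stack is to walk down from the root, remembering the last node where the key comparison pointed the right way. This minimal Java sketch assumes distinct keys, and its names are hypothetical. It uses O(1) space but O(h) time per call, so plugging it into the two-pointer scan would give O(n log n) rather than O(n) on a balanced tree, which is why the O(n) version needs the O(log n) space mentioned above.

```java
// Inorder successor/predecessor of a key, found by walking down from the root.
// O(1) extra space, O(h) time per call. Assumes distinct keys.
class RootWalk {
    static final class Node {
        final int val;
        final Node left, right;
        Node(int val, Node left, Node right) {
            this.val = val;
            this.left = left;
            this.right = right;
        }
    }

    // Smallest key strictly greater than key, or null if there is none.
    static Integer successor(Node root, int key) {
        Integer best = null;
        Node cur = root;
        while (cur != null) {
            if (cur.val > key) {
                best = cur.val; // candidate; a smaller one may lie to the left
                cur = cur.left;
            } else {
                cur = cur.right;
            }
        }
        return best;
    }

    // Largest key strictly smaller than key, or null if there is none.
    static Integer predecessor(Node root, int key) {
        Integer best = null;
        Node cur = root;
        while (cur != null) {
            if (cur.val < key) {
                best = cur.val; // candidate; a larger one may lie to the right
                cur = cur.right;
            } else {
                cur = cur.left;
            }
        }
        return best;
    }
}
```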

Hope this helps!

I don't see why `successor()` and `predecessor()` would require `O(log n)` space. A pointer (and thus constant space) should suffice to iterate the tree towards the successor/predecessor of any node, no? — G. Bach, Apr 24 '13 at 00:35
@G.Bach- If the tree doesn't have parent pointers, you'll need a stack of nodes to remember the access path, which IIRC is necessary to find the next and previous elements. Or am I mistaken? — templatetypedef, Apr 24 '13 at 01:22
You're right, I didn't think of the possibility that there is no pointer to the parent; in that case you will need a stack, or at least recursive calls which in effect will use a stack as well - good point! — G. Bach, Apr 24 '13 at 01:52
@templatetypedef, using a threaded tree, this algorithm gives the best time. — MissingNumber, Apr 24 '13 at 09:56
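The comments above hinge on whether nodes have parent pointers. If they do, successor and predecessor need only a constant number of pointers, as this Java sketch shows; the `parent` field and the `link` helper are assumptions for illustration. Walking an entire tree with repeated `successor` calls still takes O(n) total time, since each edge is crossed at most twice.

```java
import java.util.ArrayList;
import java.util.List;

// Successor/predecessor with parent pointers: O(1) extra space.
class ParentPointers {
    static final class Node {
        final int val;
        Node left, right, parent;
        Node(int val) { this.val = val; }
    }

    // Hypothetical helper: attach children and set their parent pointers.
    static Node link(Node p, Node l, Node r) {
        p.left = l;
        p.right = r;
        if (l != null) l.parent = p;
        if (r != null) r.parent = p;
        return p;
    }

    static Node successor(Node n) {
        if (n.right != null) { // leftmost node of the right subtree
            n = n.right;
            while (n.left != null) n = n.left;
            return n;
        }
        // Otherwise climb until we arrive at a parent from its left child.
        while (n.parent != null && n == n.parent.right) n = n.parent;
        return n.parent;
    }

    static Node predecessor(Node n) {
        if (n.left != null) { // rightmost node of the left subtree
            n = n.left;
            while (n.right != null) n = n.right;
            return n;
        }
        // Otherwise climb until we arrive at a parent from its right child.
        while (n.parent != null && n == n.parent.left) n = n.parent;
        return n.parent;
    }

    // Walks the whole tree in order using only successor().
    static List<Integer> inorder(Node root) {
        List<Integer> out = new ArrayList<>();
        if (root == null) return out;
        Node n = root;
        while (n.left != null) n = n.left; // start at the minimum
        for (; n != null; n = successor(n)) out.add(n.val);
        return out;
    }
}
```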

MosesA · Answer 2 · 2013-04-24T09:24:52.433

I think you'll have to search the tree twice. But first, you're taking in an integer called sum but then it's suddenly an array? Is that a typo? I'm assuming that you meant to take in a sum array.

You'll have to traverse the tree and, for every node, start another traversal from the root, looking for a node whose element, added to the first node's element, equals the sum.

Also, you can't have sum be both a variable and an array in the same method.

Now that I've seen your edit, let's take the number 17 as an example. You first search for 0; if you find it, start another search from the root for 17 - 0. If you don't find it, increment 0 to 1 and search for 17 - 1, and so on until you find two numbers that add up to 17.

Edit

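// We're looking for two numbers that add up to `sum` (e.g. 17).
// Assumes nodes store distinct non-negative ints in `element`.
class Node {
    int element;
    Node left, right;
    Node(int element) { this.element = element; }
}

class PairFinder {
    private final Node root;

    PairFinder(Node root) { this.root = root; }

    // Try 0, 1, 2, ... as candidates; for each one that is in the tree,
    // search again from the root for sum - candidate.
    int[] findSum(int sum) {
        for (int num1 = 0; num1 <= sum / 2; num1++) {
            int num2 = sum - num1;
            if (num1 != num2 && search(root, num1) && search(root, num2)) {
                return new int[] { num1, num2 };
            }
        }
        return null; // no two elements add up to sum
    }

    // Ordinary BST lookup: compare with the node's element, then go left or right.
    private static boolean search(Node runner, int num) {
        while (runner != null) {
            if (runner.element == num) {
                return true;
            }
            runner = (num < runner.element) ? runner.left : runner.right;
        }
        return false;
    }
}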

Could you explain what you are doing with the `if (i==3)` there? Also, since it traverses the entire tree for a particular value of `search_1st`, isn't it running in O(n)? — seeker, Apr 24 '13 at 03:43
This gives a runtime of O(m^2), where m = N log N and N is the number of nodes in the tree. — MissingNumber, Apr 24 '13 at 05:57
`if (i == 3)` is just a way of making the program stop searching for the second number once it has found it. I think it would be better if that condition were at the start of the method. I'll edit it. Yes, it is running in O(n). I'll edit it so it won't have to branch into every node before comparing the element with the number. — MosesA, Apr 24 '13 at 09:25

score 0 · Answer 3 · answered Jan 18 '14 at 12:44

From http://www.geeksforgeeks.org/find-a-pair-with-given-sum-in-bst/

/* In a balanced binary search tree, find two elements which sum to
   a given value in time O(n) and space O(log n) */
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>  /* bool, true, false */
#define MAX_SIZE 100

// A BST node
struct node
{
    int val;
    struct node *left, *right;
};

// Stack type
struct Stack
{
    int size;
    int top;
    struct node* *array;
};

// A utility function to create a stack of given size
struct Stack* createStack(int size)
{
    struct Stack* stack =
        (struct Stack*) malloc(sizeof(struct Stack));
    stack->size = size;
    stack->top = -1;
    stack->array =
        (struct node**) malloc(stack->size * sizeof(struct node*));
    return stack;
}

// BASIC OPERATIONS OF STACK
int isFull(struct Stack* stack)
{   return stack->top == stack->size - 1;  }

int isEmpty(struct Stack* stack)
{   return stack->top == -1;   }

void push(struct Stack* stack, struct node* node)
{
    if (isFull(stack))
        return;
    stack->array[++stack->top] = node;
}

struct node* pop(struct Stack* stack)
{
    if (isEmpty(stack))
        return NULL;
    return stack->array[stack->top--];
}

// Returns true if a pair with target sum exists in BST, otherwise false
bool isPairPresent(struct node *root, int target)
{
    // Create two stacks. s1 is used for normal inorder traversal
    // and s2 is used for reverse inorder traversal
    struct Stack* s1 = createStack(MAX_SIZE);
    struct Stack* s2 = createStack(MAX_SIZE);

    // Note that the stacks have size MAX_SIZE; we could compute the tree
    // height and size them as O(log n) for balanced trees such as AVL and
    // red-black trees. MAX_SIZE is used here to keep the code simple.

    // done1, val1 and curr1 are used for normal inorder traversal using s1
    // done2, val2 and curr2 are used for reverse inorder traversal using s2
    bool done1 = false, done2 = false;
    int val1 = 0, val2 = 0;
    struct node *curr1 = root, *curr2 = root;

    // The loop will break when we either find a pair or one of the two
    // traversals is complete
    while (1)
    {
        // Find next node in normal Inorder traversal. See following post
        // http://www.geeksforgeeks.org/inorder-tree-traversal-without-recursion/
        while (done1 == false)
        {
            if (curr1 != NULL)
            {
                push(s1, curr1);
                curr1 = curr1->left;
            }
            else
            {
                if (isEmpty(s1))
                    done1 = 1;
                else
                {
                    curr1 = pop(s1);
                    val1 = curr1->val;
                    curr1 = curr1->right;
                    done1 = 1;
                }
            }
        }

        // Find next node in REVERSE Inorder traversal. The only
        // difference between above and below loop is, in below loop
        // right subtree is traversed before left subtree
        while (done2 == false)
        {
            if (curr2 != NULL)
            {
                push(s2, curr2);
                curr2 = curr2->right;
            }
            else
            {
                if (isEmpty(s2))
                    done2 = 1;
                else
                {
                    curr2 = pop(s2);
                    val2 = curr2->val;
                    curr2 = curr2->left;
                    done2 = 1;
                }
            }
        }

        // If we find a pair, then print the pair and return. The first
        // condition makes sure that two same values are not added
        if ((val1 != val2) && (val1 + val2) == target)
        {
            printf("\n Pair Found: %d + %d = %d\n", val1, val2, target);
            return true;
        }

        // If sum of current values is smaller, then move to next node in
        // normal inorder traversal
        else if ((val1 + val2) < target)
            done1 = false;

        // If sum of current values is greater, then move to next node in
        // reverse inorder traversal
        else if ((val1 + val2) > target)
            done2 = false;

        // If any of the inorder traversals is over, then there is no pair
        // so return false
        if (val1 >= val2)
            return false;
    }
}

// A utility function to create BST node
struct node * NewNode(int val)
{
    struct node *tmp = (struct node *)malloc(sizeof(struct node));
    tmp->val = val;
    tmp->right = tmp->left =NULL;
    return tmp;
}

// Driver program to test above functions
int main()
{
    /*
                   15
                /     \
              10      20
             / \     /  \
            8  12   16  25    */
    struct node *root =  NewNode(15);
    root->left = NewNode(10);
    root->right = NewNode(20);
    root->left->left = NewNode(8);
    root->left->right = NewNode(12);
    root->right->left = NewNode(16);
    root->right->right = NewNode(25);

    int target = 28;
    if (isPairPresent(root, target) == false)
        printf("\n No such values are found\n");

    getchar();
    return 0;
}

score 0 · Answer 4 · answered Jan 09 '15 at 19:32


Alternatively, traverse the tree and store all values in a HashSet. Then do another traversal and check whether (target - nodeValue) is in the set. It can be done in O(n) time and O(n) space.
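A minimal Java sketch of this idea, with one change from the description: it uses a single traversal and checks for the complement before inserting the current value, so a node can never be paired with itself (for example, a lone 8 when the target is 16). The names are hypothetical; `java.util.HashSet` gives expected, not worst-case, O(1) operations, which matches the comment below about expected O(n) time.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

// Single-pass HashSet variant: expected O(n) time, O(n) space.
class HashSetPairSum {
    static final class Node {
        final int val;
        final Node left, right;
        Node(int val, Node left, Node right) {
            this.val = val;
            this.left = left;
            this.right = right;
        }
    }

    static int[] findPair(Node root, int target) {
        Set<Integer> seen = new HashSet<>();
        Deque<Node> stack = new ArrayDeque<>(); // explicit stack instead of recursion
        if (root != null) stack.push(root);
        while (!stack.isEmpty()) {
            Node n = stack.pop();
            int need = target - n.val;
            if (seen.contains(need)) { // complement seen at an earlier node
                return new int[] {need, n.val};
            }
            seen.add(n.val);
            if (n.left != null) stack.push(n.left);
            if (n.right != null) stack.push(n.right);
        }
        return null;
    }
}
```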

SkyTech

This solution works, but it doesn't match the OP's memory constraints. This solution also runs in *expected* time O(n) versus *worst-case* time O(n). – templatetypedef Jan 09 '15 at 19:34

score 0 · Answer 5 · answered Oct 30 '15 at 17:01

The idea is the same as that of the earlier solution, except that I do it with two stacks: one following the inorder order (stack1) and another following the reverse inorder order (stack2). Once we reach the leftmost and the rightmost nodes of the BST, we can start comparing them.

If the sum is less than the required value, pop from stack1; otherwise, pop from stack2. The following is a Java implementation:

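import java.util.ArrayDeque;
import java.util.Deque;

// Minimal node type assumed by this method.
class TreeNode {
    int val;
    TreeNode left, right;
    TreeNode(int val) { this.val = val; }
}

class Solution {
    // Returns 1 if two distinct nodes of the BST rooted at A sum to B, else 0.
    public int sum2(TreeNode A, int B) {
        Deque<TreeNode> stack1 = new ArrayDeque<>(); // inorder (ascending)
        Deque<TreeNode> stack2 = new ArrayDeque<>(); // reverse inorder (descending)
        TreeNode cur1 = A;
        TreeNode cur2 = A;

        while (!stack1.isEmpty() || !stack2.isEmpty() ||
                cur1 != null || cur2 != null) {
            if (cur1 != null || cur2 != null) {
                // Walk down toward the next smallest / largest unvisited node.
                if (cur1 != null) {
                    stack1.push(cur1);
                    cur1 = cur1.left;
                }

                if (cur2 != null) {
                    stack2.push(cur2);
                    cur2 = cur2.right;
                }
            } else {
                // The two traversals have met on the same node: no pair left.
                if (stack1.peek() == stack2.peek()) break;

                int val1 = stack1.peek().val;
                int val2 = stack2.peek().val;

                if (val1 + val2 == B) return 1;

                if (val1 + val2 < B) {
                    cur1 = stack1.pop();   // advance the ascending traversal
                    cur1 = cur1.right;
                } else {
                    cur2 = stack2.pop();   // advance the descending traversal
                    cur2 = cur2.left;
                }
            }
        }

        return 0;
    }
}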
