6

I am trying to write a program that can detect and print two nodes in BST that have been swapped.

In a three level tree, I reached near to the solution using this approach.

If (!AllSubTreeAreValid())
{
//Nodes swapped on same side of main root node
}
else
{
  int max = getMax(root->left);
  int min = getMin(root->right);
  if(max > root->data || min < root->data )
  {
     //Nodes swapped on different sides of main root node
     //Print max and min values

  }
else 
{
 //No node swappped
}
}

//Helper functions
int GetMaxEle(TreeNode* tree)
{
    while(tree->right!=NULL)
    {
        tree=tree->right;
    }
    return tree->info;
}

int GetMinEle(TreeNode* tree)
{
    while(tree->left!=NULL)
    {
        tree=tree->left;
    }
    return tree->info;
}

but the above approach failed when I tried to test with four level tree.

             20

      15            30

   10    17       25     33

9  16  12  18  22  26  31  34

Being in root node 15's right subtree, 12 is still greater (violation).

Being in root node 15's left subtree, 16 is still greater (violation).

So, 16, 12 are the swapped elements in the above BST. How do I find them through the program?

templatetypedef
  • 362,284
  • 104
  • 897
  • 1,065
Cipher
  • 5,894
  • 22
  • 76
  • 112

3 Answers3

8

One way to think about this problem is to use the fact that an inorder walk of the tree will produce all of the elements in sorted order. If you can detect deviations from the sorted order during this walk, you can try to locate the two elements that are in the wrong place.

Let's see how to do this for a simple sorted array first, then will use our algorithm to build something that works on trees. Intuitively, if we start off with a sorted array and then swap two (non-equal!) elements, we will end up with some number of elements in the array being out of place. For example, given the array

1 2 3 4 5

If we swap 2 and 4, we end up with this array:

1 4 3 2 5

How would we detect that 2 and 4 were swapped here? Well, since 4 is the greater of the two elements and was swapped downward, it should be greater than both of the elements around it. Similarly, because 2 was swapped up, it should be smaller than both of the elements around it. From this, we could conclude that 2 and 4 were swapped.

However, this doesn't always work correctly. For example, suppose that we swap 1 and 4:

4 2 3 1 5

Here, both 2 and 1 are smaller than their neighboring elements, and both 4 and 3 are larger than theirs. From this we can tell that two of these four somehow were swapped, but it's not clear which ones we should interchange. However, if we take the largest and the smallest of these values (1 and 4, respectively), we end up getting the pair that was swapped.

More generally, to find the elements that were swapped in the sequence, you want to find

  • The greatest local maximum in the array.
  • The smallest local minimum in the array.

These two elements are out of place and should be swapped.

Now, let's think about how to apply this to trees. Since an inorder walk of the tree will produce the sorted sequence with the two elements out of order, one option would be to walk the tree, recording the inorder sequence of elements we've found, then using the above algorithm. For example, consider your original BST:

              20
         /         \
      15             30
     /   \         /   \ 
   10    17      25     33
  / |   /  \    /  \    |  \
9  16  12  18  22  26  31  34

If we linearize this into an array, we get

9 10 16 15 12 17 18 20 22 25 26 30 31 33 34

Notice that 16 is greater than its surrounding elements and that 12 is less than its. This immediately tells us that 12 and 16 were swapped.

A simple algorithm for solving this problem, therefore, would be to do an inorder walk of the tree to linearize it into a sequence like a vector or deque, then to scan that sequence to find the largest local maximum and the smallest local minimum. This would run in O(n) time, using O(n) space. A trickier but more space-efficient algorithm would be to only keep track of three nodes at a time - the current node, its predecessor, and its successor - which reduces the memory usage to O(1).

Hope this helps!

templatetypedef
  • 362,284
  • 104
  • 897
  • 1,065
  • @templatetypdedef: I was earlier trying this approach with other test cases and it was coming out to be very long. By an inorder traverssal, I placed all the elements in the list. Now, there are too many cases coming up to find those two elements in the list. How do you suggest finding the local maximum and minimum withing the list? – Cipher Sep 01 '11 at 04:50
  • 2
    @Cipher- This isn't as hard as it might seem. Iterate over every element of the list. If that element is greater than the elements just before and just after it (being sure to check for edge cases), then that element is a local maximum. If the element is less than the elements just before and just after it, then it's a local minimum. If you keep track of the lowest minimum and highest maximum you've seen, you can find the two swapped values with a single pass over the array. Does that make sense? Or is there something else I can try to help out with? – templatetypedef Sep 01 '11 at 04:56
  • I'll try out this again just now and let you know – Cipher Sep 01 '11 at 05:03
  • @Cipher- That's the right idea, but the code looks a bit buggy (you haven't initialized `max` or `min`, for example.) If you have questions with that code, I'd suggest posting it as a separate question so that people can give more precise criticism of what's going on. – templatetypedef Sep 01 '11 at 05:18
  • http://pastebin.com/73HKJPr6 Since the elements are added in the list in descending order, considering this series in the list gives 17, 15, 12, 10, 5, 7, 2 Answer comes to be 17, 5 whereas it should be 5, 7 – Cipher Sep 01 '11 at 05:29
  • 1
    Ah - sorry, one detail - if everything is in descending order, the first element is always treated as if it's less than the nonexistent element before it (it can't be a local maximum) and the last element is always treated as if it's greater than the element before it (it can't be a local minimum). If you do that, then in the previous list, the only local maximum is 7 and the only local minimum is 5, which is the correct pair. – templatetypedef Sep 01 '11 at 05:31
  • @tempelatetypedef: Yes, but how would this treatment be done? Won't that effect would we do that if the elements on these boundaries are the ones which were swapped in BST illegally? – Cipher Sep 01 '11 at 05:37
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/3048/discussion-between-cipher-and-templatetypedef) – Cipher Sep 01 '11 at 05:37
0

I guess your getMin et getMax works witht the hypotheses that the tree is a BST, so

T getMax(tree) {
  return tree -> right == null 
    ? tree -> value 
    : getMax(tree -> right);
}

(or the equivalent code with a loop). If so, your code examines at most three values in the tree. Even if getMax and getMin traversed the full tree to get the actual max/min, you would still base your test on just two comparisons. If you want to check that your tree satisfy the BST property, it is obvious that you have to examine all the values. Its enough to compare each node with its parent.

void CheckIsBst(Tree *tree) {
  if (tree -> left != null) {
    if (tree -> left -> value > tree -> value) {
      // print violation
    }
    CheckIsBst(tree -> left);   
  }
  // same with -> right, reversing < to > in the test
}

Edit : that was wrong, see comment. I believe this one is ok.

void checkIsBst(Tree *Tree, Tree *lowerBound, Tree *upperBound) {
  if(lowerBound!= null && lowerBound -> value > tree -> Value) {
    //violation
  }
  // same for upper bound, check with <
  if (tree -> left != null) {
    if (tree -> left -> value > tree -> value) {
      // print violation
     }
     CheckIsBst(tree -> left, lowerBound, tree);   
  }
  // same for right, reversing comparison 
  // setting lowerBound to tree instead of upperBound
}

Calling from root with null bounds

Didier Dupont
  • 29,398
  • 7
  • 71
  • 90
  • In the above tree, check that 12<17 and 16>10 and they hence case of checking node with it's parent is valid, but it is still not a bst. What say? – Cipher Aug 31 '11 at 17:06
  • If the approach in my post is followed, tree upto level can return correct answer but four level tree won't give that. Any solution? – Cipher Aug 31 '11 at 17:18
  • I say you are quite right. Bounds implied by all ancestors have to be checked to. – Didier Dupont Aug 31 '11 at 17:24
  • Are you sure it is ok beyond depth 2? what of 2 is left of 5 and 8 is right of 2? (depth 3, min is 2, max returns 5, looks ok, yet 8 left of 5) – Didier Dupont Aug 31 '11 at 17:28
0

the tree traversal done by templatetypedef works if you are sure there is only one swap. Otherwise I suggest a solution based on your initial code:

int GetMax(TreeNode* tree) {
    int max_right, max_left, ret;

    ret = tree->data;
    if (tree->left != NULL) {
        max_left = GetMax(tree->left);
        if (max_left > ret)
            ret = max_left;
    }
    if (tree->right != NULL) {
        max_right = GetMax(tree->right);
        if (max_right > ret)
            ret = max_right;
    }

    return ret;
}

int GetMin(TreeNode* tree) {
    int min_right, min_left, ret;

    ret = tree->data;
    if (tree->left != NULL) {
        min_left = GetMin(tree->left);
        if (min_left < ret)
            ret = min_left;
    }
    if (tree->right != NULL) {
        min_right = GetMin(tree->right);
        if (min_right < ret)
            ret = min_right;
    }

    return ret;
}

void print_violations(TreeNode* tree) {
    if ((tree->left != NULL) && (tree->right != NULL)) {
        int max_left = GetMax(tree->left);
        int min_right = GetMin(tree->right);
        if (max_left > tree->data && min_right < tree->data) {
            printf("Need to swap %d with %d\n", max_left, min_right);
        }
    }
    if (tree->left != NULL)
        print_violations(tree->left);
    if (tree->right != NULL)
        print_violations(tree->right);
}

It is slower, but it prints all the swaps it identifies. Can be changed to print all violations (e.g. if (max_left > tree->data) print violation). You can improve the performance if you can add two fields to the TreeNode with the max and min precomputed for that subtree.

andredor
  • 542
  • 3
  • 4