2

I saw a very short algorithm for merging two binary search trees. I was surprised how easy and also very inefficient it is. But when I tried to guess its time complexity, I failed.

Lets have a two immutable binary search trees (not balanced) that contains integers and you want to merge them together with the following recursive algorithm in pseudo code. Function insert is auxiliary:

function insert(Tree t, int elem) returns Tree:
    if elem < t.elem:
        return new Tree(t.elem, insert(t.leftSubtree, elem), t.rightSubtree)
    elseif elem > t.elem:
        return new Tree(t.elem, t.leftSubtree, insert(t.rightSubtree, elem))
    else
        return t

function merge(Tree t1, Tree t2) returns Tree:
    if t1 or t2 is Empty:
        return chooseNonEmpty(t1, t2)
    else
        return insert(merge(merge(t1.leftSubtree, t1.rightSubtree), t2), t1.elem)

I guess its an exponencial algorithm but I cannot find an argument for that. What is the worst time complexity of this merge algorithm?

  • 1
    Why exactly do you say the algo is ineffective? – displayName Jan 31 '18 at 17:46
  • @displayName Insert one tree into another would be O(n^3). But this is maybe like O(n^n), because for each node it traverses a whole tree again to the bottom. And then all once again (second call of merge)… – Martin Jiřička Jan 31 '18 at 17:58
  • @greybeard Yes, "inefficient" would be a better word, I changed the title. I used the word "effective" within the meaning of "available for useful work", not as a term of computability theory. – Martin Jiřička Feb 02 '18 at 22:22

2 Answers2

1

Let's consider the worst case:

At each stage every tree is in the maximally imbalanced state, i.e. each node has at least one sub-tree of size 1.

In this extremal case the complexity of insert is quite easily shown to be Ө(n) where n is the number of elements in the tree, as the height is ~ n/2.


Based on the above constraint, we can deduce a recurrence relation for the time complexity of merge:

enter image description here

where n, m are the sizes of t1, t2. It is assumed without loss of generality that the right sub-tree always contains a single element. The terms correspond to:

  • T(n - 2, 1): the inner call to merge on the sub-trees of t1
  • T(n - 1, m): the outer call to merge on t2
  • Ө(n + m): the final call to insert

To solve this, let's re-substitute the first term and observe a pattern:

enter image description here

We can solve this sum by stripping out the first term:

enter image description here

Where in step (*) we used a change-in-variable substitution i -> i + 1. The recursion stops when k = n:

enter image description here

T(1, m) is just the insertion of an element into a tree of size m, which is obviously Ө(m) in our assumed setup.

Therefore the absolute worst-case time complexity of merge is

enter image description here


Notes:

  • The order of the parameters matters. It is thus common to insert the smaller tree into the larger tree (in a manner of speaking).
  • Realistically you are extremely unlikely to have maximally imbalanced trees at every stage of the procedure. The average case will naturally involve semi-balanced trees.
  • The optimal case (i.e. always perfectly balanced trees) is much more complex (I am unsure that an analytical solution like the above exists; see gdelab's answer).

EDIT: How to evaluate the exponential sum

Suppose we want to compute the sum:

enter image description here

where a, b, c, n are positive constants. In the second step we changed the base to e (the natural exponential constant). With this substitution we can treat ln c as a variable x, differentiate a geometrical progression with respect to it, then set x = ln c:

enter image description here

But the geometrical progression has a closed-form solution (a standard formula which is not difficult to derive):

enter image description here

And so we can differentiate this result with respect to x by n times to obtain an expression for Sn. For the problem above we only need the first two powers:

enter image description here

So that troublesome term is given by:

enter image description here

which is exactly what Wolfram Alpha directly quoted. As you can see, the basic idea behind this was simple, although the algebra was incredibly tedious.

meowgoesthedog
  • 14,670
  • 4
  • 27
  • 40
  • Whoa, thank you very much for your solution! Clever handling of all those sums! I got stuck at second equation from the back, where you are getting rid of sum of 2^(j-1) * Omega(…) between 1 and n-2. I have no clue from where you got 11*2^(n-2)+…. – Martin Jiřička Feb 01 '18 at 22:43
  • @MartinJiřička ah yes, that sum can be quite easily done with a mathematical trick, but I was too lazy to do it explicitly so I just used Wolfram Alpha :D sorry for the confusion; if you would like me to include how such a sum can be computed please let me know – meowgoesthedog Feb 01 '18 at 23:49
  • Yes, I am interested! You can only post a name of the trick if it has some, I will try to compute it myself. Thank you! (And if it will be correct, I would mark your solution as answer ;-D) – Martin Jiřička Feb 02 '18 at 19:15
  • @MartinJiřička done. Let me know if there is anything you still don't understand – meowgoesthedog Feb 02 '18 at 21:04
  • I apologize it took me so long to review your answer. To be honest I wasn't able to follow your solution completely, it is too high math for me. Anyway, thank you for your explanation! – Martin Jiřička Mar 17 '18 at 21:10
0

It's quite hard to compute exactly, but it looks like it's not polynomially bounded in the worst case (this is not a complete proof however, you'd need a better one):

  • insert has complexity O(h) at worst, where h is the height of the tree (i.e. at least log(n),possibly n).

  • The complexity of merge() could then be of the form: T(n1, n2) = O(h) + T(n1 / 2, n1 / 2) + T(n1 - 1, n2)

  • let's consider F(n) such that F(1)=T(1, 1) and F(n+1)=log(n)+F(n/2)+F(n-1). We can probably show that F(n) is smaller than T(n, n) (since F(n+1) contains T(n, n) instead of T(n, n+1)).

  • We have F(n)/F(n-1) = log(n)/F(n-1) + F(n/2) / F(n-1) + 1

  • Assume F(n)=Theta(n^k) for some k. Then F(n/2) / F(n-1) >= a / 2^k for some a>0 (that comes from the constants in the Theta).

  • Which means that (beyond a certain point n0) we always have F(n) / F(n-1) >= 1 + epsilon for some fixed epsilon > 0, which is not compatible with F(n)=O(n^k), hence a contradiction.

  • So F(n) is not a Theta(n^k) for any k. Intuitively, you can see that the problem is probably not the Omega part but the big-O part, hence it's probably not a O(n) (but technically we used the Omega part here to get a). Since T(n, n) should be even bigger than F(n), T(n, n) should not be polynomial, and is maybe exponential...

But then again, this is not rigorous at all, so maybe I'm actually dead wrong...

gdelab
  • 6,124
  • 2
  • 26
  • 59
  • Hmm, I will need some time to chew it over… Theta and Omega are well known functions from complexity theory? (I haven't listened about them.) – Martin Jiřička Jan 31 '18 at 18:04
  • Omega is like the reciprocal of big-O, and F=Theta(g) iff F=O(g) and F=Omega(g). See [here](https://www.khanacademy.org/computing/computer-science/algorithms/asymptotic-notation/a/big-big-omega-notation) for instance. – gdelab Feb 01 '18 at 08:16