13

In researching complexity for any algorithm that traverses a binary search tree, I see two different ways to express the same thing:

Version #1: The traversal algorithm at worst case compares once per height of the tree; therefore complexity is O(h).

Version #2: The traversal algorithm at worst case compares once per height of the tree; therefore complexity is O(logN).

It seems to me that the same logic is at work, yet different authors use either logN or h. Can someone explain to me why this is the case?

aarti
Stephen Gross

5 Answers

15

The correct value for the worst-case time to search a tree is O(h), where h is the height of the tree. If you are using a balanced search tree (one where the height of the tree is O(log n)), then the lookup time is worst-case O(log n). That said, not all trees are balanced. For example, here's a tree with height n - 1:

1
 \
  2
   \
    3
     \
     ...
       \
        n

Here, h = O(n), so the lookup is O(n). It's correct to say that the lookup time is also O(h), but h ≠ O(log n) in this case and it would be erroneous to claim that the lookup time was O(log n).

In short, O(h) is the correct bound. O(log n) is the correct bound in a balanced search tree when the height is at most O(log n), but not all trees have lookup time O(log n) because they may be imbalanced.
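To make the difference concrete, here is a minimal sketch, not taken from the answer itself: a plain, non-rebalancing binary search tree in Python that counts how many nodes a lookup visits. The names `Node`, `insert`, `search_steps`, and `balanced_order` are hypothetical helpers made up for illustration, and the choice of n = 1023 is arbitrary.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(root, key):
    """Plain BST insert with no rebalancing (iterative, so a long chain is fine)."""
    new = Node(key)
    if root is None:
        return new
    cur = root
    while True:
        if key < cur.key:
            if cur.left is None:
                cur.left = new
                return root
            cur = cur.left
        else:
            if cur.right is None:
                cur.right = new
                return root
            cur = cur.right


def search_steps(root, key):
    """Return the number of nodes visited while looking up key."""
    steps = 0
    cur = root
    while cur is not None:
        steps += 1
        if key == cur.key:
            break
        cur = cur.left if key < cur.key else cur.right
    return steps


def balanced_order(lo, hi):
    """Yield lo..hi midpoint-first, so plain insertion produces a balanced tree."""
    if lo > hi:
        return
    mid = (lo + hi) // 2
    yield mid
    yield from balanced_order(lo, mid - 1)
    yield from balanced_order(mid + 1, hi)


n = 1023

# Sorted insertion order gives the degenerate chain drawn above.
chain = None
for k in range(1, n + 1):
    chain = insert(chain, k)

# Midpoint-first insertion order gives a balanced tree on the same keys.
balanced = None
for k in balanced_order(1, n):
    balanced = insert(balanced, k)

print(search_steps(chain, n))     # 1023 nodes visited
print(search_steps(balanced, n))  # 10 nodes visited
```

Both trees hold the same 1023 keys and both lookups are O(h); only the balanced one has h = O(log n).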

Hope this helps!

templatetypedef
8

If your binary tree is balanced so that every node has exactly two child nodes, then the number of nodes in the tree will be exactly N = 2^h − 1, so the height is the logarithm of the number of elements (and similarly for any complete n-ary tree).

An arbitrary, unconstrained tree may look totally different, though; for instance, it could just have one node at every level, so N = h in that case. So the height is the more general measure, as it relates to actual comparisons, but under the additional assumption of balance you can express the height as the logarithm of the number of elements.
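As a rough illustration of the two extremes, the sketch below computes the smallest and largest number of levels a binary tree with N nodes can have. It assumes the same convention as the formula above, where h counts levels, so a single node has h = 1; the function name `level_bounds` is hypothetical.

```python
def level_bounds(n):
    """Return (fewest, most) levels a binary tree with n >= 1 nodes can have."""
    # A tree with h levels holds at most 2**h - 1 nodes, so h >= ceil(log2(n + 1)).
    # For a positive integer n, that ceiling equals n.bit_length().
    fewest = n.bit_length()
    # One node per level gives the tallest possible tree.
    most = n
    return fewest, most


for n in (1, 7, 8, 1_000_000):
    print(n, level_bounds(n))
# 1 (1, 1)
# 7 (3, 7)
# 8 (4, 8)
# 1000000 (20, 1000000)
```

The gap between the two numbers is why O(h) is the safe statement for an arbitrary tree, while O(log N) needs the balance assumption.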

Kerrek SB
  • 1
    An important but nitpicky point - not all balanced trees have each node with two children, and having each internal node have two children does not guarantee balance. Many balanced trees (AVL trees, Red/Black trees, AA-trees, RAVL trees, etc.) do not obey this condition but still have O(log n) lookups, where the log is not base-2. – templatetypedef Feb 03 '12 at 21:12
  • 2
    @templatetypedef: that's because they aren't "balanced so that every node has exactly two child nodes". An AVL tree is maximally balanced for its size, the invariant is that no node has subtrees whose height differs by more than one. IIRC for Red-Black trees the constraint is that no node has subtrees whose height varies by more than a factor of 2 -- they are "less balanced" than AVL trees, but nevertheless still "balanced" enough for `h` to be `O(log n)`. The `log` in `O(log n)` can still be base 2, or any other base, since the choice of base only makes a constant-factor difference. – Steve Jessop Feb 03 '12 at 22:11
  • I also omitted for simplicity that the last row of the tree doesn't have to be full, of course, and you're right that intermediate rows also don't have to have a definite size, but rather, their number of elements should be bounded above and below. Ultimately what matters is that `h = O(log N)` (and arbitrary arity is subsumed in the constants). – Kerrek SB Feb 03 '12 at 22:24
3

O(h) would refer to a binary tree that is sorted but not balanced

O(logn) would refer to a tree that is sorted and balanced

hrezs
1

It's sort of two ways of saying the same thing, because your average balanced binary tree of height 'h' will have around 2^h nodes.

Depending on the context, either height or #nodes may be more relevant, and so that's what you'll see referenced.

dkamins
0

Because the (h)eight of a balanced tree varies as the log of the (N)umber of elements.

Gus
  • Not always; imbalanced trees can have h = Theta(n). – templatetypedef Feb 03 '12 at 20:13
  • I sort of assumed he was referring to the balanced case and that the question arose because he saw different notation for the same case. logN is clearly the balanced case. Edited for clarity anyway. – Gus Feb 03 '12 at 20:15
  • Actually in re-reading the question, both mention "worst case" but in worst case h = N and so one of the two statements he is quoting is likely wrong. If h == N the statements lead to h == log(h) which is not possible – Gus Feb 03 '12 at 20:20