Performance comparison of binary search tree functions

Question

I have these two binary search tree search functions. Which of these functions is more efficient, or are they equivalent?

Given

type 'a tree = 
  | Empty
  | Node of 'a * 'a tree * 'a tree

let rec search x t =
  match t with
  | Empty -> Empty
  | Node(a, left, right) as t' ->
      if a = x then t'
      else 
        match search x left with
        | Empty -> search x right
        | t'' -> t''

let rec search x tree = 
  match tree with 
  | Empty -> Empty
  | Node (root, left, right) as t -> 
      if (x = root) then t 
      else if (x < root) then search x left
      else search x right

I think that the second one is equivalent simply because we don't need to have another match with Empty since we already have it before.

Chris · Accepted Answer · 2023-04-10T04:50:23.470

It depends

This really depends on whether values in your binary tree are strictly ordered. Your first function will search the right branch if the value is not found in the left branch. The second will only follow either the left or the right branch, but not both.

They are not equivalent.
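
A small unordered tree makes the difference concrete. This is only a sketch: `search_all` and `search_ordered` are hypothetical names for the first and second functions from the question, and `unordered` is a made-up example tree.

```ocaml
type 'a tree =
  | Empty
  | Node of 'a * 'a tree * 'a tree

(* First function from the question: tries the left subtree, then the right. *)
let rec search_all x t =
  match t with
  | Empty -> Empty
  | Node (a, left, right) ->
      if a = x then t
      else
        (match search_all x left with
         | Empty -> search_all x right
         | found -> found)

(* Second function from the question: follows one branch based on ordering. *)
let rec search_ordered x t =
  match t with
  | Empty -> Empty
  | Node (root, left, right) ->
      if x = root then t
      else if x < root then search_ordered x left
      else search_ordered x right

(* Not a valid BST: 9 sits in the left subtree of 5. *)
let unordered = Node (5, Node (9, Empty, Empty), Node (1, Empty, Empty))

let () =
  (* The exhaustive search finds 9; the ordered search goes right and misses it. *)
  assert (search_all 9 unordered = Node (9, Empty, Empty));
  assert (search_ordered 9 unordered = Empty)
```

On a tree that does respect the ordering, both functions return the same subtree.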

Performance and stack considerations

If we assume strict ordering then the first function will be less efficient, as in the worst case it visits every node: O(n). The second function follows a single path down from the root, so it is O(h) in the height h of the tree, which is O(log n) when the tree is balanced. The first function is also not tail-recursive, so you run the risk of a stack overflow for a very deep tree.
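
One way to see the gap is to count how many nodes each strategy visits on a balanced tree. The sketch below is illustrative only: `build_balanced`, `visits_all`, and `visits_ordered` are hypothetical helpers, and they return visit counts rather than the matching subtree.

```ocaml
type 'a tree =
  | Empty
  | Node of 'a * 'a tree * 'a tree

(* Hypothetical helper: builds a balanced BST holding the integers lo..hi. *)
let rec build_balanced lo hi =
  if lo > hi then Empty
  else
    let mid = (lo + hi) / 2 in
    Node (mid, build_balanced lo (mid - 1), build_balanced (mid + 1) hi)

(* Nodes visited by the exhaustive (left, then right) strategy.
   Returns (found, visits). *)
let rec visits_all x = function
  | Empty -> (false, 0)
  | Node (v, _, _) when v = x -> (true, 1)
  | Node (_, l, r) ->
      let found_l, n_l = visits_all x l in
      if found_l then (true, n_l + 1)
      else
        let found_r, n_r = visits_all x r in
        (found_r, n_l + n_r + 1)

(* Nodes visited when the ordering picks a single branch at each step. *)
let rec visits_ordered x = function
  | Empty -> 0
  | Node (v, _, _) when v = x -> 1
  | Node (v, l, _) when x < v -> 1 + visits_ordered x l
  | Node (_, _, r) -> 1 + visits_ordered x r

let () =
  let t = build_balanced 1 1023 in
  let _, exhaustive = visits_all 1023 t in
  (* 1023 is the rightmost leaf, the worst case for the exhaustive search. *)
  Printf.printf "exhaustive: %d, ordered: %d\n" exhaustive (visits_ordered 1023 t)
```

With 1023 nodes, the exhaustive search touches all 1023 of them to reach the rightmost leaf, while the ordered search touches 10.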

If you want to make this tail-recursive, one approach is to maintain a stack or worklist of nodes to deal with and work through them as we go. We search left branches first, pushing the right branches onto the stack. Once we hit an Empty left branch, we start consuming the stack.

let rec search x t stack =
  match t, stack with
  | Empty, [] -> Empty
  | Empty, n::ns -> search x n ns
  | Node (v, _, _), _ when x = v -> t
  | Node (_, l, r), _ -> search x l @@ r :: stack

We can hide the stack argument by making this a local function.

let search x t =
  let rec search' x t stack =
    match t, stack with
    | Empty, [] -> Empty
    | Empty, n::ns -> search' x n ns
    | Node (v, _, _), _ when x = v -> t
    | Node (_, l, r), _ -> search' x l @@ r :: stack
  in
  search' x t []
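
As a rough check that the worklist version copes with deep trees, here is a sketch that runs it on a degenerate tree one million nodes deep. `left_spine` is a hypothetical helper, and the `@@` from the answer is written with parentheses, which is equivalent.

```ocaml
type 'a tree =
  | Empty
  | Node of 'a * 'a tree * 'a tree

(* Worklist-based search from the answer, same behavior. *)
let search x t =
  let rec search' x t stack =
    match t, stack with
    | Empty, [] -> Empty
    | Empty, n :: ns -> search' x n ns
    | Node (v, _, _), _ when x = v -> t
    | Node (_, l, r), _ -> search' x l (r :: stack)
  in
  search' x t []

(* Hypothetical helper: a degenerate tree where every node has only a
   left child, built with a tail-recursive loop. The deepest node holds 1. *)
let left_spine n =
  let rec go acc i = if i > n then acc else go (Node (i, acc, Empty)) (i + 1) in
  go Empty 1

let () =
  let deep = left_spine 1_000_000 in
  match search 1 deep with
  | Node (v, _, _) -> Printf.printf "found %d\n" v
  | Empty -> print_endline "not found"
```

The pending right branches live in an ordinary list on the heap, so the depth of the tree does not translate into call-stack depth.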

A further suggestion

If we can assume strict ordering in the tree, I would probably suggest the following, using conditional guards.

let rec search x t =
  match t with
  | Empty -> Empty
  | Node (v, _, _) when x = v -> t
  | Node (v, l, _) when x < v -> search x l
  | Node (_, _, r) -> search x r

Which we can reduce to:

let rec search x = function
  | Empty -> Empty
  | Node (v, _, _) as t when x = v -> t
  | Node (v, l, _) when x < v -> search x l
  | Node (_, _, r) -> search x r
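
The ordered search is only correct if the tree was built respecting the ordering. A minimal sketch of such a construction follows; `insert` and `of_list` are hypothetical helpers that are not part of the original post, and duplicates are dropped in line with the strict-ordering assumption.

```ocaml
type 'a tree =
  | Empty
  | Node of 'a * 'a tree * 'a tree

let rec search x = function
  | Empty -> Empty
  | Node (v, _, _) as t when x = v -> t
  | Node (v, l, _) when x < v -> search x l
  | Node (_, _, r) -> search x r

(* Hypothetical helper: inserts x while preserving the ordering;
   a value already present leaves the tree unchanged. *)
let rec insert x = function
  | Empty -> Node (x, Empty, Empty)
  | Node (v, _, _) as t when x = v -> t
  | Node (v, l, r) when x < v -> Node (v, insert x l, r)
  | Node (v, l, r) -> Node (v, l, insert x r)

(* Hypothetical helper: builds a tree by inserting list elements left to right. *)
let of_list xs = List.fold_left (fun t x -> insert x t) Empty xs

let () =
  let t = of_list [5; 3; 8; 1; 4] in
  assert (search 4 t = Node (4, Empty, Empty));
  assert (search 7 t = Empty)
```

Note that `insert` is not tail-recursive and does no rebalancing, so sorted input would still produce a degenerate, list-like tree.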

edited Apr 10 '23 at 04:50

answered Apr 10 '23 at 03:46


But how can your code for the strictly ordered BST handle multiple matches? Isn't match/with acting like if/else? – v_head Apr 10 '23 at 05:13
`match` is _not_ if/else. The former allows for concisely destructuring data structures. The latter deals with boolean conditions. Both are control flow mechanisms, and anything that can be written using one can be written using the other, but they do it in different ways. – Chris Apr 10 '23 at 05:30
I'm also not sure what you mean by multiple matches. Do you want to return all matching nodes in a list? A strictly ordered BST generally shouldn't have multiple nodes with the same value. – Chris Apr 10 '23 at 05:32
I see, so your last suggestion with conditional guards is an alternative to my second function in the case of strict ordering? Why prefer that to if/else if/else? – v_head Apr 10 '23 at 05:43
Perhaps a bit of personal preference. I find less nesting easier to mentally parse. Also, being able to use `_` wildcard patterns tells my practiced eyes which pieces of data actually matter in each case. – Chris Apr 10 '23 at 05:45
That's fair. I agree. – v_head Apr 10 '23 at 14:56
The first one would work for a plain binary tree (not a search tree, so no ordering). How would you optimize that function to be more readable? For binary trees, the time complexity would surely be O(n). – v_head Apr 10 '23 at 14:58
In your first function, the `as t'` is redundant. You have access to that value through the `t` binding. You could have that pattern be: `Node (a, _, _) when a = x -> t` and then after that `Node (_, l, r) -> ...` – Chris Apr 10 '23 at 19:18
Right, I asked a separate question https://stackoverflow.com/staging-ground/75980345 – v_head Apr 10 '23 at 20:32
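
The rewrite of the first function suggested in the comments might look like the sketch below. It assumes a plain binary tree with no ordering, so it can still visit every node; the binding name `found` is arbitrary.

```ocaml
type 'a tree =
  | Empty
  | Node of 'a * 'a tree * 'a tree

(* Exhaustive search for an unordered binary tree, using a guard
   instead of `as t'` plus a nested if/else. *)
let rec search x t =
  match t with
  | Empty -> Empty
  | Node (a, _, _) when a = x -> t
  | Node (_, l, r) ->
      (match search x l with
       | Empty -> search x r
       | found -> found)

let () =
  let t = Node (5, Node (9, Empty, Empty), Node (1, Empty, Empty)) in
  assert (search 1 t = Node (1, Empty, Empty));
  assert (search 7 t = Empty)
```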
