
I have been taking the DSA course on Coursera, and this week I was introduced to searching algorithms. The complexity of binary search (O(log n)) is better than that of linear search (O(n)). But why would I ever use it on an unsorted array, given that it would take O(n log n) work to sort the array first?

If binary search is only used where the array is already sorted, why are these two algorithms compared so often? They clearly have different use cases.

  • Because after one sorting step (*O(n log n)*), you can make thousands of *queries*. This is basically what a database index is used for. Note that adding an element to an AVL tree takes *O(log n)*. – Willem Van Onsem Feb 03 '20 at 21:38
  • @WillemVanOnsem: I believe that applies best when the number of queries is >n (acknowledging that >n is fuzzily defined with regard to algorithmic complexity) – Mooing Duck Feb 03 '20 at 21:39
  • @MooingDuck: well, once the number of queries is greater than *O(log n)*, one expects a performance gain. – Willem Van Onsem Feb 03 '20 at 21:40



would I ever use it in an unsorted array given the fact that it would take O(n log n) work to sort the array first.

Normally one performs multiple queries on the same data structure. Look at a database, for example: one fetches a record by its primary key far more often than one adds records. That is to be expected, since if there were fewer queries than inserts, some of the inserted data would never be retrieved and would thus be "useless".
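A minimal Python sketch of the "sort once, query many times" pattern, using the standard library's `bisect` module as a stand-in for a real database index. The data, the queries, and the helper names `linear_contains` and `binary_contains` are hypothetical, chosen only for illustration.

```python
from bisect import bisect_left


def linear_contains(items, target):
    """Linear search: O(n) per query, works on unsorted data."""
    for x in items:
        if x == target:
            return True
    return False


def binary_contains(sorted_items, target):
    """Binary search via bisect_left: O(log n) per query, needs sorted data."""
    i = bisect_left(sorted_items, target)
    return i < len(sorted_items) and sorted_items[i] == target


data = [42, 7, 19, 3, 88, 23, 7, 61]   # hypothetical unsorted data
queries = [7, 8, 61, 100, 3]           # hypothetical batch of lookups

index = sorted(data)                   # pay O(n log n) once...
results = [binary_contains(index, q) for q in queries]  # ...then O(log n) per query

# Both strategies give the same answers; only the cost profile differs.
assert results == [linear_contains(data, q) for q in queries]
print(results)  # [True, False, True, False, True]
```

The sorted copy plays the role of the index: it is built once, and every later lookup benefits from it.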

Furthermore, sorting a list of elements, or constructing a balanced binary search tree from them, does indeed take O(n log n). But updating a self-balancing binary search tree, for example an AVL tree [wiki], takes only O(log n). So if you change the collection slightly, by adding, removing, or updating one element, it takes O(log n) to update the data structure, and you keep the O(log n) lookup.
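To make the O(log n) update claim concrete, here is a minimal AVL tree sketch in Python. It supports only insertion and lookup. Deletion (also O(log n) in an AVL tree) is omitted for brevity, and duplicates are ignored. All names are my own and do not come from any library.

```python
class Node:
    __slots__ = ("key", "left", "right", "height")

    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.height = 1


def height(node):
    return node.height if node else 0


def update(node):
    node.height = 1 + max(height(node.left), height(node.right))


def rotate_right(y):
    x = y.left
    y.left = x.right
    x.right = y
    update(y)
    update(x)
    return x


def rotate_left(x):
    y = x.right
    x.right = y.left
    y.left = x
    update(x)
    update(y)
    return y


def rebalance(node):
    """Restore the AVL invariant (child heights differ by at most 1)."""
    update(node)
    balance = height(node.left) - height(node.right)
    if balance > 1:  # left-heavy
        if height(node.left.left) < height(node.left.right):
            node.left = rotate_left(node.left)  # left-right case
        return rotate_right(node)
    if balance < -1:  # right-heavy
        if height(node.right.right) < height(node.right.left):
            node.right = rotate_right(node.right)  # right-left case
        return rotate_left(node)
    return node


def insert(node, key):
    """Insert key in O(log n); returns the new root of this subtree."""
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = insert(node.left, key)
    elif key > node.key:
        node.right = insert(node.right, key)
    else:
        return node  # duplicates are ignored in this sketch
    return rebalance(node)


def contains(node, key):
    """Lookup in O(log n), because the tree height stays O(log n)."""
    while node is not None:
        if key == node.key:
            return True
        node = node.left if key < node.key else node.right
    return False


def in_order(node):
    """Keys in ascending order (the tree is always 'sorted')."""
    return in_order(node.left) + [node.key] + in_order(node.right) if node else []


root = None
for k in [42, 7, 19, 3, 88, 23, 61]:
    root = insert(root, k)
print(in_order(root))                         # [3, 7, 19, 23, 42, 61, 88]
print(contains(root, 23), contains(root, 24))  # True False
```

Each insertion walks one root-to-leaf path and performs at most a couple of rotations, so the sorted order is maintained without ever re-sorting the whole collection.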

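Using linear search on unsorted data will indeed outperform sorting followed by binary search for a small number of queries. Once the number of queries becomes large, however, the linear search algorithm will be outperformed by the binary search algorithm: k linear searches cost O(kn), whereas sorting once and then doing k binary searches costs O((n + k) log n), so sorting pays off roughly once k exceeds log n.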
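A crude cost model makes the break-even point visible. This sketch assumes that sorting costs about n·log2(n) comparisons, a linear search n comparisons, and a binary search log2(n) comparisons. It ignores constant factors, caching, and early exits, so it is only a rough illustration.

```python
import math


def linear_cost(n, k):
    """Approximate comparisons for k linear searches over n unsorted items."""
    return k * n


def sort_then_binary_cost(n, k):
    """Approximate comparisons to sort once, then run k binary searches."""
    log_n = math.log2(n)
    return n * log_n + k * log_n


def break_even_queries(n):
    """Smallest k for which sort-then-binary-search is cheaper in this model."""
    k = 1
    while sort_then_binary_cost(n, k) >= linear_cost(n, k):
        k += 1
    return k


for n in (1_000, 1_000_000):
    print(n, break_even_queries(n))
```

In this model the break-even number of queries tracks log2(n), which matches the comment thread's observation that sorting pays off once the number of queries exceeds O(log n).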

Willem Van Onsem

Willem Van Onsem's answer describes well the case where many queries will be made on the same array, so it's worth taking O(n log n) time to sort the array first. My answer doesn't directly address "unsorted arrays", but there is a common misconception that arrays either are unsorted or have been sorted, and I think it is worth addressing that misconception in case it helps any readers.

To be clear, I don't assume that you have this particular misconception; but I do think some people who have this misconception will read your question and its answers.


The word "sorted" is a bit misleading. Since "sorted" is the past participle of the verb "sort", it makes it sound like a sorting algorithm has been used to put the data in order. But the way computer scientists use the word "sorted", it just means that the array is in order, without implying that it was previously not in order.
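Being sorted is therefore a property of the data, not a record of what was done to it. As a small illustration, the property can be checked in a single O(n) pass without sorting anything. The helper name `is_sorted` is my own.

```python
def is_sorted(items):
    """True if every element is <= its successor (non-decreasing order)."""
    return all(a <= b for a, b in zip(items, items[1:]))


print(is_sorted([1, 2, 2, 5]))  # True
print(is_sorted([3, 1, 2]))     # False
```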

So when we say binary search can only be used on a "sorted array", that doesn't mean it took O(n log n) time to make the array "sorted". Lots of data is naturally in order without having to do any work to sort it. A few examples:

  • Suppose I have an unsorted array of numbers, and I want to build a prefix sum array, which contains the cumulative sums starting from the beginning of the original array. If there are no negative numbers in the original array, then the cumulative sums will naturally be in ascending order.
  • Suppose I have a sequence with some special elements, and I want to perform queries where, given an index, the query finds the first special element after that index. It would help to have a list of the indices in the sequence where the special elements occur; the natural way to find those indices would find them in ascending order.
  • Suppose I want an array of the first n prime numbers, or all prime numbers less than or equal to n. Almost any algorithm that solves either problem will generate the prime numbers in ascending order.
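The three examples above can be sketched in Python with the standard `bisect` and `itertools` modules. In each case the array is in order as a side effect of how it is built, so binary search applies with no separate sorting step. The data, the `'*'` marker for special elements, and the helper names are all hypothetical. The prime example uses a sieve of Eratosthenes as one algorithm that produces primes in ascending order.

```python
from bisect import bisect_left, bisect_right
from itertools import accumulate

# 1. Prefix sums of non-negative numbers are non-decreasing, so no sort is needed.
values = [3, 0, 5, 2, 7, 1]            # hypothetical data: unsorted but non-negative
prefix = list(accumulate(values))      # [3, 3, 8, 10, 17, 18]


def first_prefix_reaching(total):
    """Index of the first prefix sum >= total, or len(prefix) if none."""
    return bisect_left(prefix, total)


# 2. A left-to-right scan collects the indices of special elements in ascending order.
seq = "ab*cd**e*f"                     # hypothetical: '*' marks a special element
special = [i for i, ch in enumerate(seq) if ch == "*"]  # [2, 5, 6, 8]


def next_special_after(index):
    """First special index strictly greater than index, or None."""
    j = bisect_right(special, index)
    return special[j] if j < len(special) else None


# 3. A sieve of Eratosthenes emits the primes <= n in ascending order.
def primes_up_to(n):
    is_prime = [True] * (n + 1)
    is_prime[0:2] = [False] * min(2, n + 1)  # 0 and 1 are not prime
    for p in range(2, int(n ** 0.5) + 1):
        if is_prime[p]:
            for m in range(p * p, n + 1, p):
                is_prime[m] = False
    return [i for i, flag in enumerate(is_prime) if flag]


primes = primes_up_to(100)


def count_primes_up_to(x):
    """Number of primes <= x (valid for x <= 100 here), via binary search."""
    return bisect_right(primes, x)


print(first_prefix_reaching(10))   # 3
print(next_special_after(2))       # 5
print(count_primes_up_to(10))      # 4
```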

So in many cases, we can apply binary search without having to take O(n log n) time to sort the sequence that needs to be searched.

kaya3