
n log n > n, but this is a pseudo-linear relationship: if n = 1 billion, log n ≈ 30.

So n log n will be about 30 billion, which is 30 × n, i.e. on the order of n. I am wondering whether this difference in time complexity between n log n and n is significant in real life.

E.g.: finding the kth element in an unsorted array is O(n) using the quickselect algorithm.

If I sort the array and then index the kth element, it is O(n log n). To sort an array with 1 trillion elements, quicksort-and-index would be roughly 40 times slower than quickselect (log2 of 1 trillion ≈ 40).
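For concreteness, here is the kind of comparison I have in mind, as a rough Java sketch (the array size, the random seed, and k are arbitrary illustration values, and the quickselect here is a plain textbook version):

```java
import java.util.Arrays;
import java.util.Random;

public class KthElementDemo {
    // Textbook quickselect with a random pivot: average O(n), worst case O(n^2).
    // k is 0-based: k = 0 returns the smallest element.
    static int quickSelect(int[] a, int k) {
        Random rnd = new Random();
        int lo = 0, hi = a.length - 1;
        while (lo < hi) {
            int pivot = a[lo + rnd.nextInt(hi - lo + 1)];
            int i = lo, j = hi;
            while (i <= j) {                 // Hoare-style partition around the pivot
                while (a[i] < pivot) i++;
                while (a[j] > pivot) j--;
                if (i <= j) { int t = a[i]; a[i] = a[j]; a[j] = t; i++; j--; }
            }
            if (k <= j)      hi = j;         // kth element is in the left part
            else if (k >= i) lo = i;         // kth element is in the right part
            else             return a[k];    // k landed exactly on the pivot band
        }
        return a[lo];
    }

    public static void main(String[] args) {
        int n = 10_000_000, k = n / 2;       // median of 10 million ints
        int[] data = new Random(42).ints(n).toArray();

        int[] copy = Arrays.copyOf(data, n);
        long t0 = System.nanoTime();
        int bySelect = quickSelect(copy, k);             // O(n) on average
        long t1 = System.nanoTime();

        copy = Arrays.copyOf(data, n);
        Arrays.sort(copy);                               // O(n log n)
        int bySort = copy[k];
        long t2 = System.nanoTime();

        System.out.printf("quickselect: %d in %.0f ms, sort+index: %d in %.0f ms%n",
                bySelect, (t1 - t0) / 1e6, bySort, (t2 - t1) / 1e6);
    }
}
```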

– mrj, brain storm
  • So what exactly do you want answered? Whether time complexity matters in real life? From some value of `n`, it certainly does. – Simeon Visser Jan 31 '14 at 20:47
  • There is a difference when you have to wait 30 seconds instead of just one second. – Gumbo Jan 31 '14 at 20:47
  • Look at [vectorization](http://en.wikipedia.org/wiki/Vectorization_%28parallel_computing%29): intricate algorithms and expensive hardware, all for a 4x or 8x increase in performance. – anatolyg Jan 31 '14 at 20:56
  • If you only need to find the *kth* element once, then by all means use quickselect. If you're going to be doing it often on that data set, sorting will give the advantage in the future. It may be dozens of times slower the first time, but each time after that it will be O(1). – Geobits Jan 31 '14 at 21:24
  • 30 times faster is a big difference. It's the difference between a program taking 2 seconds to calculate the result you want and taking a minute. I bet if you were sitting in front of the computer waiting for the calculation to complete, you would think that waiting 2 seconds vs. waiting a minute was in fact a "big difference". – Peter Webb Feb 01 '14 at 01:53
  • Nothing new to add, just a silly suggestion: I always thought that in big-O notation log(N) means a logarithm in general, not log10(N) or ln(N), so log(30 billion) could be much, much more than 30... What if the base is 1.0001? Or am I wrong, and does log mean log10 in big O? – Spektre Feb 01 '14 at 07:39
  • Oh, and by the way, when you choose between algorithms like O(N) and O(N log N) for the same task, in my experience the one with the worse complexity usually has a faster per-iteration runtime (the code is usually simpler), so it's always a good idea to actually measure the threshold N and choose the algorithm according to the N you use (you can even do that at runtime). – Spektre Feb 01 '14 at 07:45
  • Look at this plot in Wolfram Alpha and you will see a significant difference: http://www.wolframalpha.com/input/?i=y+%3D+x+log+x+%2C+y+%3D+x+%2Cx%3D0+to+100 – M.kazem Akhgary Oct 20 '15 at 14:55
  • "There is a difference when you have to wait 30 seconds instead of just one second." Then why do we drop constants at all? – Coder Nov 14 '20 at 06:02

5 Answers


The main purpose of Big-O notation is to let you make estimates like the ones you did in your post, and decide for yourself whether spending the effort to code a typically more advanced algorithm is worth the additional CPU cycles you are going to buy with that improvement. Depending on the circumstances, you may get a different answer, even when your data set is relatively small:

  • If you are running on a mobile device and the algorithm represents a significant portion of the execution time, cutting down CPU use translates into longer battery life.
  • If you are running in an all-or-nothing competitive environment, such as a high-frequency trading system, a micro-optimization may make the difference between making and losing money.
  • When your profiling shows that the algorithm in question dominates the execution time in a server environment, switching to a faster algorithm may improve performance for all your clients.

Another thing the Big-O notation hides is the constant multiplication factor. For example, quickselect has a very reasonable multiplier, making the time savings from employing it on extremely large data sets well worth the trouble of implementing it.

Another thing that you need to keep in mind is space complexity. Very often, algorithms with O(N log N) time complexity have O(log N) space complexity. This may present a problem for extremely large data sets, for example when a recursive function runs on a system with limited stack capacity.
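As a concrete illustration of that last point (a sketch, not a tuned implementation): quicksort's recursion depth can be held to O(log N) by always recursing into the smaller partition and iterating over the larger one, which is exactly the kind of detail Big-O notation by itself won't tell you:

```java
// Sketch: keeping quicksort's stack usage at O(log N) by recursing only into
// the smaller partition and looping over the larger one.
static void quickSort(int[] a, int lo, int hi) {
    while (lo < hi) {
        int pivot = a[lo + (hi - lo) / 2];
        int i = lo, j = hi;
        while (i <= j) {                              // partition around the pivot
            while (a[i] < pivot) i++;
            while (a[j] > pivot) j--;
            if (i <= j) { int t = a[i]; a[i] = a[j]; a[j] = t; i++; j--; }
        }
        if (j - lo < hi - i) {                        // left part is smaller:
            quickSort(a, lo, j);                      //   recurse into it...
            lo = i;                                   //   ...and loop over the right part
        } else {                                      // right part is smaller:
            quickSort(a, i, hi);                      //   recurse into it...
            hi = j;                                   //   ...and loop over the left part
        }
    }
}
```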

– Sergey Kalinichenko

It depends.

When I was working at Amazon, there was a method that did a linear search on a list. We could have used a hash table and done the lookup in O(1) instead of O(n).

I suggested the change, and it wasn't approved, because the input was small and it wouldn't really have made a huge difference.

However, if the input were large, it would make a difference.

In another company, where the data/input was huge, using a tree rather than a list made a huge difference. So it depends on the data and the architecture of the application.

It is always good to know your options and how you can optimize.
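As a toy illustration of that trade-off (the data and names here are made up, not code from either company):

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class LookupDemo {
    public static void main(String[] args) {
        List<String> ids = new ArrayList<>(List.of("a42", "b17", "c99"));

        // Linear search: O(n) per lookup. Perfectly fine for a handful of elements,
        // which is why rejecting the change wasn't wrong for a small input.
        boolean foundLinear = ids.contains("b17");

        // Hash-based lookup: O(1) expected per lookup, at the cost of building
        // and storing the hash table. Worth it when n is large or lookups are frequent.
        Set<String> index = new HashSet<>(ids);
        boolean foundHashed = index.contains("b17");

        System.out.println(foundLinear + " " + foundHashed);
    }
}
```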

– DarthVader

There are times when you will work with billions of elements (and more), where that difference will certainly be significant.

There are other times when you will be working with less than a thousand elements, in which case the difference probably won't be all that significant.

If you have a decent idea what your data will look like, you should have a decent idea of which one to pick from the start. But the difference between O(n) and O(n log n) is small enough that it's probably best to start off with whichever one is simplest, benchmark it, and only try to improve it if you see that it's too slow.

However, note that an O(n) algorithm may actually be slower than an O(n log n) one for any given value of n (especially, but not necessarily, for small values of n) because of the constant factors involved, which big-O ignores (it only considers what happens as n tends to infinity). So, if you're looking purely at the time complexity, what you think is an improvement may actually slow things down.
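If you do benchmark, something as crude as the sketch below is enough for a first impression (the array size and run count are arbitrary; for anything serious, use a proper harness such as JMH):

```java
import java.util.Arrays;
import java.util.Random;
import java.util.function.Consumer;

public class CrudeBenchmark {
    // Runs the task several times on copies of the same input and reports the best
    // time, which roughly filters out JIT warm-up and GC noise.
    static long bestTimeMillis(Consumer<int[]> task, int[] input, int runs) {
        long best = Long.MAX_VALUE;
        for (int r = 0; r < runs; r++) {
            int[] copy = input.clone();
            long t0 = System.nanoTime();
            task.accept(copy);
            best = Math.min(best, System.nanoTime() - t0);
        }
        return best / 1_000_000;
    }

    public static void main(String[] args) {
        int[] data = new Random(1).ints(5_000_000).toArray();
        System.out.println("sort: " + bestTimeMillis(Arrays::sort, data, 5) + " ms");
        // Plug in your O(n) candidate the same way and compare on realistic inputs.
    }
}
```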

– Bernhard Barker

DarthVader is correct: it always depends. It's also important to remember that complexities are asymptotic, (usually) worst-case, and that constants are dropped. Each of these is important to consider.

So you could have two algorithms, one of which is O(n) and one of which is O(n log n), and for every value of n up to the number of atoms in the universe and beyond (up to some finite value of n), the O(n log n) algorithm outperforms the O(n) algorithm. That could be because lower-order terms dominate, because in the average case the O(n log n) algorithm is actually O(n), or because the actual numbers of steps are something like 5,000,000n vs. 3n log n.
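To put rough numbers on that last illustration (a back-of-the-envelope sketch using the made-up constants above):

```java
public class Crossover {
    public static void main(String[] args) {
        // 3 * n * log2(n) only exceeds 5,000,000 * n once log2(n) > 5,000,000 / 3.
        double log2n = 5_000_000.0 / 3.0;
        System.out.printf("crossover at n ~ 2^%.0f%n", log2n);

        // For scale: the ~10^80 atoms in the observable universe are only about 2^266.
        System.out.printf("atoms in the universe ~ 2^%.0f%n", Math.log(1e80) / Math.log(2));
    }
}
```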

– gms7777

A PriorityQueue reorders its elements as you add them (each insertion is O(log n)), while Collections.sort() sorts all the elements in a single O(n log n) pass. If your problem is to get at the largest (or smallest) element as soon as possible, use a PriorityQueue; on the other hand, if you need the whole collection sorted before performing further computations, an ArrayList with Collections.sort() is the better choice.
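A small sketch of that trade-off (the values are just illustrative):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

public class QueueVsSortDemo {
    public static void main(String[] args) {
        List<Integer> values = new ArrayList<>(List.of(42, 7, 19, 3, 88));

        // A PriorityQueue maintains a heap: each add/poll is O(log n), and poll()
        // hands you the largest element right away without sorting everything.
        PriorityQueue<Integer> pq = new PriorityQueue<>(Collections.reverseOrder());
        pq.addAll(values);
        System.out.println("largest right away: " + pq.poll());

        // Collections.sort orders the whole list in one O(n log n) pass, which is
        // what you want when later computations need everything in sorted order.
        Collections.sort(values);
        System.out.println("fully sorted: " + values);
    }
}
```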

– Amit_Hora