
I have a simple algorithm task for which I'd like to practice my complexity analysis, and I was hoping to get some confirmation that I'm correct.

The task is: implement a function that takes an array of n numbers and returns the k highest values. The implementation should be time- and space-efficient.

Here is my pseudo-code:

  1. Create a binary min-heap of size k + 1.
  2. For each number in the array:
    • Push the value onto the heap.
    • If the heap size is now larger than k, pop the minimum value.
  3. Pop all values from the heap and return the results array.
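
For concreteness, here's the same thing as a rough Python sketch (the function name `top_k` is just my own illustrative choice, and I'm using `heapq` as the binary min-heap):

    import heapq

    def top_k(numbers, k):
        # Min-heap that never holds more than k + 1 values at once.
        heap = []
        for value in numbers:
            heapq.heappush(heap, value)  # push the value onto the heap
            if len(heap) > k:
                heapq.heappop(heap)      # discard the smallest of the k + 1 values
        # Pop the k survivors; heappop returns them in ascending order.
        return [heapq.heappop(heap) for _ in range(len(heap))]

For example, `top_k([5, 1, 9, 3, 7], 2)` returns `[7, 9]`.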

Here is my time complexity analysis for each step:

  1. Negligible.
  2. O(n)
    • O(log n)
    • Negligible
  3. O(k)

So total complexity should be O(n.log n + k)? Space complexity should be O(k + 1)?

Also, any critique of my method welcomed.

Thanks in advance!

Jonathan Crooke
  • The heap is of size k, so pushing onto it should only be O(log k). – harold Nov 09 '14 at 20:31
  • Ah yes! Great, forgot this in the type up. So total complexity `O(n.log k + k)`? – Jonathan Crooke Nov 09 '14 at 20:32
  • O(n*log(k) + k) is correct, but not as simple as possible. Notice that k <= n, so anything that is O(n*log(k) + k) is also O(n*log(k) + n). And then the n term is dominated by the n*log(k) term, so you can just write O(n*log(k)). – j_random_hacker Nov 09 '14 at 20:35
  • Ah, ok. So I should be considering the worst case whereby `k = n`. Great. Also, I'm not that confident as to when I can just neglect terms... Any simple way to generalise this? – Jonathan Crooke Nov 09 '14 at 20:39
  • O(a+b) = O(a) if b = O(a). Also O(a*b) = O(a^2) if b = O(a). – Niklas B. Nov 09 '14 at 20:42
  • It's not that you need to consider the worst case: if there was no n*log(k) term hanging around to absorb the k term, it would be best to leave it as k. (E.g. if your algorithm was O(k), then it's also O(n), but O(k) is a tighter bound so you're better off describing it as O(k).) It's just that since there *is* such a term in this case, you don't *lose anything* by considering the worst case for k -- and you get a simpler expression. – j_random_hacker Nov 09 '14 at 20:45
  • Ok, well, my understanding up to this point is just that the `+ k` term is relatively insignificant compared to the `n log k` term, and so it can be omitted. The goal of the analysis isn't to be exhaustive, right? Rather to give a rough characteristic of the algorithm to allow comparison with others. ie. my algorithm is `O(n log k)`, and is therefore superior to a `O(n^2)` solution, for example. Looking for some rules of thumb to keep in mind here. – Jonathan Crooke Nov 09 '14 at 21:20
  • Big-O notation doesn't describe performance precisely, but that's something different. Saying "O(n*log(k) + k)" is like saying "O(n + n)" or "O(3n + 57log(n) + 8)" or "I live at 2*50-8 Smith Street" -- it's an unnecessarily complicated way of communicating a piece of information that only makes it harder to make comparisons. – j_random_hacker Nov 09 '14 at 22:31
  • But you're absolutely right that the +k term is relatively insignificant compared to the n*log(k) term in this expression, and that's why it can be dropped. – j_random_hacker Nov 09 '14 at 22:32
  • Ok, great. So considering that `n ~ k` my final answer ought to be `O(n log n)`, since the key part of the algorithm is that we perform a `log n` operation on each item in the array. This now makes a lot more sense since these simplified statements come up more often in algorithm comparison, rather than the "full" statement. – Jonathan Crooke Nov 10 '14 at 08:28
  • That's not quite right. k *can* be much smaller than n, and when it is, O(n*log k) is smaller than O(n*log n), so while both expressions are true of your algorithm, it's better to use the former -- it gives more information. (For the same reason, O(n*log n) gives more information about your algorithm than O(n^2), or O(n^52), even though all three would be true statements.) The point about not writing O(n*log(k) + k) is that doing so is *exactly equivalent to, and more complicated than* writing O(n*log(k)). – j_random_hacker Nov 11 '14 at 00:35
  • Great! Makes perfect sense. Thanks again for all your help, and take some up-votes ;) – Jonathan Crooke Nov 11 '14 at 08:32
  • You're welcome! I'm glad it clicked for you in the end :) – j_random_hacker Nov 11 '14 at 20:27
  • Another question; if I had some algorithm that required two linear searches (for example), this would have a *precise* complexity of `2n`. However, is it correct that since we are more interested in the *order* of complexity, in comparison to other algorithms, rather than accuracy, it would be better to express it as simply `n` complexity? This communicates that it's better than some `n^2` algorithm, but not better than a `log n`...? – Jonathan Crooke Nov 15 '14 at 15:08

0 Answers