I only need to retrieve the 3 smallest elements, and I am wondering whether the code below can be improved to keep the heap smaller. Keeping the heap at 3 elements should be enough, but I cannot find an option in heapq to limit its size.

In other words, I want to maintain a three-element heap that is occasionally updated.

import heapq

def heapsort(iterable):
    # Push every value onto a min-heap, then pop them all off in ascending order.
    h = []
    for value in iterable:
        heapq.heappush(h, value)
    return [heapq.heappop(h) for _ in range(len(h))]

if __name__ == "__main__":
    print(heapsort([1, 3, 5, 7, 9, 2, 4, 6, 8, 0]))
Lin Ma
  • Isn't this the same code as in the [python docs](https://docs.python.org/2/library/heapq.html#basic-examples)? – Dec 05 '15 at 03:48
  • @kiran.koduru, yes. Any input on reducing the heap size is appreciated. :) – Lin Ma Dec 05 '15 at 03:49
  • Why do you need to make the heap yourself? Getting the X smallest elements is what [`heapq.nsmallest`](https://docs.python.org/3/library/heapq.html#heapq.nsmallest) is for. – ShadowRanger Dec 05 '15 at 03:59
  • @ShadowRanger, because I need to add elements ad hoc: they arrive dynamically, so I cannot predict them in advance or supply them all at once. At any given time, though, I may need the 3 smallest elements. Your comments are appreciated. – Lin Ma Dec 05 '15 at 05:16

1 Answer

The way to improve the code to only get the three smallest elements is to replace it with heapq.nsmallest:

print(heapq.nsmallest(3, [1, 3, 5, 7, 9, 2, 4, 6, 8, 0]))

Output:

[0, 1, 2]

You can look at the implementation of nsmallest if you're curious about how you'd build it from the heapq primitive functions, because they did exactly that.
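
For a rough idea of what a bounded selection can look like when built from the public heapq functions, here is a minimal sketch. It is not the CPython implementation. It assumes the values are numbers, so that negating them makes heapq's min-heap behave like a max-heap, and the name n_smallest_sketch is made up for illustration.

```python
import heapq

def n_smallest_sketch(n, iterable):
    """Return the n smallest numbers from iterable in ascending order.

    Hypothetical helper: assumes numeric values, which are stored negated
    so that heap[0] is always minus the largest value currently kept.
    """
    if n <= 0:
        return []
    heap = []
    for value in iterable:
        if len(heap) < n:
            # Still filling up: keep everything until we hold n values.
            heapq.heappush(heap, -value)
        elif value < -heap[0]:
            # value beats the largest kept element, so swap it in.
            heapq.heapreplace(heap, -value)
    # Undo the negation and sort the (at most n) survivors.
    return sorted(-v for v in heap)

print(n_smallest_sketch(3, [1, 3, 5, 7, 9, 2, 4, 6, 8, 0]))
```

The heap in this sketch never grows past n entries, and because it only iterates over its input once, it also accepts a generator instead of a list.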

ShadowRanger
  • Thanks, ShadowRanger. Actually, my question is how to optimize your code above so that the heap only holds 3 elements internally, since the final output only needs 3 elements. I think that in your sample the heap internally still holds as many elements as the list (that is, len(list)) rather than 3. Please feel free to correct me if I am wrong. – Lin Ma Dec 05 '15 at 04:18
  • And ShadowRanger, another thing that confuses me in your sample: how do I add elements dynamically, rather than providing a static list at the beginning? When adding elements dynamically, I would still prefer that the heap only hold 3 elements internally, to save space. – Lin Ma Dec 05 '15 at 04:20
  • @LinMa: Nope. However much input `nsmallest` takes, the internal heap is only three elements long. As for "how do I populate a list", that's something so simple and fundamental to Python that if you can't do it, you have no business trying to implement variations on heap sort or the like. You need to run through the [Python tutorial](https://docs.python.org/2/tutorial/), now; if you don't know how to build a list, or replace a list literal with a dynamically constructed list, you are going to have huge holes in your understanding. – ShadowRanger Dec 05 '15 at 04:29
  • I'll note, if you want to maintain a three element heap that is occasionally updated, `nsmallest` isn't meant to efficiently do that on its own, but that's why I linked you to the implementation of `nsmallest` in the answer; it shows you exactly how they do it all at once, and if you can read Python code, you can adapt that to piecemeal updates (hint: once you've got N elements, take a look at `heapq`'s `_heapify_max` and `_heappushpop_max` to maintain it; sadly not part of the public API, so using them isn't strictly supported). – ShadowRanger Dec 05 '15 at 04:34
  • Thanks, ShadowRanger, you have identified my exact pain point: "want to maintain a three element heap that is occasionally updated". I read the implementation you referenced and have one more question: with the current public interface of heapq, what do you think is the most efficient way to keep only three elements? Or do you think we should choose another data structure? – Lin Ma Dec 05 '15 at 05:15
  • Thanks, ShadowRanger. Regarding your comment "However much input nsmallest takes, the internal heap is only three elements long": is there any way for nsmallest to keep receiving input (in a streaming fashion) while internally keeping only 3 elements? – Lin Ma Dec 05 '15 at 05:19
  • @LinMa: You wouldn't be able to use the public interface for this. The public `heapq` interface is a minheap, and the implementation of `nsmallest` requires a maxheap. I'd just use the private interfaces personally, or copy out their implementation so you're not dependent on them directly. – ShadowRanger Dec 05 '15 at 15:10
  • Thanks, ShadowRanger. Is there any other data structure in Python besides heapq (Python 2.x preferred) that you think I could use to implement n-smallest more elegantly? Have a good weekend. – Lin Ma Dec 06 '15 at 00:24
  • Thanks for all the help, ShadowRanger. I have marked your reply as the answer. :) – Lin Ma Dec 21 '15 at 00:06
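
On the question of another data structure for piecemeal updates: one possibility, sketched here under assumptions rather than taken from the thread, is a short sorted list maintained with the standard bisect module. The class name SmallestTracker and its methods are hypothetical. For a tiny bound such as 3, inserting into a sorted list is cheap, and it needs neither a max-heap nor heapq's private helpers.

```python
import bisect

class SmallestTracker(object):
    """Keep only the n smallest values seen so far (hypothetical helper)."""

    def __init__(self, n):
        self.n = n
        self._items = []  # sorted ascending, never longer than n

    def add(self, value):
        if self.n <= 0:
            return
        if len(self._items) < self.n:
            # Not full yet: insert at the position that keeps the list sorted.
            bisect.insort(self._items, value)
        elif value < self._items[-1]:
            # value beats the current largest: insert it, then drop the largest.
            bisect.insort(self._items, value)
            self._items.pop()

    def smallest(self):
        # Return a copy so callers cannot disturb the internal list.
        return list(self._items)

tracker = SmallestTracker(3)
for value in [1, 3, 5, 7, 9]:
    tracker.add(value)
print(tracker.smallest())   # [1, 3, 5]

tracker.add(2)
tracker.add(0)
print(tracker.smallest())   # [0, 1, 2]
```

Values can be added at any time, and smallest() can be called between additions. The list briefly holds n + 1 items during an insert, then shrinks back to n.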