Like the title says, I would like to know whether Python's heapq.heapify() runs faster on a list that is already close to a heap, or whether it does the entire operation element by element on every list.

I'm debating on how often to use heapify().

  • You can see the source [here](https://github.com/python/cpython/blob/39a54ba63850e081a4a5551a773df5b4d5b1d3cd/Lib/heapq.py#L170). It's O(n) as the docs say, but [`_siftup`](https://github.com/python/cpython/blob/39a54ba63850e081a4a5551a773df5b4d5b1d3cd/Lib/heapq.py#L260) "bubbles", so it should be faster the less bubbling it needs to do. – Carcigenicate Dec 25 '22 at 19:26
  • You should only need to call `heapify` once: after that only use the `heappush` and `heappop` methods (and the like) so that your heap will *stay* a heap. If you need to "debate on how often to use heapify" you're probably doing it wrong. – trincot Dec 25 '22 at 20:40
  • @trincot I don't think that's correct advice. For example, if you have a heap of size *n* and you have to add *n* more elements to the heap, it is asymptotically more efficient to push them all to the list and call heapify once, than it is to call heappush *n* times. Any use-case where you have to do batch insertions will have a trade-off like this, and it will take some analysis or experimentation to figure out when a batch insertion is better to do with one heapify vs. some number of heappushes. – kaya3 Dec 28 '22 at 22:48

1 Answer

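The obvious answer is yes. If you supply a list sorted in ascending order to heapify, it is already a valid min-heap, so heapify leaves every element where it was. If you supply a reverse-sorted list, every parent is larger than its children, so it needs close to the maximum amount of rearranging. Either way the work stays O(n); only the constant factor changes.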
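If you want to check how much the input order matters on your own interpreter, one option is to count the comparisons heapify makes. This is only a rough sketch: `Counted` and `count_heapify_comparisons` are hypothetical helper names, not part of `heapq`, and the counts depend on how your version of `heapq` implements its sifting, so treat them as something to measure rather than assume.

```python
import heapq
import random


class Counted:
    """Hypothetical wrapper that counts every `<` comparison made on it."""

    comparisons = 0

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        # heapq only ever compares items with `<`, so this is the only hook needed.
        Counted.comparisons += 1
        return self.value < other.value


def count_heapify_comparisons(values):
    """Return how many comparisons heapq.heapify makes on a wrapped copy of `values`."""
    items = [Counted(v) for v in values]
    Counted.comparisons = 0
    heapq.heapify(items)
    return Counted.comparisons


n = 10_000
ascending = list(range(n))
descending = ascending[::-1]
shuffled = random.sample(ascending, n)

for name, data in [("ascending", ascending),
                   ("descending", descending),
                   ("shuffled", shuffled)]:
    print(f"{name:>10}: {count_heapify_comparisons(data)} comparisons")
```

Comparison counts are only a proxy for running time, so it is worth timing the real workload as well before drawing conclusions.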

That said, there is no benefit to pre-sorting the list before passing it to heapify. Sorting alone is O(n log n), so the total time (analyzing and arranging the list, plus the heapify call) will exceed what heapify needs on even the worst-case arrangement, which is O(n).

Also, you shouldn't have to call heapify more than once. You call heapify once to construct the heap, then call the heappush and heappop functions to add and remove items; both keep the heap invariant intact.
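A minimal sketch of that pattern, using a made-up list of integers (the name `tasks` is just for illustration):

```python
import heapq

tasks = [5, 1, 8, 3, 2]
heapq.heapify(tasks)              # one O(n) pass turns the list into a min-heap

heapq.heappush(tasks, 0)          # O(log n); the heap invariant is preserved
smallest = heapq.heappop(tasks)   # O(log n); removes and returns the minimum
next_smallest = tasks[0]          # peek at the new minimum without removing it

print(smallest, next_smallest)
```

After the initial heapify, the list never needs to be rebuilt as long as every change goes through the `heapq` functions.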

I suppose, if you have to add a large number of items to an existing heap, you could append them to an existing heap and then call heapify to re-build the heap. Hard to say the exact circumstances under which that would be useful. I'd certainly give any such code a big ol' WTF if I were to see it in a code review.
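For what it's worth, the two ways of adding a batch could be sketched like this. The function names `add_with_heappush` and `add_with_heapify` are hypothetical, and which one is faster depends on the heap size and the batch size, so it is something to time on real data rather than take from this snippet.

```python
import heapq


def add_with_heappush(heap, new_items):
    """Insert items one at a time; each push is O(log n)."""
    for item in new_items:
        heapq.heappush(heap, item)


def add_with_heapify(heap, new_items):
    """Append the whole batch, then rebuild the heap in one O(n + k) pass."""
    heap.extend(new_items)
    heapq.heapify(heap)


# Both approaches leave a valid heap holding the same items.
existing = [1, 5, 9, 12, 20]
heapq.heapify(existing)
batch = [4, 0, 7]

heap_a = list(existing)
heap_b = list(existing)
add_with_heappush(heap_a, batch)
add_with_heapify(heap_b, batch)

drained_a = [heapq.heappop(heap_a) for _ in range(len(heap_a))]
drained_b = [heapq.heappop(heap_b) for _ in range(len(heap_b))]
print(drained_a == drained_b)
```

The internal list layout of the two heaps may differ, but the order in which items pop out is the same.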

Jim Mischel
  • Using heapify for batch insertions will have an asymptotic advantage when the batch size is on the order of `n / log n`. That could be like inserting 50,000 elements into a priority queue of size 1,000,000 - it's not that implausible for some specific applications. – kaya3 Dec 28 '22 at 23:00
  • @kaya3 Not implausible, but definitely a special case. And something I'd be very careful about advocating. – Jim Mischel Dec 29 '22 at 01:52