
I'm looking to improve the speed at which I can remove (pop) the smallest item from a list or array while also adding items on the fly. The maximum number of items is fixed, so I could use preallocated numpy arrays, but so far I've seen the best performance with heapq. Below is my implementation in Cython; the dummy code runs in about 0.7 seconds.

Is anything better possible within Python? I've briefly looked at sorted lists (https://pypi.org/project/sortedcontainers/) but saw no performance improvement. Will I see a noticeable improvement by switching to pure C? In my full code I only need to use the heappush and heappop operations.

from _heapq import *
cdef int i
cdef list openHeap = []

for i in range(320000*8):
    heappush(openHeap, (i, 22))

EDIT: To clarify, in the full code the values being pushed to the heap are not in any sorted or predefined order (hence using the heap to efficiently find the minimum value).
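
For anyone wanting to reproduce the unsorted case described in the edit, here is a minimal sketch in plain Python rather than Cython. The function name `bench`, the seed, and the sizes are assumptions for illustration only, and the random scores stand in for the real path scores.

```python
import heapq
import random
import time

def bench(n, seed=0):
    """Push n (score, index) tuples in random order, then pop them all."""
    rng = random.Random(seed)
    scores = [rng.random() for _ in range(n)]  # stand-in for real path scores
    heap = []
    start = time.perf_counter()
    for index, score in enumerate(scores):
        heapq.heappush(heap, (score, index))   # sifting happens here, unlike sorted input
    popped = [heapq.heappop(heap) for _ in range(n)]
    elapsed = time.perf_counter() - start
    return popped, elapsed

popped, elapsed = bench(10_000)
print(f"{elapsed:.4f} s for 10,000 pushes and pops")
```

Timings from this sketch are not comparable to the 0.7 seconds quoted above, since it uses a smaller input, adds pops, and runs without Cython.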

DivideByZero
  • How long does it take if instead of `(i, 22)` you only push `i`? – superb rain Aug 04 '20 at 09:54
  • @superbrain It takes about half the time, but I need to hold on to both (I'm using this in an A* path finding algorithm, in the full code "i" is the path score that I'm sorting by and "22" is the node location) – DivideByZero Aug 05 '20 at 13:56
  • Ok. I was thinking maybe the tuple-building is very expensive, so that pushing something like `i * 100 + 22` could help. But if even pushing `i` is only factor 2 faster, then meh. Btw, is it intentional that you push the values in sorted order, so that there's no sifting at all? – superb rain Aug 05 '20 at 14:01
  • You're right, I updated it to just Python. Unfortunately, the sorting placement of the value being pushed is not known, in the full code I'm pushing (node.f, node.index) where node.f is a certain cost value and node.index lets me keep track of which node is at the top of the heap when I pop the heap. I've written it as above in case someone wants to replicate the code as a minimal working example. – DivideByZero Aug 05 '20 at 16:51
  • Try using [heapify](https://docs.python.org/3/library/heapq.html#heapq.heapify). Since your data are already generated, no need to call `heappush` for each item. – Tyler Gannon Jul 04 '22 at 04:53
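
The two suggestions in the comments above (pushing a single integer instead of a tuple, and using `heapify` when the items are known up front) might look roughly like the sketch below. It assumes the score is a non-negative integer and that the node index is below a known bound; `MAX_NODES`, `pack`, and `unpack` are hypothetical names, not part of heapq. If `node.f` is a float, this packing does not apply as written.

```python
import heapq

MAX_NODES = 1 << 22  # assumed upper bound on the node index (hypothetical)

def pack(score, index):
    """Combine an integer score and a node index into one sortable int."""
    return score * MAX_NODES + index

def unpack(key):
    """Recover (score, index) from a packed key."""
    return divmod(key, MAX_NODES)

# Incremental pushes, as in the question, but with ints instead of tuples.
heap = []
for score, index in [(7, 22), (3, 5), (7, 1), (0, 40)]:
    heapq.heappush(heap, pack(score, index))

order = [unpack(heapq.heappop(heap)) for _ in range(4)]
# order == [(0, 40), (3, 5), (7, 1), (7, 22)], same as tuple ordering

# When a batch of items already exists, heapify builds the heap in O(n)
# instead of calling heappush once per item.
batch = [pack(s, i) for s, i in [(9, 2), (4, 8), (6, 0)]]
heapq.heapify(batch)
smallest = unpack(batch[0])  # (4, 8)
```

Whether this beats tuples in the full A* code would need measuring; the question only reports that pushing a bare `i` took about half the time.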

0 Answers