
Consider this example from the official heapq documentation:

>>> from heapq import heappush, heappop
>>> heap = []
>>> data = [(1, 'J'), (4, 'N'), (3, 'H'), (2, 'O')]
>>> for item in data:
...     heappush(heap, item)
...
>>> while heap:
...     print(heappop(heap)[1])
J
O
H
N

I want to go further and implement an efficient selective_push such that:

  1. selective_push((1, 'M')) is equivalent to heappush, since 'M' is not in the heap
  2. selective_push((3.5, 'N')) is equivalent to heap[3] = (3.5, 'N'); heapify(heap), since 3.5 < 4 (after the pushes above, (4, 'N') sits at heap[3])
  3. selective_push((4.5, 'N')) does nothing, since 4.5 > 4

The following implementation shows the goal, but it is slow:

from heapq import heapify, heappush

def selective_push(heap, s):
    not_found = True
    for i in range(len(heap)):  # linear search, O(n)
        if heap[i][1] == s[1]:
            if s[0] < heap[i][0]:
                heap[i] = s  # replacement
                heapify(heap)  # restore the heap invariant, O(n)
            not_found = False
            break
    if not_found:
        heappush(heap, s)

I think it is slow because of the linear search, which ruins the O(log n) complexity of heappush. Replacements are rare, but the linear search runs on every call.

sshashank124
user3015347

1 Answer


The heapq docs have an example of how to change the priority of existing items. (The example also uses a count to ensure that items with the same priority are returned in the same order that they were added: since you haven't mentioned that as a requirement, I've simplified the code by removing that part.) I've also added the logic you mention relating to when existing items are replaced.

Essentially it boils down to three things: maintaining a dictionary (entry_finder) for quick look-up of items, marking items as deleted instead of removing them from the heap straight away, and skipping over the marked items when popping from the heap.

from heapq import heappush, heappop

pq = []                         # list of entries arranged in a heap
entry_finder = {}               # mapping of tasks to entries
REMOVED = '<removed-task>'      # placeholder for a removed task

def add_task(task, priority=0):
    'Add a new task or update the priority of an existing task'
    if task in entry_finder:
        old_priority, _ = entry_finder[task]
        if priority < old_priority:
            # new priority is lower, so replace
            remove_task(task)
        else:
            # new priority is same or higher, so ignore
            return
    entry = [priority, task]
    entry_finder[task] = entry
    heappush(pq, entry)

def remove_task(task):
    'Mark an existing task as REMOVED.  Raise KeyError if not found.'
    entry = entry_finder.pop(task)
    entry[-1] = REMOVED

def pop_task():
    'Remove and return the lowest priority task. Raise KeyError if empty.'
    while pq:
        priority, task = heappop(pq)
        if task is not REMOVED:
            del entry_finder[task]
            return task
    raise KeyError('pop from an empty priority queue')
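
Applied to the data in the question, the same idea might look like the sketch below. It is only an illustration under a few assumptions: the names selective_push (borrowed from the question), pop_item and _seq are chosen here, and the insertion counter from the docs example is put back so that two entries with the same priority never end up comparing their items (or the REMOVED marker).

```python
from heapq import heappush, heappop
from itertools import count

heap = []                # entries are [priority, seq, item] lists
entry_finder = {}        # item -> its live entry in the heap
REMOVED = object()       # sentinel marking a superseded entry
_seq = count()           # unique tie-breaker, so items are never compared

def selective_push(priority, item):
    """Push item, or lower its priority if it is already queued."""
    old = entry_finder.get(item)
    if old is not None:
        if priority >= old[0]:
            return               # not an improvement: ignore
        old[-1] = REMOVED        # lazy deletion of the old entry
    entry = [priority, next(_seq), item]
    entry_finder[item] = entry
    heappush(heap, entry)

def pop_item():
    """Pop the live entry with the smallest priority as (priority, item)."""
    while heap:
        priority, _, item = heappop(heap)
        if item is not REMOVED:
            del entry_finder[item]
            return priority, item
    raise KeyError('pop from an empty priority queue')

for priority, item in [(1, 'J'), (4, 'N'), (3, 'H'), (2, 'O')]:
    selective_push(priority, item)

selective_push(1, 'M')      # 'M' is new: plain push
selective_push(3.5, 'N')    # 3.5 < 4: supersedes the old 'N' entry
selective_push(4.5, 'N')    # 4.5 > 3.5: ignored

result = []
while entry_finder:         # stop once no live entries remain
    result.append(pop_item())
print(result)
# [(1, 'J'), (1, 'M'), (2, 'O'), (3, 'H'), (3.5, 'N')]
```

Each call does one dictionary look-up plus at most one heappush, so the linear search and the heapify from the question are both gone. The superseded (4, 'N') entry stays in the heap, marked as removed, until it reaches the top.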

Some notes:

  • heappush is efficient (O(log n)) because it can assume that the list being pushed to is already ordered as a heap; heapify makes no such assumption and has to process every element (O(n)) each time it is called

  • not really removing items, just marking them as removed, is quick, but it means that if you reset lots of priorities, the stale entries waste storage until they are popped; whether this is acceptable will depend on your use case

  • you'll need to create similar wrappers for any other heapq functions you want to use, since you always need to make sure that the entry_finder look-up dictionary is kept in sync with the data in the heap
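
If the wasted storage from the second note becomes a concern, one option is to rebuild the heap occasionally from the live entries only. This is a minimal sketch, assuming [priority, task] entries and a REMOVED marker as in the code above; the function name compact and the decision of when to call it are assumptions made here, not part of heapq.

```python
from heapq import heapify, heappush

REMOVED = '<removed-task>'      # same placeholder as in the answer's code

def compact(pq):
    """Drop entries marked REMOVED and re-establish the heap in place, O(n)."""
    pq[:] = [entry for entry in pq if entry[-1] is not REMOVED]
    heapify(pq)

# Small demonstration with one stale entry.
pq = []
for entry in ([1, 'J'], [4, REMOVED], [3, 'H'], [2, 'O'], [3.5, 'N']):
    heappush(pq, entry)

compact(pq)
print(len(pq))      # 4
print(pq[0])        # [1, 'J']
```

Because compact is itself O(n), it only pays off when called rarely; one possible trigger is when the heap has grown to more than twice the number of live tasks in entry_finder.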

Matthew Strawbridge