I have implemented a priority queue using Python's heapq module. Now I want to delete a particular element (by value) from the heap while maintaining the heap invariant. I know it can be done by removing the element and calling heapify() again, but that is O(n) and might be very slow since I have a very large heap.

The other thing I am considering: if I knew the element's index, I could replace it with the last element and call _siftup(). But since I don't know the index, I'd have to search for it, which is again linear time.

Can I keep a parallel dict that maps each element to its location in the heap and use that? How can I keep such a dict up to date with every insert into the queue?

EDIT:

Actually, I need the above to implement decreaseKey() in O(log n) time. If there's a better way to do that directly, that would be equally good.

Naman

1 Answer

You may have read this already, but you could use the approach that the heapq docs suggest, which is to just mark the element as removed without actually removing it from the heap:

The remaining challenges revolve around finding a pending task and making changes to its priority or removing it entirely. Finding a task can be done with a dictionary pointing to an entry in the queue.

Removing the entry or changing its priority is more difficult because it would break the heap structure invariants. So, a possible solution is to mark the existing entry as removed and add a new entry with the revised priority:

import itertools
from heapq import heappush, heappop

pq = []                         # list of entries arranged in a heap
entry_finder = {}               # mapping of tasks to entries
REMOVED = '<removed-task>'      # placeholder for a removed task
counter = itertools.count()     # unique sequence count

def add_task(task, priority=0):
    'Add a new task or update the priority of an existing task'
    if task in entry_finder:
        remove_task(task)
    count = next(counter)
    entry = [priority, count, task]
    entry_finder[task] = entry
    heappush(pq, entry)

def remove_task(task):
    'Mark an existing task as REMOVED.  Raise KeyError if not found.'
    entry = entry_finder.pop(task)
    entry[-1] = REMOVED

def pop_task():
    'Remove and return the lowest priority task. Raise KeyError if empty.'
    while pq:
        priority, count, task = heappop(pq)
        if task is not REMOVED:
            del entry_finder[task]
            return task
    raise KeyError('pop from an empty priority queue')

This way the removal itself is just an O(1) lookup in a dict. The removed item is simply skipped when it is popped from the queue later (at a cost of an extra O(log n) on the pop_task operation). The drawback, of course, is that if a client is not actually popping items from the queue, the heap keeps growing even though items are being "removed" according to the API.
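
As a rough illustration of how the recipe above can serve as a decrease-key operation, here is a minimal sketch that wraps it in a class. The names `LazyPQ` and `_compact_if_bloated` are hypothetical and not part of heapq or its docs. The compaction step (rebuilding the heap once stale entries outnumber live ones) is an assumed policy added here to bound the growth, not something the docs prescribe.

```python
import itertools
from heapq import heapify, heappop, heappush

REMOVED = object()  # sentinel marking a lazily deleted entry


class LazyPQ:
    """Hypothetical wrapper around the lazy-deletion recipe; not part of heapq."""

    def __init__(self):
        self.pq = []                      # heap of [priority, count, task] entries
        self.entry_finder = {}            # task -> its live entry
        self.counter = itertools.count()  # tie-breaker for equal priorities

    def __len__(self):
        # Only live tasks are kept in the dict, so this is O(1).
        return len(self.entry_finder)

    def add_task(self, task, priority=0):
        """Insert a task, or change the priority of an existing one."""
        if task in self.entry_finder:
            self.remove_task(task)
        entry = [priority, next(self.counter), task]
        self.entry_finder[task] = entry
        heappush(self.pq, entry)
        self._compact_if_bloated()

    def remove_task(self, task):
        """Mark a task as removed. Raises KeyError if it is not queued."""
        entry = self.entry_finder.pop(task)
        entry[-1] = REMOVED

    def pop_task(self):
        """Remove and return the live task with the smallest priority."""
        while self.pq:
            priority, count, task = heappop(self.pq)
            if task is not REMOVED:
                del self.entry_finder[task]
                return task
        raise KeyError('pop from an empty priority queue')

    def _compact_if_bloated(self):
        # Assumed policy: rebuild once stale entries outnumber live ones.
        if len(self.pq) > 2 * len(self.entry_finder):
            self.pq = [e for e in self.pq if e[-1] is not REMOVED]
            heapify(self.pq)


q = LazyPQ()
q.add_task('a', 5)
q.add_task('b', 3)
q.add_task('a', 1)    # decrease-key: 'a' now has priority 1
print(len(q))         # 2
print(q.pop_task())   # a
print(q.pop_task())   # b
```

Because each rebuild is O(n) and only happens after at least as many stale entries as live ones have accumulated, the heap stays within a constant factor of the number of live tasks.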

dano
  • Removal might be `O(1)` when you call it, but it requires `O(log n)` time in total. – Veedrac Sep 13 '14 at 23:55
  • @dano Yes, I had actually seen this approach, but I wondered whether there is a better solution. Also, with this approach the isEmpty() function will no longer be O(1), right? Is there any workaround for that? – Naman Sep 13 '14 at 23:59
  • @Veedrac Yes, thanks. I've made that more clear in my answer. – dano Sep 13 '14 at 23:59
  • @Naman I wouldn't worry about `isEmpty` being more expensive because the only costs you pay are costs you would have paid if you didn't do the removal lazily. – Veedrac Sep 14 '14 at 00:01
  • @Naman The `entry_finder` dict should reflect the true length of the queue, since items are popped from it when `remove_task` is called. ([`len(dict_object)` is `O(1)`](http://stackoverflow.com/questions/1115313/cost-of-len-function)) – dano Sep 14 '14 at 00:01
  • @dano Actually I have one more question. I want to use this in a decreaseKey() operation that I am trying to write in O(log n). The problem with soft deletion is that it bloats the heap by one entry every time I call decreaseKey. Is there any other solution for this? – Naman Sep 14 '14 at 17:36
  • @Naman I mentioned this drawback in my original answer, too. The "mark as removed" approach sacrifices memory usage to make removal/priority changes `O(log n)`. If memory usage is a concern, and you're frequently changing priority of keys in the queue, without popping items as well, you may want to consider another approach. – dano Sep 14 '14 at 18:37
  • @dano Actually my end goal is pretty simple and standard. I am implementing Prim's algorithm for MST on a graph that has around 2 million edges. The only way I can implement Prim's in E log(V) is if I can perform E decreaseKey operations in log(V) time each. Is there any other solution for this? – Naman Sep 14 '14 at 19:07
  • @Naman You can still do `decreaseKey` in `log(V)` time using this implementation, you just end up using `E` extra memory to do it. There's a description of a Python Prim's algorithm implementation [here](http://interactivepython.org/courselib/static/pythonds/Graphs/PrimsSpanningTreeAlgorithm.html?highlight=decreasekey) that claims to have a `Elog(V)` `decreaseKey` implementation, but when I look at [the code](https://github.com/bnmnetp/pythonds/blob/master/graphs/priorityQueue.py#L73), the implementation looks like it does a `O(N)` lookup of the item to change the priority of. – dano Sep 14 '14 at 19:36
  • @dano It does look like it is O(n), since it is doing a linear search. So what would you recommend in this situation? Should I implement my own binary heap in Python and keep track of each element's location as I insert? I thought this was a pretty standard problem. I am a bit shocked that this cannot be done with Python's heapq. – Naman Sep 14 '14 at 19:48
  • @Naman I think that's probably your best option. It looks like you could implement this with fairly minimal changes to `heapq` - of course the problem there is you're stuck with a pure Python implementation, rather than the faster, C-implementation provided by the `_heapq` module. The changes are probably pretty similar for `_heapq`, though, if you're comfortable writing C. – dano Sep 14 '14 at 19:55
  • @dano I think I am fairly comfortable with C, but I'm not sure whether I can do that here, since it's part of an assignment and I am mostly not supposed to make changes to Python's own modules. I'll probably build an all-Python heap and use it. Hopefully when I am done, I'll put it somewhere for everyone to use. Thanks for the help. – Naman Sep 14 '14 at 20:04
  • @dano: In `pop_task()` method the `if` condition will be always true because you are changing `REMOVED` in dict not in the heap. I think you are not deleting anything. – Isan Sahoo Mar 28 '20 at 08:32
  • @IsanSahoo All the code in my answer comes [directly from the Python documentation](https://docs.python.org/2/library/heapq.html#priority-queue-implementation-notes). – dano Mar 29 '20 at 03:43
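
The alternative discussed in the comments, a pure-Python binary heap that tracks each element's position in a parallel dict, might look roughly like the sketch below. `IndexedMinHeap` and all of its method names are hypothetical; this is an illustration under the assumption that items are hashable and unique, not code from heapq or from anyone in this thread.

```python
class IndexedMinHeap:
    """Hypothetical binary min-heap with a dict tracking item positions."""

    def __init__(self):
        self.heap = []  # list of (key, item) pairs in heap order
        self.pos = {}   # item -> current index in self.heap

    def __len__(self):
        return len(self.heap)

    def _swap(self, i, j):
        # Swap two slots and keep the position dict in sync.
        h = self.heap
        h[i], h[j] = h[j], h[i]
        self.pos[h[i][1]] = i
        self.pos[h[j][1]] = j

    def _sift_up(self, i):
        # Move the entry at index i toward the root while it beats its parent.
        while i > 0:
            parent = (i - 1) // 2
            if self.heap[i][0] >= self.heap[parent][0]:
                break
            self._swap(i, parent)
            i = parent

    def _sift_down(self, i):
        # Move the entry at index i toward the leaves while a child is smaller.
        n = len(self.heap)
        while True:
            smallest = i
            for child in (2 * i + 1, 2 * i + 2):
                if child < n and self.heap[child][0] < self.heap[smallest][0]:
                    smallest = child
            if smallest == i:
                break
            self._swap(i, smallest)
            i = smallest

    def push(self, item, key):
        if item in self.pos:
            raise ValueError('item already in heap')
        self.heap.append((key, item))
        self.pos[item] = len(self.heap) - 1
        self._sift_up(len(self.heap) - 1)

    def decrease_key(self, item, new_key):
        i = self.pos[item]  # O(1) lookup instead of a linear search
        if new_key > self.heap[i][0]:
            raise ValueError('new key is larger than the current key')
        self.heap[i] = (new_key, item)
        self._sift_up(i)

    def remove(self, item):
        i = self.pos.pop(item)
        last = self.heap.pop()
        if i < len(self.heap):  # the removed slot was not the last one
            self.heap[i] = last
            self.pos[last[1]] = i
            self._sift_up(i)
            self._sift_down(self.pos[last[1]])

    def pop(self):
        key, item = self.heap[0]
        self.remove(item)
        return item, key


h = IndexedMinHeap()
for vertex, cost in [('a', 7), ('b', 4), ('c', 9)]:
    h.push(vertex, cost)
h.decrease_key('c', 1)
h.remove('b')
print(h.pop())  # ('c', 1)
print(h.pop())  # ('a', 7)
```

Every operation touches at most one root-to-leaf path, so `push`, `decrease_key`, `remove`, and `pop` are all O(log n) with no stale entries left in the heap. The trade-off is that it runs as pure Python rather than using the C implementation behind heapq.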