
This is a toy example of a waitlist (`PriorityQueue`) in which the surgeries should be ordered lexicographically by the pair (p, date). Here p is an integer and date is a datetime object.

Integers in Python clearly have an order, and so do datetime objects. I had thought that the question "Is there a Lexicographic PriorityQueue in Python's standard library?" had taught me that I only need to implement `__lt__` for my `Surgery` object.
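That assumption can be checked on its own. Python compares tuples element by element and only looks at the second element when the first ones are equal, so an `__lt__` that compares `(p, date)` tuples gives a lexicographic order. The sketch below uses `datetime.datetime` as a stand-in for the pandas `Timestamp` values in the example, and sorts with `sorted()`, which only needs `__lt__`:

```python
from datetime import datetime

class Surgery:
    """Same shape as the class below: priority is a (p, date) tuple."""

    def __init__(self, a, b):
        self.priority = (a, b)

    def __lt__(self, other):
        # Tuple comparison is lexicographic: p first, the date only on a tie.
        return self.priority < other.priority

surgeries = [Surgery(2, datetime(2022, 1, 2)),
             Surgery(1, datetime(2022, 1, 10)),
             Surgery(1, datetime(2022, 1, 3))]

# sorted() compares items with < only, so __lt__ is enough here.
ordered = [s.priority for s in sorted(surgeries)]
print(ordered)
```

Both p = 1 surgeries come before the p = 2 one, and the tie on p is broken by the date, so the comparison itself is not the problem.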

But the following minimal working example shows that the order of the surgeries on the waitlist is wrong.

```python
from queue import PriorityQueue as PQ
import numpy as np
import pandas as pd

np.random.seed(123)

waitlist = PQ()

class Surgery:

    def __init__(self, a, b):

        self.priority = (a,b)

    def __lt__(self, other):

        return self.priority < other.priority

    def __repr__(self):
        return f'Surgery({self.priority})'

# Some fake data
x = np.random.randint(1,3,size=10)
y = pd.date_range('2022-01-01', '2022-01-10')

# Instantiate objects and put them on queue
for i,j in zip(x,y):
    waitlist.put(Surgery(i,j))

for s in waitlist.queue:
    print(s)
```

Which outputs:

```
Surgery((1, Timestamp('2022-01-01 00:00:00', freq='D')))
Surgery((1, Timestamp('2022-01-04 00:00:00', freq='D')))
Surgery((1, Timestamp('2022-01-03 00:00:00', freq='D')))
Surgery((2, Timestamp('2022-01-02 00:00:00', freq='D')))
Surgery((1, Timestamp('2022-01-05 00:00:00', freq='D')))
Surgery((1, Timestamp('2022-01-06 00:00:00', freq='D')))
Surgery((1, Timestamp('2022-01-07 00:00:00', freq='D')))
Surgery((2, Timestamp('2022-01-08 00:00:00', freq='D')))
Surgery((2, Timestamp('2022-01-09 00:00:00', freq='D')))
Surgery((1, Timestamp('2022-01-10 00:00:00', freq='D')))
```

Printing the surgeries in the order of `waitlist.queue` shows that they are not sorted by p, and I don't understand why. One guess is that the `PriorityQueue` is not actually maintaining the order; another is that the order of elements in `waitlist.queue` is not the true order represented by the underlying heap.

What is going on with the apparent queue order (and how do I fix it)?

Galen

1 Answer


`PriorityQueue` keeps its items in a binary heap, and `waitlist.queue` is simply the list that backs that heap. Iterating over that list visits the nodes in breadth-first order (level by level), as represented by the numbering in the theory section of the heapq documentation and on this GeeksforGeeks page on priority queues using binary heaps. The heap invariant only guarantees that each parent is no larger than its children, so the smallest item sits at index 0 but the rest of the list need not be sorted. Each `get()` removes the smallest item and shifts the remaining nodes so that the next-smallest one is at the top. As such, the reliable way to get the values out in order is to extract them from the `PriorityQueue`.
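The order printed in the question can be reproduced with `heapq` directly. This sketch assumes that `PriorityQueue.put()` pushes onto its `.queue` list with `heapq.heappush` (which is what CPython does), and replaces each `Timestamp` with its day of January 2022, which compares the same way for this data. The (p, day) pairs are read off the question's output:

```python
import heapq

# (p, day-of-January-2022) pairs in the order they were put on the waitlist.
# Assumption: plain day numbers stand in for the pandas Timestamps.
inserted = [(1, 1), (2, 2), (1, 3), (1, 4), (1, 5),
            (1, 6), (1, 7), (2, 8), (2, 9), (1, 10)]

heap = []
for item in inserted:
    heapq.heappush(heap, item)  # what PriorityQueue.put() does in CPython

# Same order as the question printed from waitlist.queue.
print(heap)

# It is a valid heap: every parent is <= each of its children...
assert all(heap[k] <= heap[c]
           for k in range(len(heap))
           for c in (2 * k + 1, 2 * k + 2) if c < len(heap))

# ...but it is not sorted; only heap[0] is guaranteed to be the minimum.
print(heap == sorted(heap))  # False

# Walking the list level by level shows the breadth-first layout of the tree.
level, start = 0, 0
while start < len(heap):
    width = 2 ** level
    print(f"level {level}: {heap[start:start + width]}")
    start += width
    level += 1

# Each pop restores the invariant, so repeated pops come out in sorted order.
popped = [heapq.heappop(heap) for _ in range(len(heap))]
print(popped)
```

Both of the asker's guesses are answered here: the queue does maintain a valid heap, and the list order is the heap's layout, not the priority order.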

```python
from queue import PriorityQueue as PQ
import numpy as np
import pandas as pd

np.random.seed(123)

waitlist = PQ()

class Surgery:

    def __init__(self, a, b):

        self.priority = (a,b)

    def __lt__(self, other):
        return self.priority < other.priority

    def __repr__(self):
        return f'Surgery({self.priority})'

# Some fake data
x = np.random.randint(1,3,size=10)
y = pd.date_range('2022-01-01', '2022-01-10')

# Instantiate objects and put them on queue
for i,j in zip(x,y):
    waitlist.put(Surgery(i,j))

while not waitlist.empty():  # not `waitlist.not_empty`: that is a Condition object and always truthy
    print(waitlist.get())    # get() removes the lowest-valued entry first
```

The only difference from the code you posted is the last two lines, which retrieve the items with `get()` instead of iterating over `waitlist.queue`. The generated output is below.

```
Surgery((1, Timestamp('2022-01-01 00:00:00', freq='D')))
Surgery((1, Timestamp('2022-01-03 00:00:00', freq='D')))
Surgery((1, Timestamp('2022-01-04 00:00:00', freq='D')))
Surgery((1, Timestamp('2022-01-05 00:00:00', freq='D')))
Surgery((1, Timestamp('2022-01-06 00:00:00', freq='D')))
Surgery((1, Timestamp('2022-01-07 00:00:00', freq='D')))
Surgery((1, Timestamp('2022-01-10 00:00:00', freq='D')))
Surgery((2, Timestamp('2022-01-02 00:00:00', freq='D')))
Surgery((2, Timestamp('2022-01-08 00:00:00', freq='D')))
Surgery((2, Timestamp('2022-01-09 00:00:00', freq='D')))
```
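If you only want to look at the waitlist in priority order without emptying it, sorting a copy of the backing list should also work, because `sorted()` only needs `__lt__`. This is a sketch under a few assumptions: single-threaded use, reliance on the undocumented `.queue` attribute (as in the question), and hard-coded `datetime` values (taken from the output above) in place of the NumPy/pandas data. It also drains the queue with `get_nowait()`, which raises `queue.Empty` instead of blocking once nothing is left:

```python
import queue
from datetime import datetime

class Surgery:
    def __init__(self, a, b):
        self.priority = (a, b)

    def __lt__(self, other):
        return self.priority < other.priority

    def __repr__(self):
        return f'Surgery({self.priority})'

waitlist = queue.PriorityQueue()
for p, day in [(1, 1), (2, 2), (1, 3), (1, 4), (1, 5),
               (1, 6), (1, 7), (2, 8), (2, 9), (1, 10)]:
    waitlist.put(Surgery(p, datetime(2022, 1, day)))

# Non-destructive view: sorted() builds a new list and leaves the heap untouched.
preview = sorted(waitlist.queue)

# Destructive drain in priority order; get_nowait() never blocks.
drained = []
while True:
    try:
        drained.append(waitlist.get_nowait())
    except queue.Empty:
        break

print(drained)
```

Both approaches give the same order; the sorted copy costs O(n log n) but keeps the queue intact, while draining consumes it.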
Shorn
  • The [docs also say](https://docs.python.org/3/library/queue.html) `With a priority queue, the entries are kept sorted (using the heapq module) and the lowest valued entry is retrieved first.` Why does using `sorted(list(entries))[0]` imply they are not using a heap? – Galen Feb 10 '23 at 04:55
  • You're right, I missed that part of the docs. The issue is more likely iterating through the heap directly, which is what the original code does. If you look at the [heapq doc about the heap theory](https://docs.python.org/3/library/heapq.html), iterating over the heap's list goes through it breadth-first, while retrieving from the heap doesn't follow that pattern. In the [graphical representation in this geeks-for-geeks page on priority queues](https://www.geeksforgeeks.org/priority-queue-using-binary-heap/), you can see that the nodes don't look sorted past the first few values. I'll edit my answer as well. – Shorn Feb 10 '23 at 05:20