
I came across a weird issue with multiprocessing's Queue.empty() in Python. The following code prints True and 20, even though empty() is called right after the queue has been filled with elements.

from multiprocessing import Queue
import random
import time

q = Queue()
for _ in range(20):
    q.put(random.randint(0, 2))
# time.sleep(0.01)
print(q.empty())
print(q.qsize())

If I uncomment the sleep, the output is correct: False, 20. How is this possible? This code should run sequentially, which means that by the time q.empty() is evaluated, the queue is already filled.

balu

2 Answers


You can't rely on the result from a call to multiprocessing.Queue.empty().

The documentation for .empty() states:

Return True if the queue is empty, False otherwise. Because of multithreading/multiprocessing semantics, this is not reliable.

The documentation also states that a separate thread handles queuing objects, causing the observed behavior:

When an object is put on a queue, the object is pickled and a background thread later flushes the pickled data to an underlying pipe. This has some consequences which are a little surprising, but should not cause any practical difficulties – if they really bother you then you can instead use a queue created with a manager.

After putting an object on an empty queue there may be an infinitesimal delay before the queue’s empty() method returns False and get_nowait() can return without raising queue.Empty.
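A minimal sketch of how a consumer can tolerate that delay: wait with a short timeout instead of checking empty() first. The helper name get_or_none and the timeout values are assumptions made for illustration; only Queue.get(timeout=...) and queue.Empty come from the standard library.

```python
import multiprocessing as mp
import queue


def get_or_none(q, timeout=1.0):
    """Return the next item, or None if nothing arrives within `timeout` seconds.

    Hypothetical helper: it waits for the feeder thread to flush the item
    instead of trusting q.empty().
    """
    try:
        return q.get(timeout=timeout)
    except queue.Empty:
        return None


if __name__ == "__main__":
    q = mp.Queue()
    q.put(42)
    # empty() may still report True at this point, but get() waits for the flush.
    print(get_or_none(q))
    # Nothing else was put, so this returns None after the timeout expires.
    print(get_or_none(q, timeout=0.05))
```

Returning None is ambiguous if None is itself a valid item; in that case a dedicated marker object would be a safer choice.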

You have a single process, so use Queue from the queue module instead. Its put() adds the item to the queue directly, without relying on a background thread:

from queue import Queue
import random

q = Queue()
for _ in range(20):
    q.put(random.randint(0, 2))
print(q.empty())
print(q.qsize())

If you must use multiple processes, restructure your code to rely on .empty() as little as possible, because its results are unreliable. For example, instead of using .empty() to check whether there are elements on the queue, simply call .get(), which blocks until an element is available.
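A minimal sketch of that blocking pattern, assuming a sentinel value (None here) marks the end of the data. SENTINEL, drain and demo are hypothetical names, and everything runs in one process to keep the example short; in real use the producer would usually live in another process.

```python
import multiprocessing as mp

SENTINEL = None  # hypothetical end-of-stream marker, not part of multiprocessing


def drain(q):
    """Collect items with blocking get() calls until the sentinel arrives."""
    items = []
    while True:
        item = q.get()  # blocks until the feeder thread has flushed an item
        if item is SENTINEL:
            return items
        items.append(item)


def demo():
    q = mp.Queue()
    for i in range(20):
        q.put(i)
    q.put(SENTINEL)  # tell the consumer that no more items will follow
    return drain(q)


if __name__ == "__main__":
    print(len(demo()))
```

Because the consumer never asks whether the queue is empty, the delay between put() and the data reaching the pipe no longer matters.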

BrokenBenchmark
  • I have to use multiprocessing, so I'll go for the .get() + exception handling for queue.Empty. Thanks. – balu Apr 03 '22 at 08:31
  • Note that you don't need exception handling here: `q.get()` _blocks_ until something appears on the queue. – Tim Peters Apr 03 '22 at 16:47

The output isn't deterministic, with or without the sleep(). The part you see runs sequentially, but, under the covers, q.put(thing) hands thing off to a multiprocessing worker thread to do the actual work of mutating the queue. .put() returns at once then, regardless of whether the worker thread has managed to put thing on the queue yet.

This can burn you "for real"! For example, consider this program:

import multiprocessing as mp
import time

q = mp.Queue()
nums = list(range(20))
q.put(nums)
# time.sleep(2)
del nums[-15:]
print(q.get())

Chances are that it will display:

[0, 1, 2, 3, 4]

This is so even if some other process retrieves from q. q.put(nums) hands off the task of pickling nums, and putting its serialized form on the queue, and there's a race between that and the main program mutating nums.

If you uncomment the sleep(2), then chances are high that it will display the original 20-element nums instead.
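One way to sidestep this race, sketched under the assumption that a shallow copy is enough (a flat list of ints), is to put a snapshot that the main program never touches again. The function name put_snapshot is made up for illustration.

```python
import multiprocessing as mp


def put_snapshot():
    q = mp.Queue()
    nums = list(range(20))
    # Hand the queue its own copy; the feeder thread pickles this copy,
    # so later changes to `nums` cannot leak into the queued data.
    q.put(list(nums))
    del nums[-15:]  # mutates only the original list
    return q.get()


if __name__ == "__main__":
    print(put_snapshot())
```

This returns all 20 elements regardless of timing. For nested mutable objects a shallow copy would not be enough, and copy.deepcopy() would be the safer choice.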

Tim Peters