
Suppose there are multiple threads, with one function that adds a value to a list and another function that takes that value. What would the difference be between the following two versions?

import queue
scraped = queue.Queue()  # named differently from scrape() so the function does not shadow it
def scrape():
    scraped.put('example')
def send():
    example = scraped.get()
    print(example)

scraped = set()  # named differently from scrape() so the function does not shadow it
def scrape():
    scraped.add('example')
def send():
    example = scraped.pop()
    print(example)

Why do people use the queue module for this situation, given that it is 170-180 lines long and full of if conditions that slow the process down, when they could use a set, which also gives them the advantage of filtering duplicates?

Craig Burgler
charles M

1 Answer


Queues maintain the ordering of possibly non-unique elements. Sets, on the other hand, do not maintain ordering and cannot contain duplicates.

In your case you may need to keep a record of each thing scraped and/or the relative order in which it was scraped. In that case, use queues. If you just want a list of the unique things you scraped, and you don't care about the relative order in which you scraped them, use sets.
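As a small illustration of that difference, here is a single-threaded sketch. The file names are made up for the example; the same three items go into a queue and into a set.

```python
import queue

scraped_queue = queue.Queue()
scraped_set = set()

# "a.html" is deliberately put in twice.
for url in ["a.html", "b.html", "a.html"]:
    scraped_queue.put(url)
    scraped_set.add(url)

# Drain the queue. Checking empty() is fine here only because no other
# thread is touching the queue.
from_queue = []
while not scraped_queue.empty():
    from_queue.append(scraped_queue.get())

print(from_queue)           # ['a.html', 'b.html', 'a.html']
print(sorted(scraped_set))  # ['a.html', 'b.html']
```

The queue returns every item in the order it was put, duplicate included. The set keeps one copy of each item and promises nothing about order, which is why it is sorted before printing.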

As @mata points out, a queue should be used if multiple threads are producing to it and consuming from it. Queues implement the blocking functionality needed to work with producer/consumer threads. Queues are thread-safe; sets are not.
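A minimal sketch of what "blocking" means in practice. The thread layout and the short sleep are only there for the demonstration: `Queue.get` waits for an item, while `set.pop` on an empty set raises `KeyError` immediately.

```python
import queue
import threading
import time

q = queue.Queue()
received = []

def consumer():
    # Blocks here until a producer puts something into the queue.
    received.append(q.get())

t = threading.Thread(target=consumer)
t.start()
time.sleep(0.1)   # give the consumer time to start waiting inside q.get()
q.put("example")  # wakes the consumer up
t.join()
print(received)   # ['example']

# A set has no way to wait: popping from an empty set fails at once.
try:
    set().pop()
    set_result = "popped"
except KeyError:
    set_result = "KeyError"
print(set_result)  # KeyError
```

With a set, the consumer has to poll and handle the exception itself. With a queue, it simply sleeps inside `get` until there is work.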

In this example from the docs:

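from queue import Queue
from threading import Thread

# do_work, source and num_worker_threads are placeholders that the docs example leaves undefined.

def worker():
    while True:
        item = q.get()     # blocks until an item is available
        do_work(item)
        q.task_done()      # tell the queue this item has been processed

q = Queue()
for i in range(num_worker_threads):
    t = Thread(target=worker)
    t.daemon = True        # daemon threads do not keep the program alive
    t.start()

for item in source():
    q.put(item)

q.join()   # block until all tasks are done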

Here, `get` in the consumer thread (i.e. `worker`) blocks until there is something in the queue to get. `join` in the producer thread blocks until every item that was put into the queue has been consumed, and `task_done` in the consumer thread tells the queue that the item it got has been consumed.
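For a version that runs as-is, here is a sketch of the same pattern with the placeholders filled in. The doubling "work", the four workers, and the `results` list with its lock are assumptions made for the example, not part of the docs code.

```python
import queue
import threading

q = queue.Queue()
results = []
results_lock = threading.Lock()  # guards the shared results list

def worker():
    while True:
        item = q.get()                # blocks until an item is available
        with results_lock:
            results.append(item * 2)  # stand-in for real work
        q.task_done()                 # one put() has now been fully handled

# Daemon workers are abandoned when the main thread exits.
for _ in range(4):
    threading.Thread(target=worker, daemon=True).start()

for item in range(10):
    q.put(item)

q.join()  # returns once task_done() has been called for every item put
print(sorted(results))
```

Because each worker appends its result before calling `task_done`, all ten results are present by the time `join` returns.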

Craig Burgler
  • That's only one reason; more compelling are the synchronization features of a queue: blocking access, the possibility to limit its size, etc. – mata Aug 31 '16 at 11:46
  • What do you mean by blocking access? Checking whether the queue is empty or not? I simply use try/except with continue inside a while True loop. – charles M Aug 31 '16 at 12:29
  • `get`, by default, blocks until something has been put in the queue and is available to be gotten. Much more Pythonic than raising an exception to indicate that the `queue` is empty. – Craig Burgler Aug 31 '16 at 12:35
  • If you look inside the definition of `get`, there is a try statement in it, and also a lot of if and else statements. – charles M Aug 31 '16 at 12:36
  • By "definition" of `get` are you referring to its source code? If so, let Python worry about the `trys` and the conditions etc. and just use the readable, tested, robust, Pythonic `get` method – Craig Burgler Aug 31 '16 at 12:39
  • I was debugging it with my program, and since yesterday it has been producing an unusually large number of duplicates (using 20-30 threads); see http://stackoverflow.com/questions/39223923/need-consumer-and-producer-with-duplicate-filter-python. I have now switched to not using the queue, and I cannot tell you how much my program's performance and resource usage have improved. – charles M Aug 31 '16 at 12:41
  • OK I read your other thread and I see what you are trying to do. Check out the accepted answer here: http://stackoverflow.com/questions/1581895/how-check-if-a-task-is-already-in-python-queue?noredirect=1&lq=1 which subclasses `Queue` to ensure that an item is not added to the `queue` more than once. – Craig Burgler Aug 31 '16 at 12:53
  • Well, thank you and Stack Overflow for pointing me in the right direction every time. – charles M Aug 31 '16 at 13:21
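The idea mentioned in the comments, subclassing `Queue` so that an item is not added more than once, might be sketched roughly as follows. This is not the linked answer's code. `UniqueQueue` is a hypothetical name, and the sketch relies on the `_init` and `_put` hook methods that CPython's `queue.Queue` uses internally but does not document as public API.

```python
import queue

class UniqueQueue(queue.Queue):
    """FIFO queue that silently drops items it has already seen (hypothetical helper)."""

    def _init(self, maxsize):
        # Called by Queue.__init__ to create the underlying storage.
        super()._init(maxsize)
        self._seen = set()

    def _put(self, item):
        # Called by put() while the queue's lock is held, so the
        # membership check and the insert happen together.
        if item not in self._seen:
            self._seen.add(item)
            super()._put(item)

q = UniqueQueue()
for url in ["a.html", "b.html", "a.html"]:
    q.put(url)

print(q.qsize())  # 2
print(q.get())    # a.html
print(q.get())    # b.html
```

This keeps the ordering and blocking `get` of a queue while filtering duplicates like a set. One caveat, as far as CPython's implementation goes: `put` still counts a dropped duplicate as an unfinished task, so this sketch would need more work before being combined with `join()` and `task_done()`.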