I am trying to create a queue that holds URL strings for a simple web crawler. I don't want duplicate items added to the queue, so I wrote a helper function that checks whether a URL is already in it. I was trying to use this code (modified from a similar Stack Overflow question), where pages is the queue:

def is_in_url_list(self, url):
    return url in self.pages.queue

However, I cannot get this to work. Even when I pass a URL that should return True, it returns False. Is there a better way to go about this? Thanks!

1 Answer

For your purposes, there's no point in using a Queue. Instead, you can use a normal list with list.pop(0), or you can use collections.deque. In case you're curious, the Queue class is meant for multithreaded code: it provides a thread-safe data structure that different threads can use to communicate (if this sounds like gibberish to you, just ignore it).

Since the problem you describe is a common one, the data structures chapter of the Python tutorial shows how to use lists as both stacks and queues.

Now, deque is a double-ended queue, hence the d. It can pop from both ends, so it behaves as a stack when you need one and as a queue when you need that instead.
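As a rough illustration of popping from both ends, here is a minimal sketch using collections.deque. The variable names and the sample URLs are made up for the example.

```python
from collections import deque

# Sample URLs are placeholders for illustration only.
pages = deque(["http://example.com/a", "http://example.com/b"])

pages.append("http://example.com/c")   # add to the right end

first = pages.popleft()   # queue behaviour: remove from the left (oldest item)
last = pages.pop()        # stack behaviour: remove from the right (newest item)

# Membership tests work directly on the deque.
still_queued = "http://example.com/b" in pages
```

Using popleft() gives first-in, first-out order, which is usually what a crawler wants; pop() gives last-in, first-out.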

Now you might be wondering: if deque and list can achieve the same results, why use one over the other? This too is explained in the documentation for deque:

Though list objects support similar operations, they are optimized for fast fixed-length operations and incur O(n) memory movement costs for pop(0) and insert(0, v) operations which change both the size and position of the underlying data representation.

Now the choice is yours.
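Applied to the helper from the question, a deque-based version might look like the sketch below. The class name Crawler and the methods add_url and next_url are hypothetical. The companion set is an addition that is not part of the answer above; it is assumed here so that a URL is never queued twice, even after it has been popped, and so the duplicate check does not have to scan the whole deque.

```python
from collections import deque


class Crawler:  # hypothetical class name
    def __init__(self):
        self.pages = deque()  # URLs waiting to be crawled, in FIFO order
        self.seen = set()     # assumption: every URL that was ever queued

    def is_in_url_list(self, url):
        # Direct membership test on the deque; this scans the items (O(n)).
        return url in self.pages

    def add_url(self, url):
        # Skip URLs that were queued before; the set lookup is O(1) on average.
        if url in self.seen:
            return False
        self.seen.add(url)
        self.pages.append(url)
        return True

    def next_url(self):
        # Take the oldest URL from the left end of the deque.
        return self.pages.popleft()
```

If you only care about URLs that are currently waiting, is_in_url_list on its own is enough and the set can be dropped.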

Games Brainiac
This was very helpful, thank you! I am fairly new to Python and I didn't realize that a deque was the correct choice. It works perfectly now. – Robert Alexander Feb 19 '14 at 20:06