
What would be the best library for multithreaded harvesting/downloading with support for multiple proxies? I've looked at Tkinter, and it looks good, but there are so many options. Does anyone have a specific recommendation? Many thanks!

Cookies

3 Answers


Twisted

anthony

Is this something you can't just do by passing a URL to each newly spawned thread and calling urllib2.urlopen in it, or is there a more specific requirement?

Kylotan
  • urllib2 isn't thread-safe from what I've seen, but I could have just been doing it wrong because I'm a noob to threading. I'm downloading a lot of files, so I'd rather use something a bit more powerful than plain urllib anyway. – Cookies Oct 20 '09 at 20:40
  • It's almost certainly thread-safe unless you do something inherently dangerous, like trying to access the same object from multiple threads. – Kylotan Oct 20 '09 at 22:10
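A minimal sketch of the thread-per-download idea from the answer above, assuming Python 3, where urllib2's functionality now lives in `urllib.request`. Each job builds its own opener, so no urllib object is shared between threads (the danger Kylotan mentions), and proxies are handed out round-robin. A `concurrent.futures` thread pool stands in for manually spawned threads; the helper names `make_opener`, `fetch` and `download_all` and the round-robin policy are illustrative assumptions, not part of any library API.

```python
import itertools
import urllib.request
from concurrent.futures import ThreadPoolExecutor


# Hypothetical helper names, for illustration only (not a library API).
def make_opener(proxy=None):
    """Build a fresh opener so no urllib object is shared between threads."""
    if proxy is None:
        # Default opener: environment proxy settings (http_proxy etc.) still apply.
        return urllib.request.build_opener()
    # Route both http and https traffic through the given proxy URL.
    handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    return urllib.request.build_opener(handler)


def fetch(url, proxy=None, timeout=30):
    """Download one URL, optionally through a proxy; returns (url, body bytes)."""
    with make_opener(proxy).open(url, timeout=timeout) as response:
        return url, response.read()


def download_all(urls, proxies=(), max_workers=8):
    """Fetch every URL on a thread pool, assigning proxies round-robin."""
    if proxies:
        proxy_cycle = itertools.cycle(proxies)
    else:
        proxy_cycle = itertools.repeat(None)  # no explicit proxy: default opener
    # Pick each URL's proxy up front, in the calling thread,
    # so the shared iterator is never touched from worker threads.
    jobs = [(url, next(proxy_cycle)) for url in urls]
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(fetch, url, proxy) for url, proxy in jobs]
        for future in futures:
            url, body = future.result()  # re-raises any download error
            results[url] = body
    return results
```

For example, `download_all(urls, proxies=["http://proxy1.example:8080", "http://proxy2.example:8080"])` would send the first URL through the first proxy, the second through the second, the third through the first again, and so on (the proxy addresses are placeholders). Because `future.result()` re-raises the first failure, a real harvester would probably catch errors per URL and retry through a different proxy.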

Also take a look at http://scrapy.org/, a scraping framework built on top of Twisted.

twneale
  • Excellent. I don't see anything about proxy support, but I think I could add that myself. – Cookies Oct 20 '09 at 21:36
  • No. Support for HTTP proxies is not currently implemented in Scrapy, but it will be in the future. For more information about this, follow this ticket. Setting the http_proxy environment variable won’t work because Twisted (the library used by Scrapy to download pages) doesn’t support it. See this Twisted ticket for more info. – Cookies Oct 20 '09 at 21:39
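For contrast with the Twisted limitation quoted above: Python 3's standard-library `urllib.request` does read `http_proxy` and related environment variables when a `ProxyHandler` is created without arguments, which is also what the default opener behind `urlopen` does. The small sketch below checks this. `env_proxy_for` is a hypothetical helper, and `unittest.mock.patch.dict` is used only so the example leaves the real environment untouched.

```python
import os
import urllib.request
from unittest import mock


# Hypothetical helper, for illustration only.
def env_proxy_for(scheme, env):
    """Return the proxy urllib's default ProxyHandler picks up for `scheme`
    while the variables in `env` are set (os.environ is restored afterwards)."""
    with mock.patch.dict(os.environ, env):
        # ProxyHandler() with no argument reads proxies from the environment.
        handler = urllib.request.ProxyHandler()
    return handler.proxies.get(scheme)
```

With a placeholder address, `env_proxy_for("http", {"http_proxy": "http://proxy.example:3128"})` gives back `"http://proxy.example:3128"`, so a plain urllib-based downloader can be pointed at a single proxy without code changes. Rotating several proxies still needs explicit handlers per request.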