What would be the best library for multithreaded harvesting/downloading with support for multiple proxies? I've looked at Tkinter and it looks good, but there are so many options. Does anyone have a specific recommendation? Many thanks!
Viewed 1,037 times
3 Answers
0
Is this something you can't just do by passing a URL to newly spawned threads and calling urllib2.urlopen in each one, or is there a more specific requirement?

Kylotan
- urllib2 isn't thread-safe from what I've seen, but I could have just been doing it wrong because I'm a noob to threading. I'm downloading a lot of files, so I'd rather use something a bit more powerful than plain urllib anyway. – Cookies Oct 20 '09 at 20:40
- It's almost certain to be thread-safe unless you do something inherently dangerous like trying to access the same object from multiple threads. – Kylotan Oct 20 '09 at 22:10
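For readers who want to try the thread-per-download approach discussed above, here is a minimal sketch under stated assumptions. It uses Python 3's urllib.request (the successor to urllib2) and a thread pool from concurrent.futures. The names fetch_via_proxy and download_all, the round-robin proxy assignment, and the proxy addresses in the usage note are illustrative assumptions, not something taken from this thread. Each call builds its own opener, so no opener object is shared between threads.

```python
import itertools
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch_via_proxy(url, proxy, timeout=10):
    """Download one URL through one HTTP proxy (directly if proxy is None)."""
    # Build a fresh opener per call so no opener is shared between threads.
    # An empty dict tells ProxyHandler to use no proxy at all.
    proxies = {"http": proxy, "https": proxy} if proxy else {}
    opener = urllib.request.build_opener(urllib.request.ProxyHandler(proxies))
    with opener.open(url, timeout=timeout) as response:
        return response.read()


def download_all(urls, proxies, max_workers=8, fetch=fetch_via_proxy):
    """Fetch every URL in a thread pool, assigning proxies round-robin.

    Returns a dict mapping each URL to its body (bytes), or to the
    exception that was raised while fetching it.
    """
    proxy_cycle = itertools.cycle(proxies or [None])
    # Pair each URL with a proxy up front, in the main thread.
    jobs = [(url, next(proxy_cycle)) for url in urls]
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {url: pool.submit(fetch, url, proxy) for url, proxy in jobs}
        for url, future in futures.items():
            try:
                results[url] = future.result()
            except Exception as exc:  # one failed download should not stop the rest
                results[url] = exc
    return results
```

A call might look like download_all(list_of_urls, ["http://10.0.0.1:8080", "http://10.0.0.2:8080"]), where both proxy addresses are placeholders. The fetch parameter exists only so the download step can be swapped out, for example with a stub when testing without network access.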
0
Also take a look at http://scrapy.org/, which is a scraping framework built on top of twisted.

twneale
- Excellent. I don't see anything about proxy support, but I think I could do that myself. – Cookies Oct 20 '09 at 21:36
- No. Support for HTTP proxies is not currently implemented in Scrapy, but it will be in the future. For more information about this, follow this ticket. Setting the http_proxy environment variable won't work because Twisted (the library used by Scrapy to download pages) doesn't support it. See this Twisted ticket for more info. – Cookies Oct 20 '09 at 21:39