Multithreaded Downloading Through Proxies In Python

Question

What would be the best library for multithreaded harvesting/downloading with support for multiple proxies? I've looked at Tkinter and it looks good, but there are so many libraries to choose from. Does anyone have a specific recommendation? Many thanks!

I want many files downloading simultaneously, with new files added as threads become free. – Cookies Oct 20 '09 at 21:09
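The behaviour described in this comment (a fixed number of simultaneous downloads, with the next file starting as soon as a thread is free) is roughly what a thread pool gives you. Below is a minimal sketch using `concurrent.futures` from the Python 3 standard library, which did not exist when this thread was written. The names `download_many`, `download`, and `max_workers` are illustrative assumptions, and `download` is any callable you supply that takes a URL and returns its content.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def download_many(urls, download, max_workers=8):
    """Run download(url) for every URL, at most max_workers at a time.

    `download` is a caller-supplied function (hypothetical here); a free
    worker thread picks up the next pending URL automatically.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to the URL it was submitted for.
        futures = {pool.submit(download, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:
                # Record the failure so one bad URL does not stop the rest.
                results[url] = exc
    return results
```

More work can be queued at any time by calling `pool.submit` again while the pool is open; this sketch submits everything up front only to stay short.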

score 1 · Answer 1 · answered Oct 20 '09 at 20:28

Twisted

anthony

score 0 · Answer 2 · answered Oct 20 '09 at 20:36

Is this something you can't just do by passing a URL to newly spawned threads and calling urllib2.urlopen in each one, or is there a more specific requirement?
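As a rough illustration of that suggestion, here is a sketch that starts one thread per URL and routes each request through a proxy. It assumes Python 3, where `urllib2` became `urllib.request`; the function names `fetch_via_proxy` and `download_all`, the round-robin proxy assignment, and the 30-second timeout are all illustrative choices, not anything specified in this thread.

```python
import itertools
import threading
import urllib.request


def fetch_via_proxy(url, proxy, timeout=30):
    """Download one URL through one proxy and return the body as bytes."""
    # An empty mapping tells ProxyHandler to use no proxy at all.
    proxies = {"http": proxy, "https": proxy} if proxy else {}
    opener = urllib.request.build_opener(urllib.request.ProxyHandler(proxies))
    with opener.open(url, timeout=timeout) as response:
        return response.read()


def download_all(urls, proxies, fetch=fetch_via_proxy):
    """Start one thread per URL, handing out proxies round-robin."""
    results = {}
    lock = threading.Lock()

    def worker(url, proxy):
        try:
            outcome = fetch(url, proxy)
        except Exception as exc:
            outcome = exc  # keep the error so the caller can inspect it
        with lock:
            results[url] = outcome

    threads = [
        threading.Thread(target=worker, args=(url, proxy))
        for url, proxy in zip(urls, itertools.cycle(proxies or [None]))
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

Each call builds its own opener, so no urllib object is shared between threads. One thread per URL is fine for a handful of files; for a large batch you would probably want to cap the number of threads.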

Kylotan

urllib2 isn't thread-safe from what I've seen, but I could have just been doing it wrong because I'm a noob to threading. I am downloading a lot of files, so I'd rather use something a bit more powerful than just urllib anyway. – Cookies Oct 20 '09 at 20:40
It's almost certain to be thread-safe unless you do something inherently dangerous like trying to access the same object from multiple threads. – Kylotan Oct 20 '09 at 22:10
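One way to follow that advice and avoid touching the same object from several threads is to give every thread its own opener via `threading.local`. This is only a sketch, written for Python 3's `urllib.request` (the successor to `urllib2`); `get_opener` and the per-thread cache are hypothetical names introduced for illustration.

```python
import threading
import urllib.request

# Each thread sees its own independent attributes on this object.
_local = threading.local()


def get_opener(proxy=None):
    """Return the calling thread's own opener for the given proxy."""
    openers = getattr(_local, "openers", None)
    if openers is None:
        openers = _local.openers = {}
    if proxy not in openers:
        # An empty mapping tells ProxyHandler to use no proxy at all.
        proxies = {"http": proxy, "https": proxy} if proxy else {}
        openers[proxy] = urllib.request.build_opener(
            urllib.request.ProxyHandler(proxies)
        )
    return openers[proxy]
```

Within one thread, repeated calls reuse the same opener; a different thread gets a separate one, so no opener is ever shared.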

score 0 · Answer 3 · answered Oct 20 '09 at 21:24

Also take a look at http://scrapy.org/, which is a scraping framework built on top of Twisted.

twneale

Excellent, I don't see anything about proxy support but I think I could do that myself. – Cookies Oct 20 '09 at 21:36
No. Support for HTTP proxies is not currently implemented in Scrapy, but it will be in the future. For more information about this, follow this ticket. Setting the http_proxy environment variable won’t work because Twisted (the library used by Scrapy to download pages) doesn’t support it. See this Twisted ticket for more info. – Cookies Oct 20 '09 at 21:39
