I was wondering if anyone has tried using [gevent][1] and [socksipy][2] for concurrent downloads.
I've used gevent for downloading ~12k pictures from yfrog, instagram, twitpic, etc. The cumulative size of the pictures was around 1.5 GB, and it took ~20 minutes to download them all on my home wifi.
To do so, I implemented an image_download function whose sole purpose was to download a picture from a given URL, and then asynchronously mapped a list of URLs onto the image_download function using a ``gevent.pool.Pool``.
    from gevent import monkey
    monkey.patch_socket()  # make socket operations cooperative; see http://www.gevent.org/gevent.monkey.html
    from gevent.pool import Pool
    import urllib2

    NB_WORKERS = 50

    def image_download(url):
        """Download a single image and return its content."""
        return urllib2.urlopen(url).read()

    def parallel_image_download(urls):  # urls is a list
        """Activate NB_WORKERS greenlets to asynchronously download the images."""
        pool = Pool(NB_WORKERS)
        return pool.map(image_download, urls)
NB: I settled on 50 parallel workers after a couple of tries. Past 50, the total runtime did not improve.
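The right pool size depends on your bandwidth and the remote hosts, so it is worth measuring rather than guessing. Here is a minimal sketch of such a measurement; the ``timed_run`` helper and the simulated 10 ms latency are made up for illustration, and a real benchmark would call ``parallel_image_download`` on actual URLs instead:

```python
import time

import gevent
from gevent.pool import Pool

def timed_run(nb_workers, nb_urls=200):
    """Time a pool of nb_workers greenlets over nb_urls simulated downloads."""
    def download(url):
        gevent.sleep(0.01)  # stand-in for a real HTTP fetch (~10 ms of latency)
    pool = Pool(nb_workers)
    start = time.time()
    pool.map(download, range(nb_urls))
    return time.time() - start

# Larger pools cut the total runtime until something else becomes the bottleneck
print(timed_run(5), timed_run(50))
```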

Balthazar Rouberol
- That is an interesting example. Thanks for sharing that. Can I ask why you use pool.map rather than gevent.spawn? Is there a difference between them? – JohnJ Nov 22 '12 at 20:01
- Have a look at http://sdiehl.github.com/gevent-tutorial/. My feeling is that ``Pool.map()`` handles the result gathering for you, and I needed to get a list of all results. It might be that you can do the same using ``spawn()``. I just know that it worked well with ``map()``. – Balthazar Rouberol Nov 22 '12 at 20:06
- Yes, I have seen that tutorial. Well, to collect the results I was using gevent.joinall. Thanks for your insights. – JohnJ Nov 22 '12 at 20:09
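For what it's worth, the two approaches from the comments above can be made equivalent: ``Pool.map()`` gathers the results into an ordered list for you, while with ``gevent.spawn()`` you call ``gevent.joinall()`` and then read each greenlet's ``.value`` yourself. A minimal sketch, where ``fetch`` is a made-up stand-in for ``image_download`` so no network access is needed:

```python
import gevent
from gevent.pool import Pool

def fetch(url):
    # hypothetical stand-in for image_download: no network, just transform the input
    return url.upper()

urls = ["a", "b", "c"]

# Option 1: Pool.map gathers the results for you, in input order
mapped = Pool(2).map(fetch, urls)

# Option 2: spawn the greenlets, join them all, then read each .value
jobs = [gevent.spawn(fetch, u) for u in urls]
gevent.joinall(jobs)
spawned = [job.value for job in jobs]

assert mapped == spawned == ["A", "B", "C"]
```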