
I was wondering if anyone has tried using [gevent][1] and [socksipy][2] for concurrent downloads.

JohnJ

1 Answer


I've used gevent to download ~12k pictures from yfrog, Instagram, Twitpic, etc. The total size of the pictures was around 1.5 GB, and it took ~20 minutes to download them all over my home wifi.

To do so, I implemented an image_download function whose sole purpose was to download a picture from a given URL, and then asynchronously mapped a list of URLs onto image_download using a gevent Pool.

from gevent import monkey
monkey.patch_socket()  # See http://www.gevent.org/gevent.monkey.html
from gevent.pool import Pool  # Pool lives in gevent.pool, not on the top-level gevent module

NB_WORKERS = 50

def image_download(url):
    """Retrieve a single image from the given URL (body sketched below)."""

def parallel_image_download(urls):  # urls is of type list
    """Activate NB_WORKERS greenlets to asynchronously download the images."""
    pool = Pool(NB_WORKERS)
    return pool.map(image_download, urls)

NB: I settled on 50 parallel workers after a couple of tries. Past 50, the total runtime did not improve.
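
The answer leaves the body of image_download out; a minimal sketch of what it could look like, assuming Python 2-era urllib2 (which becomes cooperative once monkey.patch_socket() has run) and a purely illustrative save-to-basename naming scheme:

import os
import urllib2  # becomes cooperative after monkey.patch_socket()

def image_download(url):
    """Illustrative sketch: fetch one image and save it under its URL basename."""
    response = urllib2.urlopen(url)
    filename = os.path.basename(url)  # hypothetical naming scheme
    with open(filename, 'wb') as f:
        f.write(response.read())
    return filename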

Balthazar Rouberol
  • That is an interesting example. Thanks for sharing that. Can I ask you why you use pool.map rather than gevent.spawn? Is there a difference between them? – JohnJ Nov 22 '12 at 20:01
  • Have a look at http://sdiehl.github.com/gevent-tutorial/. My feeling is that `Pool.map()` handles the results gathering for you, and I needed to get a list of all results. It might be that you can do the same using `spawn()`; I just know that it worked well with `map()`. – Balthazar Rouberol Nov 22 '12 at 20:06
  • Yes, I have seen that tutorial. Well, to collect results I was using gevent.joinall. Thanks for your insights. – JohnJ Nov 22 '12 at 20:09
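
For reference, a rough spawn()-based equivalent of the Pool.map() call above, as discussed in the comments, might look like this sketch (an assumption, not code from the answer). The key difference is that a Pool caps concurrency at NB_WORKERS, while bare spawn() starts one greenlet per URL immediately:

import gevent

def parallel_image_download_spawn(urls):
    """Spawn one greenlet per URL, wait for all of them, then collect results."""
    jobs = [gevent.spawn(image_download, url) for url in urls]
    gevent.joinall(jobs)  # block until every download has finished
    return [job.value for job in jobs]  # Greenlet.value holds each return value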