
I am using gevent to perform concurrent downloads.
Based on this example, this is the code:

import gevent
from gevent import monkey

# Patch the standard library before urllib2 is imported,
# so that its sockets cooperate with gevent.
monkey.patch_all()

import urllib2
from datetime import datetime

urls = ['https://www.djangoproject.com/', 'http://www.nytimes.com/', 'http://www.microsoft.com']

def print_head(url):
    print('Starting %s' % url)
    data = urllib2.urlopen(url).read()
    print('%s: %s bytes: %r' % (url, len(data), data[:50]))

startTime = datetime.now()
# Spawn one greenlet per URL and wait for all of them to finish.
jobs = [gevent.spawn(print_head, url) for url in urls]
gevent.joinall(jobs)
totalTime = datetime.now() - startTime
print('Total time: %s' % totalTime)

My problem is that the code above takes much longer than the serial version, and in most cases it times out. Here is the serial version, which is much faster:

import urllib2
from datetime import datetime

urls = ['https://www.djangoproject.com/','http://www.nytimes.com/','http://www.microsoft.com']

def print_head(url):
    print ('Starting %s' % url)
    data = urllib2.urlopen(url).read()
    print ('%s: %s bytes: %r' % (url, len(data), data[:50]))

startTime = datetime.now()
for url in urls:
    try:
        print_head(url)
    except Exception as e:
        # Report the failure and continue with the next URL.
        print('Exception for %s: %s' % (url, e))

totalTime = datetime.now() - startTime
print('Total time: %s' % totalTime)
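
Because a single slow host can dominate the total time in either version, it may help to time each URL separately. The following is only a sketch: the helper names `timed_fetch` and `urllib_fetch` and the 10-second default timeout are assumptions for illustration, not part of the original scripts.

```python
import time

def timed_fetch(url, fetch, timeout=10):
    """Download one URL and return (url, seconds, size, error).

    `fetch` is any callable taking (url, timeout) and returning the body.
    On failure, size is None and error holds the exception.
    """
    start = time.time()
    try:
        data = fetch(url, timeout)
        return url, time.time() - start, len(data), None
    except Exception as exc:
        return url, time.time() - start, None, exc

def urllib_fetch(url, timeout):
    """Fetch a URL with the standard library (Python 2 or 3)."""
    try:
        from urllib2 import urlopen  # Python 2
    except ImportError:
        from urllib.request import urlopen  # Python 3
    return urlopen(url, timeout=timeout).read()
```

In the serial version this could be called as `timed_fetch(url, urllib_fetch)` inside the loop. In the gevent version, `gevent.spawn(timed_fetch, url, urllib_fetch)` should work the same way, with each result read from the job's `value` attribute after `gevent.joinall(jobs)`.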
yossi
  • There is something odd with www.microsoft.com; my results for your code are inconsistent. Sometimes `gevent` is faster, sometimes `urllib`, and most of the time is spent downloading the microsoft.com page. Try it with a different list of URLs. – reclosedev Feb 05 '12 at 12:56
  • I'm getting random timeouts on microsoft.com and nytimes.com ... but only with the gevent version ... Strange ... – Martin Tournoij Feb 05 '12 at 12:57
  • Disabling monkey patching "solved" the problem. If I use any of ``patch_socket()``, ``patch_dns()``, or ``patch_httplib()``, it's unreliable and slow. If I disable all monkey patching it's twice as fast (~1.5s vs. the ~3s the sequential script takes) ... Don't ask me for an explanation :-/ – Martin Tournoij Feb 05 '12 at 13:06
  • @Carpetsmoker, there are too few URLs. The response time of a site depends on many factors. Try this list of URLs: http://pastebin.com/3739te6J. And if you run the OP's tests a few times, you'll see that microsoft gets timeouts not only with gevent. – reclosedev Feb 05 '12 at 13:13
  • Using the URL list that @reclosedev provided, I get 'NotImplementedError: inet_ntop() is not available on this platform' – yossi Feb 05 '12 at 13:22
  • @yossi, I was getting this error a long time ago. Try removing google.com; it's related to IPv6. There is a solution for this issue. – reclosedev Feb 05 '12 at 13:25

1 Answer


OK, the problem was an old gevent package.
I uninstalled the old one and installed the new one from here, as @reclosedev pointed out,
and it is now working fine.
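
To see which gevent version is actually installed before and after upgrading, something like the sketch below may help. The helper names `installed_gevent_version` and `version_tuple` are made up for illustration, and the thread does not say which versions were involved, so no specific threshold is assumed.

```python
import re

def installed_gevent_version():
    """Return gevent's version string, or None if gevent is not importable."""
    try:
        import gevent
    except ImportError:
        return None
    return getattr(gevent, "__version__", None)

def version_tuple(version):
    """Turn a string such as '0.13.6' or '1.0rc2' into a tuple of integers.

    Only the leading digits of each dotted part are kept, so suffixes
    like 'rc2' are ignored.
    """
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)
```

Comparing tuples, for example `version_tuple(a) < version_tuple(b)`, avoids the pitfalls of comparing version strings directly.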

yossi