
When I invoke the following function to process a long list of URLs on the same site (e.g. http://foo.bar.com/url1, http://foo.bar.com/url2, etc.):

import time
import grequests

def processUrls(block=2500, write=100000, timeout=0.5):
    urls = ...  ## generate long array of URLs
    chunks = [urls[i:i+block] for i in xrange(0, len(urls), block)] ## chunk 'em

    def callback(response, *args, **kwargs):
        txt = response.text
        ## do something with txt
        response.close()

    for i, chunk in enumerate(chunks):
        rs = [grequests.get(url, callback=callback) for url in chunk]
        grequests.map(rs, stream=False, size=block / 10)  ## at most block/10 concurrent requests
        time.sleep(timeout)  ## pause between chunks
        ## do stuff

I get a bunch of errors like this:

File "/.../python2.7/site-packages/gevent/greenlet.py", line 327, in run
result = self._run(*self.args, **self.kwargs)
File "/.../python2.7/site-packages/grequests.py", line 71, in send
self.url, **merged_kwargs)
File "/.../python2.7/site-packages/requests/sessions.py", line 465, in request
resp = self.send(prep, **send_kwargs)
File "/.../python2.7/site-packages/requests/sessions.py", line 573, in send
r = adapter.send(request, **kwargs)
File "/.../python2.7/site-packages/requests/adapters.py", line 415, in send
raise ConnectionError(err, request=request)
ConnectionError: ('Connection aborted.', error(97, 'Address family not supported by protocol'))
<Greenlet at 0x7f8ce2c0ec30: <bound method AsyncRequest.send of <grequests.AsyncRequest object at 0x7f8ce31e2890>>(stream=False)> failed with ConnectionError

The number of error messages is much smaller than the number of URLs, so most requests do succeed.

What could be causing these errors? I am running this on Red Hat 6.6 with Python 2.7.

UPDATE: I collected all the URLs that gave me the error when run against the full dataset. They all appeared to be fine (well-formed, etc.), and when I pasted one of them into a browser I got meaningful results and no error messages. I then reran the test on just a subset of the data, again got some errors, and collected the bad-URL list for the subset. It turns out that none of the bad URLs from the subset appear in the bad-URL list from the full run. This suggests that the error is not really URL-specific but rather some kind of transient hiccup, either on my side or on the server's side. Does this ring any bells?
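For reference, here is roughly how the failing URLs can be collected and retried in one place. This is only a minimal sketch: exception_handler is grequests' hook that gets called with the failed request and the exception, while fetch_with_retry, on_exception, and the retry count are hypothetical names I am using for illustration, not part of my real pipeline:

import grequests

failed = []  ## URLs whose request raised an exception (illustrative)

def on_exception(request, exception):
    ## grequests invokes this with the failed AsyncRequest and the exception;
    ## record the URL so it can be retried later
    failed.append(request.url)

def fetch_with_retry(urls, size=250, retries=1):
    ## hypothetical helper: map a chunk, then re-send whatever failed
    pending = list(urls)
    for _ in range(retries + 1):
        del failed[:]  ## reset the failure list for this pass
        rs = [grequests.get(u) for u in pending]
        grequests.map(rs, size=size, exception_handler=on_exception)
        if not failed:
            break
        pending = list(failed)  ## retry only the URLs that errored
    return pending  ## URLs still failing after all retries

If the same URLs succeed on a second pass, that would support the transient-hiccup theory.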

