
I have a Python web-scraper script that has been running fine for months. It uses urllib2 to access remote URLs, submit data, capture results, etc.

Yesterday, urllib2 suddenly started throwing errors on most (but not all) attempts to access the remote URLs. The error is:

URLError: urlopen error [Errno -2] Name or service not known

What could cause 90% of remote requests to suddenly fail? What does [Errno -2] actually mean? I have searched the urllib2 docs but found no real explanation of [Errno -2], and I have also searched here for answers without success.

Help please?

Additional info:

  1. I can also access the URL from my own browser, so it's not a bad URL.
  2. I can ping the remote domain from my server with no failures at all.
  3. As I mentioned, it does not fail 100% of the time, but about 95%.

The traceback stack looks like this:

    File "/var/www/html/pylaw/http.py", line 8, in urlopen
      X = urllib2.urlopen(url, data=d).read()
    File "/usr/local/lib/python2.7/urllib2.py", line 126, in urlopen
      return _opener.open(url, data, timeout)
    File "/usr/local/lib/python2.7/urllib2.py", line 400, in open
      response = self._open(req, data)
    File "/usr/local/lib/python2.7/urllib2.py", line 418, in _open
      '_open', req)
    File "/usr/local/lib/python2.7/urllib2.py", line 378, in _call_chain
      result = func(*args)
    File "/usr/local/lib/python2.7/urllib2.py", line 1215, in https_open
      return self.do_open(httplib.HTTPSConnection, req)
    File "/usr/local/lib/python2.7/urllib2.py", line 1177, in do_open
      raise URLError(err)
    URLError: urlopen error [Errno -2] Name or service not known
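The `[Errno -2]` in the last line is not a urllib2-specific code. When the connection cannot be opened, urllib2 wraps the underlying socket exception in `URLError` (that is the `raise URLError(err)` in `do_open`), and here that exception is a `socket.gaierror` from `getaddrinfo()`. On Linux with glibc, -2 is `EAI_NONAME`, whose message is "Name or service not known": the hostname could not be resolved, so no HTTP request was ever sent. A minimal sketch for unwrapping such an error, assuming Python 3 (where urllib2's `URLError` lives in `urllib.error`); the helper name `explain_url_error` is hypothetical:

```python
import socket
import urllib.error


def explain_url_error(err):
    """Return a short description of what went wrong inside a URLError.

    Hypothetical helper: it only inspects err.reason, which urllib sets to
    the underlying exception (e.g. a socket.gaierror) when the connection
    could not be opened at all.
    """
    reason = getattr(err, "reason", err)
    if isinstance(reason, socket.gaierror):
        # gaierror codes come from getaddrinfo(), i.e. the DNS/hosts lookup.
        if reason.errno == socket.EAI_NONAME:
            return "DNS: name not known (EAI_NONAME)"
        if reason.errno == socket.EAI_AGAIN:
            return "DNS: temporary failure (EAI_AGAIN)"
        return "DNS: getaddrinfo error %s" % reason.errno
    # Anything else (refused connection, TLS error, ...) is not a lookup failure.
    return "not a name-resolution error: %r" % (reason,)
```

For example, `explain_url_error(urllib.error.URLError(socket.gaierror(socket.EAI_NONAME, "Name or service not known")))` returns `"DNS: name not known (EAI_NONAME)"`, which points the investigation at name resolution rather than at the remote web server.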
Jakub M.
misterrobinson
  • http://stackoverflow.com/questions/4673166/python-httplib-name-or-service-not-known – warvariuc Apr 04 '13 at 08:33
  • Add the complete traceback, so we may get a real hint. Presumably you have a misconfiguration of your hosts file, name resolution, or proxy settings, or the service is no longer available, or there is simply no connection to the internet. You may consider trying `socket.gethostbyname(hostname)` – Don Question Apr 04 '13 at 08:42
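Note that `socket.gethostbyname()` expects a bare hostname, not a full URL. A minimal sketch of that check, assuming Python 3 (the helper names `url_host_and_port` and `resolve_url_host` are hypothetical): it pulls the hostname out of the URL with `urllib.parse.urlsplit`, then asks the system resolver through `socket.getaddrinfo`, which is the same lookup `socket.create_connection()` performs before httplib opens a connection.

```python
import socket
from urllib.parse import urlsplit


def url_host_and_port(url):
    """Split a URL into the (hostname, port) pair a client would connect to."""
    parts = urlsplit(url)
    if not parts.hostname:
        raise ValueError("no hostname in %r" % url)
    default_port = 443 if parts.scheme == "https" else 80
    return parts.hostname, parts.port or default_port


def resolve_url_host(url):
    """Resolve the URL's host via getaddrinfo, like socket.create_connection().

    Returns a sorted list of IP address strings, or raises socket.gaierror
    (for example [Errno -2] on Linux) if the name cannot be resolved.
    """
    host, port = url_host_and_port(url)
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})
```

Running `resolve_url_host()` against the scraper's target URL from the server, and catching `socket.gaierror`, would show whether the lookup alone reproduces the failure, independently of urllib2.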
  • Maybe your crawler was blocked? – Jakub M. Apr 04 '13 at 09:18
  • It can't be completely blocked, because 5% of the calls go through. The calls that fail, fail immediately. The ones that go through take a few seconds to come back. So it does seem like something is blocking the request. I can ping the remote domain very consistently and fast, it's just the urlopen that fails. – misterrobinson Apr 04 '13 at 09:36
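Since `[Errno -2]` is raised during the name lookup, calls that fail immediately suggest the lookup is what fails, before any HTTP traffic is sent. One way to measure the intermittent behaviour is to repeat only the lookup many times and record the failure rate and timings. A minimal sketch, assuming Python 3; `probe_resolution` and its parameters are hypothetical, and the resolver is passed in so it can be replaced with a fake for testing:

```python
import socket
import time


def probe_resolution(host, attempts=20, resolve=socket.gethostbyname, pause=0.0):
    """Resolve `host` repeatedly and summarise how often and how fast it fails.

    Returns a dict with the failure count, the failure rate, and the average
    duration in seconds of successful and of failed lookups (None if none).
    """
    ok_times, fail_times = [], []
    for _ in range(attempts):
        start = time.monotonic()
        try:
            resolve(host)
        except socket.gaierror:
            fail_times.append(time.monotonic() - start)
        else:
            ok_times.append(time.monotonic() - start)
        if pause:
            time.sleep(pause)  # optional gap between lookups

    def mean(values):
        return sum(values) / len(values) if values else None

    return {
        "failures": len(fail_times),
        "failure_rate": len(fail_times) / attempts,
        "avg_ok_seconds": mean(ok_times),
        "avg_fail_seconds": mean(fail_times),
    }
```

If the machine runs a caching daemon such as nscd, repeated lookups may be answered from its cache, so spacing them out with `pause` may give a more realistic picture.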

1 Answer


The answer was almost certainly a network configuration problem at the data center hosting our server.

The problem (essentially, outbound HTTP requests being blocked) appeared suddenly, behaved inconsistently for 60 hours, and then cleared up just as suddenly. Going through our own logs, we also found that the same thing had happened about 6 months earlier, but it lasted only about an hour that time and nobody noticed. This time it lasted 60 hours, so EVERYONE noticed.

The host won't admit anything, but everything points to a firewall or router problem in their data center. The host's customer service reps would not even be aware of such a change, so of course they can't confirm or deny it. Nothing changed on our server between the time everything worked, the time it mostly stopped working, and the time it all started working again. In the past they have done things like reboot our server every couple of days.

I think we need to move, eh?

misterrobinson