
I'm getting occasional AttributeErrors with code like the following. I set up a mechanize instance with:

import mechanize

self.mech = mechanize.Browser(factory=mechanize.RobustFactory())
self.cj = mechanize.CookieJar()
self.mech.set_cookiejar(self.cj)
self.mech.set_proxies({'http': <snipped>})
self.mech.set_handle_robots(False)

# Replace any existing User-agent header with a fixed one
USER_AGENT = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
headers = [h for h in self.mech.addheaders if h[0].lower() != 'user-agent']
headers.append(('User-agent', USER_AGENT))
self.mech.addheaders = headers

Then I use it like this:

resp = self.mech.open(the_url)
html = resp.read()
resp.close()
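Whatever the root cause turns out to be, the usage above can at least fail with a clear error instead of an AttributeError, and always attempt to close the response. A minimal sketch: `fetch_html` and `FetchError` are hypothetical names, and `open_url` stands for any callable such as `self.mech.open`. Ignoring an AttributeError from `close()` is a judgment call that hides the symptom in the close() traceback below; it does not fix whatever causes it.

```python
class FetchError(Exception):
    """Hypothetical error raised when the opener returns no usable response."""


def fetch_html(open_url, url):
    """Open url via open_url (e.g. a mechanize Browser's .open) and return the body.

    Raises FetchError instead of an AttributeError when the opener returns None.
    """
    resp = open_url(url)
    if resp is None:
        raise FetchError("opener returned None for %r" % (url,))
    try:
        return resp.read()
    finally:
        try:
            resp.close()
        except AttributeError:
            # Assumption: the underlying socket wrapper was already torn down.
            # The body (if any) has been read, so the failed close is ignored.
            pass
```

Usage would look like `html = fetch_html(self.mech.open, the_url)`, with `FetchError` caught and logged wherever the caller handles other network failures.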

The second snippet occasionally raises this exception:

...
    html = resp.read()
AttributeError: 'NoneType' object has no attribute 'read'

In other cases, the traceback is actually:

...
    resp.close()
  File "C:\Python26\lib\site-packages\mechanize\_response.py", line 88, in close
    self.wrapped.close()
  File "C:\Python26\lib\site-packages\mechanize\_response.py", line 368, in close
    wrapped.close()
  File "C:\Python26\lib\socket.py", line 273, in close
    self._sock.close()
AttributeError: 'NoneType' object has no attribute 'close'

That is, .read() succeeds but .close() fails. Here are some more tracebacks:

...
    html = resp.read()
  File "C:\Python26\lib\site-packages\mechanize\_response.py", line 190, in read
    self.__cache.write(self.wrapped.read())
  File "C:\Python26\lib\socket.py", line 348, in read
    data = self._sock.recv(rbufsize)
  File "C:\Python26\lib\httplib.py", line 542, in read
    s = self.fp.read(amt)
  File "C:\Python26\lib\socket.py", line 377, in read
    data = self._sock.recv(left)
error: [Errno 10035] A non-blocking socket operation could not be completed immediately
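For reference, errno 10035 is Windows' WSAEWOULDBLOCK: `recv()` reports it on a socket in non-blocking mode when no data has arrived yet. CPython also places a socket in OS-level non-blocking mode whenever it has a timeout, so this error can surface when a timeout is in effect. The sketch below reproduces the condition portably with Python 3's `socket.socketpair` and `BlockingIOError`; `recv_would_block` is a hypothetical helper, and on Linux or macOS the code shows up as EAGAIN/EWOULDBLOCK rather than 10035.

```python
import errno
import socket


def recv_would_block():
    """Return the errno raised by recv() on an empty non-blocking socket, or None."""
    a, b = socket.socketpair()
    try:
        a.setblocking(False)  # the OS-level mode a timeout also puts the socket in
        try:
            a.recv(1024)  # nothing has been sent on b, so there is no data yet
        except BlockingIOError as exc:
            return exc.errno
        return None
    finally:
        a.close()
        b.close()


code = recv_would_block()
# Expected: WSAEWOULDBLOCK (10035) on Windows, EAGAIN/EWOULDBLOCK elsewhere.
print(code, errno.errorcode.get(code))
```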

And:

    resp = self.mech.open(the_url)
  File "C:\Python26\lib\site-packages\mechanize\_mechanize.py", line 203, in open
    return self._mech_open(url, data, timeout=timeout)
  File "C:\Python26\lib\site-packages\mechanize\_mechanize.py", line 249, in _mech_open
    self._set_response(response, False)
  File "C:\Python26\lib\site-packages\mechanize\_mechanize.py", line 308, in _set_response
    self._factory.set_response(response)
  File "C:\Python26\lib\site-packages\mechanize\_html.py", line 623, in set_response
    data = response.read()
  File "C:\Python26\lib\site-packages\mechanize\_response.py", line 190, in read
    self.__cache.write(self.wrapped.read())
  File "C:\Python26\lib\socket.py", line 348, in read
    data = self._sock.recv(rbufsize)
  File "C:\Python26\lib\httplib.py", line 542, in read
    s = self.fp.read(amt)
  File "C:\Python26\lib\socket.py", line 377, in read
    data = self._sock.recv(left)
AttributeError: 'NoneType' object has no attribute 'recv'
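The close() traceback and this last one share a pattern: the body is read through a `socket._fileobject` whose `_sock` is the HTTP response (socket.py `read` calls into httplib.py `read`), and by the time of the failing call that `_sock` is already `None`. In Python 2's socket.py, `_fileobject.close()` sets `self._sock = None` after closing, so any later `read()` or a second `close()` on the same wrapper fails with exactly these AttributeErrors. The toy model below (`ToyFileObject` and `FakeResponse` are hypothetical, heavily simplified stand-ins, not the real classes) reproduces that failure mode; if it matches, the open question is what closed the response early, which the tracebacks alone don't show.

```python
class ToyFileObject(object):
    """Simplified, hypothetical stand-in for Python 2's socket._fileobject.

    Models one detail only: close() closes the wrapped object (when
    close=True) and then drops the reference by setting it to None.
    """

    def __init__(self, sock, close=True):
        self._sock = sock
        self._close = close

    def read(self, size=8192):
        # The real wrapper pulls data via self._sock.recv(...)
        return self._sock.recv(size)

    def close(self):
        if self._close:
            self._sock.close()  # fails if _sock is already None
        self._sock = None


class FakeResponse(object):
    """Hypothetical stand-in for the wrapped HTTP response."""

    def recv(self, size):
        return b"<html></html>"

    def close(self):
        pass


errors = []
f = ToyFileObject(FakeResponse())
f.close()  # something closes the response early...
for action in (f.read, f.close):  # ...then read() and a second close() follow
    try:
        action()
    except AttributeError as exc:
        errors.append(str(exc))

print(errors)
# ["'NoneType' object has no attribute 'recv'", "'NoneType' object has no attribute 'close'"]
```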

Why might this happen? The mechanize documentation isn't very good, and a cursory look through the source shows that it is fairly convoluted.

Claudiu
  • Can you show us more of your code? It's hard to tell which of the three different `mechanize` interfaces you're using without at least knowing what exactly `self.mech` is, and it's hard enough debugging `mechanize` when you know that much… – abarnert Nov 26 '12 at 23:48
  • Have you turned on full logging, and/or tried calling `get_data` on the `resp` to see if there's anything odd in the cases that trigger each of these errors? – abarnert Nov 27 '12 at 00:00
  • Hmm, this is a rare error and I process a large volume of requests, so I'm not sure I can log everything while waiting for it to happen. Any suggestions about that? I'll add a `.get_data()` call to the error handler when there is a response object and report back what the output is once it happens again – Claudiu Nov 27 '12 at 00:06
  • Added a few more tracebacks. It seems to be the remote site screwing something up... – Claudiu Nov 27 '12 at 00:17
  • Meanwhile, it might be worth hacking up the source to do some more debugging, to find out whether you've already got None after hanize/blob/master/mechanize/_mechanize.py line 230, or whether one of the later lines that update response is losing it. (There doesn't seem to be any way to avoid the `return response` without raising.) If it's the former, the next step is into _opener.py after line 193 and each time through 204. I'm looking at the original exception here; the three others would probably require similar debugging… – abarnert Nov 27 '12 at 01:07
  • One more question: Can you upgrade to Python 2.7 and the latest release of mechanize, just to make sure you're not hitting a bug that's already been fixed? – abarnert Nov 27 '12 at 01:09
  • Finally: The remote site screwing up shouldn't be able to make mechanize's socket disappear, trick it into calling recv on a socket that isn't ready, or replace a response object with None before returning it. Of course, the remote site could be screwing up (or even working properly) in a very uncommon way that triggers bugs in mechanize that nobody has ever tested, but you still can't really blame the remote site for that. P.S. Is that remote site public? If not, can you log the traffic somewhere, or is that all confidential? – abarnert Nov 27 '12 at 01:11
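On the question of logging at high volume when the error is rare: one option is to keep only the most recent N log records in memory and write them out only when a failure occurs. A minimal sketch using just the standard `logging` module; `RingBufferHandler`, the logger name `"fetcher"`, and the tiny capacity are hypothetical choices for illustration, not mechanize features.

```python
import collections
import logging


class RingBufferHandler(logging.Handler):
    """Keep the most recent `capacity` formatted records in memory."""

    def __init__(self, capacity=200):
        logging.Handler.__init__(self)
        self.buffer = collections.deque(maxlen=capacity)

    def emit(self, record):
        # Older entries fall off the left end automatically once full.
        self.buffer.append(self.format(record))

    def drain(self):
        """Return the buffered lines (oldest first) and clear the buffer."""
        lines = list(self.buffer)
        self.buffer.clear()
        return lines


log = logging.getLogger("fetcher")  # hypothetical logger name
log.setLevel(logging.DEBUG)
log.propagate = False  # keep per-request chatter out of other handlers
ring = RingBufferHandler(capacity=3)  # tiny capacity just for the demo
log.addHandler(ring)

for i in range(5):
    log.debug("request %d", i)

print(list(ring.buffer))  # ['request 2', 'request 3', 'request 4']
```

In the `except` clause around the fetch, writing out `ring.drain()` (plus `resp.get_data()` when a response object exists) captures the context leading up to the rare failure, while everything else is discarded as the buffer rolls over.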
