I am trying to handle a hard to duplicate error case using TCPServer. We have seen a few occurrences that when a socket timeout happens in the handler, the code never recovers and keeps returning the socket.timeout exception. This looks to be from the following snippet in the Socket.py library code:
def readinto(self, b):
"""Read up to len(b) bytes into the writable buffer *b* and return
the number of bytes read. If the socket is non-blocking and no bytes
are available, None is returned.
If *b* is non-empty, a 0 return value indicates that the connection
was shutdown at the other end.
"""
self._checkClosed()
self._checkReadable()
if self._timeout_occurred:
raise OSError("cannot read from timed out object")
while True:
try:
return self._sock.recv_into(b)
except timeout:
self._timeout_occurred = True
raise
except InterruptedError:
continue
except error as e:
if e.args[0] in _blocking_errnos:
return None
raise
once a timeout has occurred _timeout_occurred is set to True, and the next pass into this function, the socket has the flag set, and will immediately exit with the cannot read from timed out object error.
Now the code uses TCP Server (only relevant code included) Basically it is reading stuff from the socket, and queing it to be handled separately.
def get_event(file_):
pre_package_len = 8
msg = file_.read(pre_package_len)
if len(msg) == pre_package_len:
pkg = PRE_PACKAGE_FRAME.unpack(msg)
msg = file_.read(pkg['len'])
logger.debug('recv: type: %s len: %s bytes read: %s',
pkg['type'], pkg['len'], len(msg))
if len(msg) >= pkg['len']:
if pkg['type'] == cdefs.kNotification:
e = EVENT_FRAME.unpack(msg)
return decode_event(e)
logger.warn('received unsupported package type: %s', pkg['type'])
else:
logger.error('failed to recv')
class _EventHandler(StreamRequestHandler):
def handle(self):
logger.debug("Got event from %s", self.client_address)
try:
e = get_event(self.rfile)
if e:
self.q.put(e)
except socket.timeout:
logger.error('timed out reading event')
def process_event(q, handler, shutdown_sentinel):
for e in iter(q.get, shutdown_sentinel):
try:
handler(e)
except Exception:
logger.exception('Unhandled exception handling event: %s', e)
logger.info('exiting')
def eventhandler_maker(q, timeout):
return type('NewEventHandler',
(_EventHandler, object),
dict(q=q, timeout=timeout))
def process_events(handler, address, timeout=20):
sentinel = object()
q = Queue()
eventhandler = eventhandler_maker(q, timeout)
server = TCPServer(address, eventhandler)
start_thread(server.serve_forever)
start_thread(process_event, (q, handler, sentinel))
def shutdown():
logger.info('shutting down')
q.put(sentinel)
server.shutdown()
def add_event(e):
q.put(e)
return shutdown, add_event
The symptoms are that once the timeout happens, the log keeps showing 'timed out reading event' and the code never does anything anymore. I added code to dump out the server.socket.gettimeout() and socket.getdefaulttimeout() and both return None. This application is running on an embedded Linux 3.10 kernel with python 3.4.0.
I have 2 questions here:
- What is a good recovery strategy here? Shutdown() / Close() the server socket and then restart it? Or are there better strategies?
- Is there a good third party tool to provoke a timeout so a recovery strategy can be proven to be correct?