3

I am trying to handle a hard to duplicate error case using TCPServer. We have seen a few occurrences that when a socket timeout happens in the handler, the code never recovers and keeps returning the socket.timeout exception. This looks to be from the following snippet in the Socket.py library code:

def readinto(self, b):
    """Read up to len(b) bytes into the writable buffer *b* and return
    the number of bytes read.  If the socket is non-blocking and no bytes
    are available, None is returned.

    If *b* is non-empty, a 0 return value indicates that the connection
    was shutdown at the other end.
    """
    self._checkClosed()
    self._checkReadable()
    if self._timeout_occurred:
        raise OSError("cannot read from timed out object")
    while True:
        try:
            return self._sock.recv_into(b)
        except timeout:
            self._timeout_occurred = True
            raise
        except InterruptedError:
            continue
        except error as e:
            if e.args[0] in _blocking_errnos:
                return None
            raise

once a timeout has occurred _timeout_occurred is set to True, and the next pass into this function, the socket has the flag set, and will immediately exit with the cannot read from timed out object error.

Now the code uses TCP Server (only relevant code included) Basically it is reading stuff from the socket, and queing it to be handled separately.

def get_event(file_):
    pre_package_len = 8
    msg = file_.read(pre_package_len)
    if len(msg) == pre_package_len:
        pkg = PRE_PACKAGE_FRAME.unpack(msg)
        msg = file_.read(pkg['len'])
        logger.debug('recv: type: %s len: %s bytes read: %s',
                     pkg['type'], pkg['len'], len(msg))
        if len(msg) >= pkg['len']:
            if pkg['type'] == cdefs.kNotification:
                e = EVENT_FRAME.unpack(msg)
                return decode_event(e)
            logger.warn('received unsupported package type: %s', pkg['type'])
    else:
        logger.error('failed to recv')

class _EventHandler(StreamRequestHandler):
    def handle(self):
        logger.debug("Got event from %s", self.client_address)
        try:
            e = get_event(self.rfile)
            if e:
                self.q.put(e)
        except socket.timeout:
            logger.error('timed out reading event')

def process_event(q, handler, shutdown_sentinel):
    for e in iter(q.get, shutdown_sentinel):
        try:
            handler(e)
        except Exception:
            logger.exception('Unhandled exception handling event: %s', e)
    logger.info('exiting')

def eventhandler_maker(q, timeout):
    return type('NewEventHandler',
                (_EventHandler, object),
                dict(q=q, timeout=timeout))

def process_events(handler, address, timeout=20):
    sentinel = object()
    q = Queue()
    eventhandler = eventhandler_maker(q, timeout)
    server = TCPServer(address, eventhandler)
    start_thread(server.serve_forever)
    start_thread(process_event, (q, handler, sentinel))

    def shutdown():
        logger.info('shutting down')
        q.put(sentinel)
        server.shutdown()

    def add_event(e):
        q.put(e)

    return shutdown, add_event

The symptoms are that once the timeout happens, the log keeps showing 'timed out reading event' and the code never does anything anymore. I added code to dump out the server.socket.gettimeout() and socket.getdefaulttimeout() and both return None. This application is running on an embedded Linux 3.10 kernel with python 3.4.0.

I have 2 questions here:

  1. What is a good recovery strategy here? Shutdown() / Close() the server socket and then restart it? Or are there better strategies?
  2. Is there a good third party tool to provoke a timeout so a recovery strategy can be proven to be correct?
MauriceS
  • 51
  • 4

0 Answers0