
I have a Django management command, launched via supervisord, that uses tweepy to consume the Twitter streaming API.

The agent works quite well; however, I notice in the logs that an SSLError is raised every 10-15 minutes and supervisord relaunches the agent.

The tweepy package is the latest version, 1.11. The server runs Ubuntu 12.04 LTS. I've tried installing the CA certificate into the keychain as described in the link below, but with no luck.

Twitter API SSL Root CA Certificate

Any suggestions?

[2012-08-26 19:28:15,656: ERROR] Error establishing the connection
Traceback (most recent call last):
  File ".../.../datasinks.py", line 102, in start
    stream.filter(locations=self.locations)
  File "/site/pythonenv/local/lib/python2.7/site-packages/tweepy/streaming.py", line 228, in filter
    self._start(async)
  File "/site/pythonenv/local/lib/python2.7/site-packages/tweepy/streaming.py", line 172, in _start
    self._run()
  File "/site/pythonenv/local/lib/python2.7/site-packages/tweepy/streaming.py", line 117, in _run
    self._read_loop(resp)
  File "/site/pythonenv/local/lib/python2.7/site-packages/tweepy/streaming.py", line 150, in _read_loop
    c = resp.read(1)
  File "/usr/lib/python2.7/httplib.py", line 541, in read
    return self._read_chunked(amt)
  File "/usr/lib/python2.7/httplib.py", line 574, in _read_chunked
    line = self.fp.readline(_MAXLINE + 1)
  File "/usr/lib/python2.7/socket.py", line 476, in readline
    data = self._sock.recv(self._rbufsize)
  File "/usr/lib/python2.7/ssl.py", line 241, in recv
    return self.read(buflen)
  File "/usr/lib/python2.7/ssl.py", line 160, in read
    return self._sslobj.read(len)
SSLError: The read operation timed out

The following is an outline of the code.

from httplib import IncompleteRead
from ssl import SSLError

from tweepy import API, OAuthHandler
from tweepy.streaming import StreamListener, Stream
# snip other imports


class TwitterSink(StreamListener, TweetSink):

    def __init__(self):
        self.auth = OAuthHandler(settings.TWITTER_OAUTH_CONSUMER_KEY, settings.TWITTER_OAUTH_CONSUMER_SECRET)
        self.auth.set_access_token(settings.TWITTER_OAUTH_ACCESS_TOKEN_KEY, settings.TWITTER_OAUTH_ACCESS_TOKEN_SECRET)
        self.locations = ''  # Snip for brevity

    def start(self):
        try:
            stream = Stream(self.auth, self, timeout=60, secure=True)
            stream.filter(locations=self.locations)
        except SSLError:
            logger.exception("Error establishing the connection")
        except IncompleteRead:
            logger.exception("Error with HTTP connection")

    # snip on_data()
    # snip on_timeout()
    # snip on_error()
drevicko
Dwight Gunning
  • What happens if you set `timeout` to something much larger? I suspect your `Stream` is timing out because it occasionally goes more than 60 seconds without receiving an update. – Travis Mehlinger Oct 29 '12 at 03:07
  • You should consider opening an issue on [GitHub](https://github.com/tweepy/tweepy) if you haven't already. – Michael Mior Nov 03 '12 at 20:46

3 Answers


The certificate doesn't seem to be the problem; the error is just a timeout. It looks like an issue with tweepy's SSL handling to me: the code is equipped to handle socket.timeout and reopen the connection, but not a timeout that arrives as an SSLError.
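The gap is easy to see in isolation. In the Python 2.7 ssl module from the traceback, the read timeout is an `ssl.SSLError`, and `ssl.SSLError` is not a subclass of `socket.timeout`, so an `except socket.timeout` clause lets it through. Here is a minimal stdlib-only sketch. `caught_by_timeout_handler` is a hypothetical helper, and the exceptions are constructed by hand rather than produced by a real connection:

```python
import socket
import ssl


def caught_by_timeout_handler(exc):
    """Return True if an ``except socket.timeout`` clause catches ``exc``."""
    try:
        raise exc
    except socket.timeout:
        # The only branch a socket.timeout-based reconnect path ever sees.
        return True
    except Exception:
        # Everything else, including ssl.SSLError, escapes that path.
        return False


# A plain socket read timeout is caught...
print(caught_by_timeout_handler(socket.timeout("timed out")))  # True
# ...but an SSLError carrying the message from the traceback is not.
print(caught_by_timeout_handler(ssl.SSLError("The read operation timed out")))  # False
```

As far as I know, Python 3's ssl module raises `socket.timeout` for read timeouts instead, so this particular gap is specific to the Python 2 setup in the question.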

Looking at the ssl module code (or docs), though, I don't see a clean way to catch that specifically. The SSLError is raised with only a string description and no error code. For lack of a better solution, I'd suggest adding the following right before line 118 of tweepy/streaming.py:

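# also requires "from ssl import SSLError" at the top of tweepy/streaming.py
except SSLError as e:
    # The only hint is the message "The read operation timed out";
    # note it contains "timed out", not "timeout"
    if 'timed out' not in str(e).lower():
        exception = e  # not a timeout: stop and report it
        break
    if self.listener.on_timeout() == False:
        break
    if self.running is False:
        break
    conn.close()
    sleep(self.snooze_time)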
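The message test can also be pulled out into a standalone helper, which makes it easier to check on its own and to reuse outside tweepy. This is a minimal sketch. `is_read_timeout` is a hypothetical name, and it assumes that message text is the only way to recognise an SSL read timeout on Python 2.7. Note that the message reads "timed out", so a substring test for "timeout" would never match it.

```python
import socket
import ssl


def is_read_timeout(exc):
    """Best-effort check for a read timeout, however it was reported.

    Assumption: Python 2.7's ssl module signals an SSL read timeout with an
    SSLError whose only information is "The read operation timed out", so
    the message text is the fallback; socket.timeout is checked first.
    """
    if isinstance(exc, socket.timeout):
        return True
    if isinstance(exc, ssl.SSLError):
        # str() rather than .message, which Python 3 exceptions lack
        return "timed out" in str(exc).lower()
    return False


print(is_read_timeout(ssl.SSLError("The read operation timed out")))  # True
print(is_read_timeout(ssl.SSLError("certificate verify failed")))  # False
```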

Why it's timing out in the first place is a good question. I have nothing better than repeating Travis Mehlinger's suggestion of setting a higher timeout.
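Travis Mehlinger's suggestion follows from how socket timeouts behave. The `timeout` given to `Stream` presumably ends up as a socket timeout, which the `recv` call in the traceback suggests. Such a timeout bounds each individual read, so a perfectly healthy connection that is simply quiet for longer than `timeout` seconds still raises. Here is a minimal stdlib-only illustration, with a local socket pair standing in for the Twitter connection. `read_or_timeout` is a hypothetical helper, not part of tweepy:

```python
import socket


def read_or_timeout(sock, timeout):
    """Read up to 1024 bytes, returning None if nothing arrives in time."""
    sock.settimeout(timeout)  # applies to each blocking call on this socket
    try:
        return sock.recv(1024)
    except socket.timeout:
        return None


# The connection is fine; the peer just hasn't sent anything yet.
client, server = socket.socketpair()
try:
    print(read_or_timeout(client, 0.2))  # None: the quiet period outlasted the timeout
    server.sendall(b"keep-alive\r\n")
    print(read_or_timeout(client, 0.2))  # the bytes that were just sent
finally:
    client.close()
    server.close()
```

A longer timeout only makes the read wait longer for the next byte. It does not change whether the connection itself is healthy.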

Acorn
kichik
  • Good thinking and good job browsing the code. I've come to sort of the same solution and will post my code too. – Dmitry Nov 03 '12 at 20:37

Here is how I have it (a modified version of the solution at https://groups.google.com/forum/?fromgroups=#!topic/tweepy/80Ayu1joGJ4):

import datetime
import logging
import random
import time

from tweepy import OAuthHandler
from tweepy.streaming import Stream

logger = logging.getLogger(__name__)

# MyListener, settings and reporters are assumed to be defined elsewhere
l = MyListener()
auth = OAuthHandler(settings.CONSUMER_KEY, settings.CONSUMER_SECRET)
auth.set_access_token(settings.ACCESS_TOKEN, settings.ACCESS_TOKEN_SECRET)
# connect to stream
stream = Stream(auth, l, timeout=30.0)
while True:
    # Call tweepy's filter method with async=False to prevent
    # creation of another thread.
    try:
        stream.filter(follow=reporters, async=False)
        # Normal exit: end the loop
        break
    except Exception as e:
        # Abnormal exit: reconnect
        logger.error(e)
        nsecs = random.randint(60, 63)
        logger.error('{0}: reconnect in {1} seconds.'.format(
            datetime.datetime.utcnow(), nsecs))
        time.sleep(nsecs)
Dmitry
  • why `nsecs = random.randint(60, 63)`? – pomber Nov 07 '12 at 04:19
  • @pomber dunno, it was in the original thread, I left it there just because it didn't hurt. Although it'd probably be useful to ask the author. – Dmitry Nov 07 '12 at 21:51
  • Nice solution. Catching `Exception`, though, might catch too much. `ImportError`, `KeyError`, `NameError`, `MemoryError`, `SyntaxError` and many others also inherit from `Exception`. – kichik Jan 24 '13 at 08:09
  • Thanks @kichik, and good point. I know, but I felt lazy there, and there were really no other exceptions I was looking out for. Also, the docs recommend at least putting Exception there, to avoid catching everything with a bare except:. – Dmitry Jan 24 '13 at 16:33
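kichik's point about catching `Exception` can be addressed by listing only the connection-level errors that should trigger a reconnect. Below is a minimal sketch of Dmitry's loop reworked that way. It assumes that `socket.error` (which also covers `ssl.SSLError` and `socket.timeout`) and `IncompleteRead` are the failures worth retrying. `run_with_reconnect`, `RECONNECT_ERRORS` and the `max_attempts`/`sleep` parameters are hypothetical names, not tweepy API:

```python
import logging
import random
import socket
import time

try:
    from http.client import IncompleteRead  # Python 3
except ImportError:
    from httplib import IncompleteRead  # Python 2, as in the question

logger = logging.getLogger(__name__)

# Assumption: these are the failures worth reconnecting on. Bugs such as
# NameError or KeyError are not listed, so they propagate instead.
RECONNECT_ERRORS = (socket.error, IncompleteRead)


def run_with_reconnect(start_stream, min_wait=60, max_wait=63,
                       max_attempts=None, sleep=time.sleep):
    """Call start_stream() until it returns normally; return the failure count.

    start_stream is any zero-argument callable, for example one that wraps
    stream.filter(...). max_attempts=None retries forever.
    """
    failures = 0
    while True:
        try:
            start_stream()
            return failures  # normal exit
        except RECONNECT_ERRORS as e:
            failures += 1
            if max_attempts is not None and failures >= max_attempts:
                raise  # give up, e.g. let supervisord restart the process
            logger.error(e)
            nsecs = random.randint(min_wait, max_wait)
            logger.error('reconnect in {0} seconds.'.format(nsecs))
            sleep(nsecs)
```

Injecting `sleep` keeps the loop testable without waiting a minute per retry. With `max_attempts` left as `None`, the function behaves like the original `while True` loop.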

Another alternative solution is provided on GitHub:

https://github.com/tweepy/tweepy/pull/132

user971956