3

I am using pycurl to connect to the Twitter Streaming API.

This works well, but sometimes after running for a few hours it hangs indefinitely without throwing any exceptions. How can I detect/handle a hang in this script?

import pycurl, json

STREAM_URL = "http://stream.twitter.com/1/statuses/filter.json"

USER = "presidentskroob"
PASS = "12345"

def on_receive(data):
  print(data)

conn = pycurl.Curl()
conn.setopt(pycurl.USERPWD, "%s:%s" % (USER, PASS))
conn.setopt(pycurl.URL, STREAM_URL)
conn.setopt(pycurl.WRITEFUNCTION, on_receive)
conn.perform()
ʞɔıu

4 Answers

4

FROM: http://man-wiki.net/index.php/3:curl_easy_setopt

CURLOPT_LOW_SPEED_LIMIT - Pass a long as parameter. It contains the transfer speed in bytes per second that the transfer should be below during CURLOPT_LOW_SPEED_TIME seconds for the library to consider it too slow and abort.

and

CURLOPT_LOW_SPEED_TIME - Pass a long as parameter. It contains the time in seconds that the transfer should be below the CURLOPT_LOW_SPEED_LIMIT for the library to consider it too slow and abort.


Example:

conn.setopt(pycurl.LOW_SPEED_LIMIT, 1)
conn.setopt(pycurl.LOW_SPEED_TIME, 90)
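
A minimal sketch of how these two options might fit into the question's script, assuming a stalled stream should simply be reopened. When the low-speed check trips, `perform()` raises `pycurl.error`, so the loop catches it and reconnects. The names `make_conn`, `stream_forever`, `factory`, `max_attempts` and `delay` are hypothetical helpers for illustration, not part of pycurl.

```python
import time

STREAM_URL = "http://stream.twitter.com/1/statuses/filter.json"
USER = "presidentskroob"
PASS = "12345"


def on_receive(data):
    print(data)


def make_conn():
    """Build a connection like the question's, plus the two low-speed options."""
    import pycurl  # imported here so stream_forever itself does not need pycurl
    conn = pycurl.Curl()
    conn.setopt(pycurl.USERPWD, "%s:%s" % (USER, PASS))
    conn.setopt(pycurl.URL, STREAM_URL)
    conn.setopt(pycurl.WRITEFUNCTION, on_receive)
    # Abort if fewer than 1 byte/s arrives on average for 90 seconds.
    conn.setopt(pycurl.LOW_SPEED_LIMIT, 1)
    conn.setopt(pycurl.LOW_SPEED_TIME, 90)
    return conn


def stream_forever(factory, error_type, max_attempts=None, delay=5.0,
                   sleep=time.sleep):
    """Open a connection with factory() and call perform(), reopening on failure.

    Hypothetical helper: error_type is the exception to treat as a dropped
    stream (pycurl.error in real use). max_attempts=None means loop forever.
    Returns the number of attempts made.
    """
    attempts = 0
    while max_attempts is None or attempts < max_attempts:
        conn = factory()
        try:
            conn.perform()  # blocks until the stream ends or curl aborts it
        except error_type as exc:
            print("stream dropped, reconnecting: %r" % (exc,))
        finally:
            conn.close()
        attempts += 1
        sleep(delay)  # fixed pause before reopening (an assumed policy)
    return attempts
```

With pycurl installed, this would be started as `stream_forever(make_conn, pycurl.error)`.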
slayton
the Internet
1

The curl switch --speed-limit makes curl return an error if the transfer speed dips below a given threshold for a given length of time (set with --speed-time). Unfortunately, the threshold cannot be set below 1, and the ideal value for the Twitter Streaming API would be 1/30, since it sends a single character every 30 seconds as its keep-alive. The best you can do is use a threshold of 1 Bps, but then curl will give up whenever there is a period of inactivity (no tweets) longer than the duration you select. The command below will give up if there is a 30-second period during which it receives fewer than 30 bytes.

curl -d @filter.txt https://stream.twitter.com/1/statuses/filter.json -uTwitterLogin:TwitterPassword --speed-time 30 --speed-limit 1

To summarize: there is no satisfactory solution using just curl's own options.

Clark
  • You could use the `--libcurl` option to generate C code that corresponds to the given command-line options. It should be simple to port it to Python with pycurl. – jfs Dec 03 '11 at 07:31
0

You can use the timeout settings:

 conn.setopt(pycurl.CONNECTTIMEOUT, 15) 
 conn.setopt(pycurl.TIMEOUT, 25) 

You'll get a pycurl.error exception if curl times out.

SteveMc
  • I'm afraid you don't understand the Twitter streaming API. The request is being made and stays open for hours. – gnur Feb 11 '11 at 15:22
0

I suspect this is related to a "TCP broken pipe" scenario, i.e. the remote peer closes the connection at some point, but our side somehow misses the event. You will need some kind of keep-alive to deal with this.

The "right", elegant solution may require some action from Twitter itself. This is a rather common issue; a friend of mine used the streaming API and ran into the same problem.

ulidtko
  • Twitter is supposed to send blank lines as a keep-alive. So maybe you need to have another thread that keeps a countdown since you last got a packet from twitter and interrupts the main thread if nothing has been received in X amount of time – ʞɔıu Feb 11 '11 at 16:35
  • @ʞɔıu, something like that. But please don't abuse threads: look for timeout options in the curl api. There has to be something. – ulidtko Feb 11 '11 at 16:38
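
The countdown suggested in these comments can be sketched without a second thread, assuming libcurl's progress callback is used: once `NOPROGRESS` is turned off, libcurl calls the callback regularly even while no data arrives, and a non-zero return value aborts the transfer, so `perform()` raises `pycurl.error`. The `StallWatchdog` class, the `attach` helper and the `max_idle` parameter are hypothetical names for illustration, and the 90-second figure in the usage note is an assumed value, not one given by Twitter.

```python
import time


class StallWatchdog(object):
    """Abort a transfer when nothing has been received for max_idle seconds."""

    def __init__(self, max_idle, handler, clock=time.time):
        self.max_idle = max_idle
        self.handler = handler      # called with every chunk of received data
        self.clock = clock          # injectable so the logic can be tested
        self.last_seen = clock()

    def on_receive(self, data):
        # Any bytes, including blank keep-alive lines, reset the countdown.
        self.last_seen = self.clock()
        self.handler(data)

    def on_progress(self, dl_total, dl_now, ul_total, ul_now):
        # Returning non-zero asks libcurl to abort the transfer.
        if self.clock() - self.last_seen > self.max_idle:
            return 1
        return 0


def attach(conn, watchdog):
    """Wire the watchdog into an existing pycurl.Curl() object."""
    import pycurl  # imported here so StallWatchdog is usable without pycurl
    conn.setopt(pycurl.WRITEFUNCTION, watchdog.on_receive)
    conn.setopt(pycurl.NOPROGRESS, 0)  # progress callbacks are off by default
    conn.setopt(pycurl.PROGRESSFUNCTION, watchdog.on_progress)
```

In the question's script this would replace the `WRITEFUNCTION` line with something like `attach(conn, StallWatchdog(90, on_receive))`. Unlike a bytes-per-second threshold, the countdown resets on every keep-alive, however small.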