I need to poll a web service, in this case twitter's API, and I'm wondering what the conventional wisdom is on this topic. I'm not sure whether this is important, but I've always found feedback useful in the past.

A couple scenarios I've come up with:

  1. The querying process starts every X seconds, e.g. a cron job runs a Python script.

  2. A process loops continually and queries at each iteration, e.g. ... well, here is where I enter unfamiliar territory. Do I just run a Python script that doesn't end?

Thanks for your advice.

ps - regarding the particulars of twitter: I know that it sends emails for following and direct messages, but sometimes one might want the flexibility of parsing @replies. In those cases, I believe polling is as good as it gets.

pps - twitter limits bots to 100 requests per 60 minutes. I don't know if this also limits web scraping or rss feed reading. Anyone know how easy or hard it is to be whitelisted?

Thanks again.

  • I wanted to do some scientific data analysis, so I put a request with a decent description to http://twitter.com/help/request_whitelisting and got whitelisted within a week. So just tell them what you want to do and wait; it won't hurt if you just try. – Peter Hoffmann Jan 10 '09 at 02:55

2 Answers

"Do I just run a python script that doesn't end?"

How is this unfamiliar territory?

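import time

polling_interval = 36.0  # seconds (100 requests in 3600 seconds)
running = True
while running:
    # time.clock() was removed in Python 3.8 and measured CPU time on Unix;
    # time.monotonic() measures elapsed wall-clock time.
    start = time.monotonic()
    poll_twitter()
    anything_else_that_seems_important()
    work_duration = time.monotonic() - start
    # Guard against a negative sleep if the work took longer than the interval.
    time.sleep(max(0.0, polling_interval - work_duration))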

It's just a loop.
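If one failed request should not kill the loop, a slightly more defensive version might look like the sketch below. The `run_poller` name and its parameters are made up for illustration; `poll` stands in for whatever function actually talks to Twitter, and the `sleep` and `clock` arguments exist only so the loop can be exercised without real waiting.

```python
import time


def run_poller(poll, interval=36.0, max_iterations=None,
               sleep=time.sleep, clock=time.monotonic):
    """Call poll() roughly every `interval` seconds and return the failure count.

    `poll` is a hypothetical callable supplied by the caller. With
    max_iterations=None the loop runs until the process is stopped.
    """
    iterations = 0
    failures = 0
    while max_iterations is None or iterations < max_iterations:
        start = clock()
        try:
            poll()
        except Exception as exc:
            # Log and carry on, so one bad response does not end the process.
            failures += 1
            print(f"poll failed: {exc!r}")
        iterations += 1
        # Sleep for whatever is left of the interval, never a negative amount.
        sleep(max(0.0, interval - (clock() - start)))
    return failures
```

Calling `run_poller(poll_twitter)` would then behave like the loop above, except that an exception inside `poll_twitter` is reported instead of ending the script.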

S.Lott
  • And now you need a cron job to make sure that this script is alive. – jfs Jan 10 '09 at 00:57
  • or a server monitoring application - most servers run 'continually' and no-one has issues with them. that said, they're usually more complex than this little example, so just perform the inner loop as a cron job. – gbjbaanb Jan 10 '09 at 01:14
  • Besides, how do you ensure that cron is always running? :) – gbjbaanb Jan 10 '09 at 01:15
  • If cron stops, I think that means the OS has stopped. You want this to be started (and restarted) by inittab, not cron. – S.Lott Jan 10 '09 at 01:16
  • Last time I wrote one of these, it ran 24x7 for about 5 years without failure or interruption. The "risk of failure" is microscopic. – S.Lott Jan 10 '09 at 01:20
  • word. this just seemed hackish, so i wanted to bounce it off others. thanks. –  Jan 10 '09 at 01:31
  • Let's see, Apache works like this. MySQL works like this. All "server" things are just big loops that run until you stop them. – S.Lott Jan 10 '09 at 12:36

You should have a page that acts as a ping or heartbeat page. Then you have another process that "tickles" (hits) that page; you can usually set this up in your web host's control panel, or use cron if you have local access. The script behind the page can keep statistics on how often it has polled in a database or some other data store, so that you poll the service only as often as you really need to, while staying within the provider's limit. You definitely don't want to rely on a Python script that "doesn't end." :)
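As a rough illustration of the "keep statistics in a data store and respect the provider's limit" part, here is a minimal sketch using SQLite. The table name, the function names, and the choice of SQLite are all assumptions made for the example; the 100-per-hour figure is the one quoted in the question.

```python
import sqlite3
import time

LIMIT = 100       # requests allowed per window (figure quoted in the question)
WINDOW = 3600.0   # window length in seconds


def open_store(path=":memory:"):
    """Open (or create) the hypothetical poll log."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS polls (ts REAL NOT NULL)")
    return conn


def try_record_poll(conn, now=None, limit=LIMIT, window=WINDOW):
    """Return True and log the poll if it fits in the limit, else return False."""
    now = time.time() if now is None else now
    # Forget polls that have fallen out of the sliding window.
    conn.execute("DELETE FROM polls WHERE ts <= ?", (now - window,))
    (count,) = conn.execute("SELECT COUNT(*) FROM polls").fetchone()
    if count >= limit:
        conn.commit()
        return False
    conn.execute("INSERT INTO polls (ts) VALUES (?)", (now,))
    conn.commit()
    return True
```

A script triggered by cron (or by a hit on the heartbeat page) would call `try_record_poll(conn)` first and only query the service when it returns `True`. Passing a file path to `open_store` instead of the in-memory default is what makes the count survive between runs.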

BobbyShaftoe
  • Thanks for replying. When you say "page" you mean a web page? I do have a server which is running a website, but I can interact with the backend python and database directly without sending data to a URL. I'm sorry if I misunderstand; I'm trying to see if I'm missing something, eg wrt efficiency. –  Jan 10 '09 at 00:28
  • Oh I see, I think I misunderstood you. A cron job would be OK. The only problem with a continually running script is that you need some way to make sure it is still running, in case it fails unexpectedly. – BobbyShaftoe Jan 10 '09 at 00:37
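One simple way to check that a long-running script is still alive is a heartbeat file: the loop rewrites a file on every iteration, and a separate check (run from cron, say) looks at how old the file is. This is only a sketch under that assumption; the `beat` and `is_alive` names and the file-based approach are illustrative, not something proposed in the thread.

```python
import os
import time


def beat(path):
    """Called by the polling loop on each iteration to refresh the heartbeat."""
    with open(path, "w") as f:
        f.write(str(time.time()))


def is_alive(path, max_age, now=None):
    """Return True if the heartbeat file was modified within `max_age` seconds."""
    try:
        mtime = os.path.getmtime(path)
    except OSError:
        # No heartbeat file at all: the script never started or was cleaned up.
        return False
    now = time.time() if now is None else now
    return now - mtime <= max_age
```

With a 36-second polling interval, a check such as `is_alive(path, max_age=120)` leaves room for a slow request or two before the watchdog decides the script needs restarting.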