20

I need to develop an app that tracks tweets and saves them to a MongoDB database for a research project (as you might gather, I am a beginner, so please bear with me). I found this piece of code that streams tweets to my terminal window:

import sys
import tweepy

consumer_key=""
consumer_secret=""
access_key = ""
access_secret = "" 


auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_key, access_secret)
api = tweepy.API(auth)

class CustomStreamListener(tweepy.StreamListener):
    def on_status(self, status):
        print status.text

    def on_error(self, status_code):
        print >> sys.stderr, 'Encountered error with status code:', status_code
        return True # Don't kill the stream

    def on_timeout(self):
        print >> sys.stderr, 'Timeout...'
        return True # Don't kill the stream

sapi = tweepy.streaming.Stream(auth, CustomStreamListener())
sapi.filter(track=['Gandolfini'])

Is there a way I can modify this piece of code so that, instead of streaming across my screen, the tweets are saved to my MongoDB database?

Thanks

alecxe
user2161725
  • NOTE: original piece of code taken from: http://peter-hoffmann.com/2012/simple-twitter-streaming-api-access-with-python-and-oauth.html – user2161725 Jun 20 '13 at 13:01

2 Answers

18

Here's an example:

import json
import pymongo
import tweepy

consumer_key = ""
consumer_secret = ""
access_key = ""
access_secret = ""

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_key, access_secret)
api = tweepy.API(auth)


class CustomStreamListener(tweepy.StreamListener):
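    def __init__(self, api):
        # Name this class in super() so StreamListener.__init__ actually runs
        # (it stores the api object on self.api).
        super(CustomStreamListener, self).__init__(api)

        # MongoClient() with no arguments connects to localhost:27017;
        # .test selects the "test" database.
        self.db = pymongo.MongoClient().test

    def on_data(self, tweet):
        # `tweet` is the raw JSON string from the stream; decode it and store
        # it in the "tweets" collection. insert() was removed in PyMongo 4;
        # use insert_one() there.
        self.db.tweets.insert(json.loads(tweet))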

    def on_error(self, status_code):
        return True # Don't kill the stream

    def on_timeout(self):
        return True # Don't kill the stream


sapi = tweepy.streaming.Stream(auth, CustomStreamListener(api))
sapi.filter(track=['Gandolfini'])

This writes each tweet to the tweets collection of the test database on the local MongoDB server.

Hope that helps.
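
One thing to watch: on_data receives every message on the stream, not only tweets. Below is a minimal sketch of filtering before the insert. It assumes that status messages carry an in_reply_to_status_id key while notices such as limit or delete do not. The names parse_status, store and FakeCollection are hypothetical, and FakeCollection only stands in for self.db.tweets so the snippet runs without a database:

```python
import json


def parse_status(raw):
    """Return the decoded tweet as a dict, or None for anything else."""
    try:
        message = json.loads(raw)
    except ValueError:
        return None  # empty or malformed payload
    # Assumption: only status messages have this key.
    if isinstance(message, dict) and "in_reply_to_status_id" in message:
        return message
    return None


class FakeCollection(object):
    """Hypothetical in-memory stand-in for a MongoDB collection."""

    def __init__(self):
        self.documents = []

    def insert(self, document):
        self.documents.append(document)


def store(collection, raw):
    """Insert raw into collection if it is a tweet; report whether it was."""
    status = parse_status(raw)
    if status is not None:
        collection.insert(status)
    return status is not None


samples = [
    '{"id": 1, "text": "RIP Gandolfini", "in_reply_to_status_id": null}',
    '{"limit": {"track": 42}}',
    '{"delete": {"status": {"id": 1}}}',
    '',
]

tweets = FakeCollection()
stored = [store(tweets, raw) for raw in samples]
print(stored)
print(len(tweets.documents))
```

In the listener, the body of on_data would then call store(self.db.tweets, tweet) instead of inserting unconditionally.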

alecxe
  • YES, this seems to work. Thank you so much. I guess the next question is: how would I modify the script that you've supplied so that instead of sending tweets to my local mongodb, I would send them to my remote db hosted on MongoLab? Any thoughts? Thanks again! – user2161725 Jun 21 '13 at 10:17
  • Sure, `pymongo.MongoClient` accepts `host`, `port` parameters. See [docs](http://api.mongodb.org/python/current/tutorial.html#making-a-connection-with-mongoclient). – alecxe Jun 21 '13 at 10:35
  • First of all, the code works like a charm. I've had a lot of fun playing around with it, so thank you. If, instead of tracking a word, I wanted to track a location how would I do that? I've tried replacing the last line of code so that it reads sapi.filter(locations=['-74,40,-73,41']) but get an AssertionError. Any idea how I can fix this? Thanks! – user2161725 Jun 24 '13 at 11:10
  • `locations` arguments should be a list of length 4, e.g. `['-74', '40', '-73', '41']`. Works? – alecxe Jun 24 '13 at 11:35
  • Yes. To be specific, it worked with changing the last line to sapi.filter(locations=[-74, 40, -73, 41]) i.e. no scare quotes around the lat/long pair. Thanks again. – user2161725 Jun 24 '13 at 18:16
  • Re: sending tweets to remote db, the only thing I would have to do to modify the code would be to change "self.db = pymongo.MongoClient().test" to "self.db = pymongo.MongoClient(host:port).test", yes? – user2161725 Aug 08 '13 at 09:52
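
The two follow-ups in the comments above (a remote database and a location filter) can be sketched as follows. This is a rough illustration written for Python 3 (on Python 2, quote_plus lives in urllib). The helper names build_mongo_uri and bounding_box are hypothetical, and the user, password, host and database values are placeholders rather than a real MongoLab deployment. The bounding box order assumed here is south-west corner first, longitude before latitude:

```python
from urllib.parse import quote_plus


def build_mongo_uri(user, password, host, port, database):
    """Return a mongodb:// URI with the credentials percent-escaped."""
    return "mongodb://%s:%s@%s:%d/%s" % (
        quote_plus(user), quote_plus(password), host, port, database)


def bounding_box(sw_lon, sw_lat, ne_lon, ne_lat):
    """Return [sw_lon, sw_lat, ne_lon, ne_lat] as floats after range checks."""
    box = [float(sw_lon), float(sw_lat), float(ne_lon), float(ne_lat)]
    if not (-180 <= box[0] <= 180 and -180 <= box[2] <= 180):
        raise ValueError("longitude must be between -180 and 180")
    if not (-90 <= box[1] <= 90 and -90 <= box[3] <= 90):
        raise ValueError("latitude must be between -90 and 90")
    if box[1] > box[3]:
        raise ValueError("south-west corner must lie south of north-east")
    return box


# Placeholder credentials and host, for illustration only.
uri = build_mongo_uri("researcher", "p@ss/word",
                      "ds012345.example.com", 27017, "tweets_db")
print(uri)

# Roughly the New York area, as in the comments above.
nyc = bounding_box(-74, 40, -73, 41)
print(nyc)

# In the listener and the script these would be used roughly as:
#     self.db = pymongo.MongoClient(uri)["tweets_db"]
#     sapi.filter(locations=nyc)
```

Passing the whole URI to MongoClient keeps the credentials, host and port in one string, which is usually the form a hosting provider hands out.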
6

I have developed a simple command line tool that does exactly this.

https://github.com/janezkranjc/twitter-tap

It allows using the streaming API or the search API.

johnny