I'm hoping to track tweets that contain a certain set of words, but not others. For example, if my filter is: "taco" AND ("chicken" OR "beef").

It should return these tweets:

-I am eating a chicken taco.
-I am eating a beef taco.

It should not return these tweets:

-I am eating a taco.
-I am eating a pork taco.

Here is the code I'm currently running:

from tweepy import Stream
from tweepy import OAuthHandler
from tweepy.streaming import StreamListener
import time
import json

# authentication data- get this info from twitter after you create your application
ckey = '...'                # consumer key, AKA API key
csecret = '...'             # consumer secret, AKA API secret
atoken = '...'   # access token
asecret = '...'     # access secret

# define listener class
class listener(StreamListener): 

    def on_data(self, data):
        try:
            print(data)   # write the whole tweet to terminal
            return True
        except BaseException as e:
            print('failed on data, ' + str(e))  # if there is an error, show what it is
            time.sleep(5)  # one error could be that you're rate-limited; this will cause the script to pause for 5 seconds

    def on_error(self, status):
        print(status)

# authenticate yourself
auth = OAuthHandler(ckey, csecret)
auth.set_access_token(atoken, asecret)
twitterStream = Stream(auth, listener())
twitterStream.filter(track=["taco"])  # track what you want to search for!

The last line of the code is the part I'm struggling with; if I use:

twitterStream.filter(track=["taco","chicken","beef"])

it will return all tweets containing any of the three words. Other things I've tried, such as:

 twitterStream.filter(track=(["taco"&&("chicken","beef")])

return a syntax error.

I'm fairly new to both Python and Tweepy. Both this and this seem like similar queries, but they are related to tracking multiple terms simultaneously, rather than tracking a subset of tweets containing a term. I haven't been able to find anything in the tweepy documentation.

I know another option would be tracking all tweets containing "taco" then filtering by "chicken" or "beef" into my database, but I'm worried about running up against the 1% streaming rate limit if I do a general search and then filter it down within Python, so I'd prefer only streaming the terms I want in the first place from Twitter.

Thanks in advance-

Sam

Sam Zipper

1 Answer

Twitter does not allow you to be very precise in how keywords are matched. However, the track parameter documentation states that spaces within a phrase are equivalent to logical ANDs, while the separate (comma-delimited) phrases you specify are OR'd together.

So, to achieve your "taco" AND ("chicken" OR "beef") example, you could try track=["taco chicken", "taco beef"]. This would match tweets containing the words taco and chicken, or taco and beef. However, this isn't a perfect solution, as a tweet containing taco, chicken, and beef would also be matched.
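
As a rough sketch of the idea above, the phrase list can be generated from an AND-of-ORs description rather than typed by hand. The helper names `build_track_phrases` and `matches_locally` are hypothetical (they are not part of tweepy), and the local matcher only approximates the documented rule (space = AND, separate phrases = OR); Twitter's real tokenization may differ.

```python
import re
from itertools import product


def build_track_phrases(groups):
    """Expand AND-of-ORs keyword groups into track phrases.

    Each inner list holds alternatives (OR); the groups themselves are ANDed.
    """
    return [" ".join(combo) for combo in product(*groups)]


def matches_locally(text, phrases):
    """Approximate track matching locally; for sanity checks only."""
    words = set(re.findall(r"\w+", text.lower()))
    return any(
        all(term in words for term in phrase.lower().split())
        for phrase in phrases
    )


# "taco" AND ("chicken" OR "beef")
phrases = build_track_phrases([["taco"], ["chicken", "beef"]])

# With the stream object from the question, this would presumably be used as:
# twitterStream.filter(track=phrases)
```

Checking the four example tweets from the question against `matches_locally` before connecting to the stream is a cheap way to confirm the phrases express the intended filter.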

Aaron Hill
  • Thanks @Aaron- this will work nicely. As an aside, do you know if there is a way to return all words beginning with a sequence of characters? For example, in R if I wanted to return "plant", "planting", and "planted", I would be able to query "plant+". – Sam Zipper Mar 14 '14 at 20:12
  • I don't think so, sorry. Like I said in my answer, the API is pretty coarse-grained in how it lets you filter. – Aaron Hill Dec 16 '14 at 20:10
  • @AaronHill For example, to filter both words `hello` and `bye` (logically OR), we should use `track=['hello,bye']` or use `track=['hello', 'bye']` or maybe no differences? – Soheil Pourbafrani Dec 11 '18 at 15:24
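
On the prefix question raised in the comments: since the API reportedly has no prefix matching, one possible workaround is to list the word forms you know of in track and then check each received tweet locally. The sketch below assumes the raw payload passed to on_data is a JSON object with a "text" field; the name `has_prefix_word` and the "plant" pattern are purely illustrative.

```python
import json
import re

# Matches whole words that start with "plant": plant, planting, planted, ...
PREFIX_RE = re.compile(r"\bplant\w*", re.IGNORECASE)


def has_prefix_word(raw_json, pattern=PREFIX_RE):
    """Return True if the tweet text contains a word matching the pattern."""
    try:
        tweet = json.loads(raw_json)
    except ValueError:
        # Not valid JSON (for example a keep-alive line); treat as no match.
        return False
    if not isinstance(tweet, dict):
        return False
    # Non-tweet messages such as limit notices carry no "text" field.
    return bool(pattern.search(tweet.get("text") or ""))
```

Inside the listener from the question, on_data could call `has_prefix_word(data)` and only store tweets for which it returns True.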