
My ultimate goal is to use the tweepy API search to focus on topics (e.g., docker) and to EXCLUDE retweets. I have looked at other threads that mention excluding retweets, but they were not completely applicable. I have tried to incorporate what I've learned into the code below, but I believe the "if not" piece of code is in the wrong place. Any help is greatly appreciated.

#!/usr/bin/python
import tweepy
import csv #Import csv
import os

# Consumer keys and access tokens, used for OAuth
consumer_key = 'MINE'
consumer_secret = 'MINE'
access_token = 'MINE'
access_token_secret = 'MINE'

# OAuth process, using the keys and tokens
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)


api = tweepy.API(auth)
# Open/Create a file to append data
csvFile = open('docker1.csv', 'a')
#Use csv Writer
csvWriter = csv.writer(csvFile)


ids = set()
for tweet in tweepy.Cursor(api.search, 
                    q="docker", 
                    Since="2016-08-09", 
                    #until="2014-02-15", 
                    lang="en").items(5000000):
    if not tweet['retweeted'] and 'RT @' not in tweet['text']:
        #Write a row to the csv file/ I use encode utf-8
        csvWriter.writerow([tweet.created_at, tweet.text.encode('utf-8'), tweet.favorite_count, tweet.retweet_count, tweet.id, tweet.user.screen_name])
        #print "...%s tweets downloaded so far" % (len(tweet.id))
        ids.add(tweet.id) # add new id
        print ("number of unique ids seen so far: {}".format(len(ids)))
csvFile.close()

Error Message

hansolo
  • Are you getting an error, or are you just looking for code optimization? – harshil9968 Aug 10 '16 at 12:02
  • @harshil9968 I'm getting multiple errors: Incorrect Syntax, and 'Status' object has no attribute '__getitem__'. From researching other posts I know that the `if not tweet['retweeted'] and 'RT @' not in tweet['text']` check is what I want, but I am not sure exactly where to place it in the code to get what I need. – hansolo Aug 10 '16 at 12:41
  • can you post a screenshot of the errors? – harshil9968 Aug 10 '16 at 12:43
  • @harshil9968 just attached an error message to my original post – hansolo Aug 10 '16 at 13:21

2 Answers


Filtering at the API level:

q='your_search -filter:retweets'

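As a minimal sketch, the query string can be assembled in plain Python before it is handed to `tweepy.Cursor`. The helper name `build_query` is hypothetical; the `api.search` call in the comment follows the question's code (tweepy 3.x; tweepy 4.x renamed that method to `api.search_tweets`).

```python
def build_query(topic: str, exclude_retweets: bool = True) -> str:
    """Hypothetical helper: return the q= string for a topic.

    When exclude_retweets is True, the -filter:retweets operator is appended
    so Twitter drops most retweets server-side.
    """
    parts = [topic.strip()]
    if exclude_retweets:
        parts.append("-filter:retweets")
    return " ".join(parts)


q = build_query("docker")
print(q)  # docker -filter:retweets
# With tweepy 3.x this would be used as, e.g.:
#   tweepy.Cursor(api.search, q=q, lang="en").items(100)
```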

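The dumb way is to filter in code.

`tweet` is a `Status` object, not JSON or a dict, so you should not access it like `tweet['retweeted']` and `tweet['text']`; that dict-style access is what produces the `'Status' object has no attribute '__getitem__'` error.

Instead, use this line:

if not tweet.retweeted:

Or, for your use case: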

if (not tweet.retweeted) and ('RT @' not in tweet.text):
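A minimal sketch of the in-code check as a reusable predicate, using `SimpleNamespace` stand-ins because real `Status` objects need a live API call. One caveat worth verifying against the Twitter v1.1 docs: `retweeted` reports whether the authenticating user retweeted the tweet, while native retweets carry a `retweeted_status` field. The `hasattr` check and the name `is_original_tweet` are additions beyond the answer, not part of it.

```python
from types import SimpleNamespace


def is_original_tweet(tweet) -> bool:
    """Return True if a Status-like object does not look like a retweet.

    Uses attribute access (tweet.text) rather than subscripting
    (tweet['text']), because Status objects are not dicts.
    """
    if getattr(tweet, "retweeted", False):
        return False  # retweeted by the authenticating user
    if hasattr(tweet, "retweeted_status"):
        return False  # assumption: native retweets carry retweeted_status
    return "RT @" not in tweet.text  # catches "RT @user:"-style text


# Hypothetical stand-ins for tweepy Status objects.
original = SimpleNamespace(retweeted=False, text="Trying docker on my laptop")
manual_rt = SimpleNamespace(retweeted=False, text="RT @someone: docker tips")
native_rt = SimpleNamespace(retweeted=False, text="docker tips",
                            retweeted_status=SimpleNamespace(text="docker tips"))

for t in (original, manual_rt, native_rt):
    print(is_original_tweet(t))  # True, then False, then False

try:
    original["text"]  # the dict-style access from the question
except TypeError as exc:
    print("subscripting fails:", exc)
```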
harshil9968
  • Very useful, thank you again. A quick question to complement this one: if I wanted to add more conditions to the if statement, would that be possible? For example, if I only want to bring in tweets where tweet.favorite_count or tweet.retweet_count is > 0, could I do something like: `if (not tweet.retweeted) and ('RT @' not in tweet.text) and (tweet.favorite_count > 0):` – hansolo Aug 10 '16 at 14:04
  • Yes, it would be like that. If it helped, please accept the answer and upvote. – harshil9968 Aug 10 '16 at 14:07
  • Is there a way to filter blocked users? – Daniel Zhang Jul 16 '21 at 16:03
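Following the comment thread above, extra conditions simply chain onto the retweet check with `and`; grouping the two counts with `or` covers the "favorite_count or retweet_count > 0" case. A minimal sketch, where `keep_tweet` and `min_engagement` are hypothetical names and the sample tweets are made-up stand-ins:

```python
from types import SimpleNamespace


def keep_tweet(tweet, min_engagement: int = 1) -> bool:
    """Keep non-retweets with at least min_engagement favorites or retweets.

    With the default of 1 this matches the comment's "> 0" condition.
    """
    not_retweet = (not tweet.retweeted) and ("RT @" not in tweet.text)
    engaged = (tweet.favorite_count >= min_engagement
               or tweet.retweet_count >= min_engagement)
    return not_retweet and engaged


# Hypothetical sample data standing in for tweepy Status objects.
quiet = SimpleNamespace(retweeted=False, text="docker question",
                        favorite_count=0, retweet_count=0)
liked = SimpleNamespace(retweeted=False, text="docker tips",
                        favorite_count=3, retweet_count=0)
rt = SimpleNamespace(retweeted=False, text="RT @someone: docker tips",
                     favorite_count=5, retweet_count=9)

kept = [t for t in (quiet, liked, rt) if keep_tweet(t)]
print(len(kept))  # 1
```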

In addition to the accepted answer, I would suggest changing your query from `q="docker"` to `q="docker -filter:retweets"`.

This will prevent most retweets from even appearing in the results.
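Putting both answers together, here is a minimal sketch of the question's loop rewritten as a function that accepts any iterable of `Status`-like objects, so it can be exercised offline. The name `write_originals` is hypothetical, and the in-code check is kept only as a safety net for retweets the query operator lets through. Under Python 3 the question's `.encode('utf-8')` is dropped, since it would write `b'...'` literals into the CSV; the file's encoding should be set when opening it instead.

```python
import csv
import io
from types import SimpleNamespace


def write_originals(tweets, csv_writer):
    """Write non-retweets as CSV rows and return the set of ids written.

    `tweets` could be, with tweepy 3.x,
    tweepy.Cursor(api.search, q="docker -filter:retweets", lang="en").items(500)
    """
    ids = set()
    for tweet in tweets:
        if tweet.retweeted or "RT @" in tweet.text:
            continue  # skip any retweet the query filter missed
        csv_writer.writerow([tweet.created_at, tweet.text, tweet.favorite_count,
                             tweet.retweet_count, tweet.id, tweet.user.screen_name])
        ids.add(tweet.id)
    return ids


# Offline demo with hypothetical stand-in tweets and an in-memory file.
buf = io.StringIO()
sample = [
    SimpleNamespace(id=1, created_at="2016-08-10", text="docker tips",
                    retweeted=False, favorite_count=2, retweet_count=0,
                    user=SimpleNamespace(screen_name="alice")),
    SimpleNamespace(id=2, created_at="2016-08-10", text="RT @alice: docker tips",
                    retweeted=False, favorite_count=0, retweet_count=0,
                    user=SimpleNamespace(screen_name="bob")),
]
seen = write_originals(sample, csv.writer(buf))
print("number of unique ids seen so far: {}".format(len(seen)))  # ... 1
```

Against the real API, the same function would presumably be fed the cursor's items inside `with open('docker1.csv', 'a', encoding='utf-8', newline='') as csv_file:`, passing `csv.writer(csv_file)`; that wiring is untested here.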

Efferalgan
  • This worked for me. You can do the same for any other standard operators per the Twitter docs: https://developer.twitter.com/en/docs/tweets/rules-and-filtering/overview/standard-operators – carlos_cantana Jan 12 '18 at 22:19
  • +1 to this answer. It is better to filter at the API level than in code: less data is fetched from Twitter, and you are less likely to hit the API rate limits. – Miquel Canal Feb 18 '21 at 15:59