I am trying to run a simple script that will stream live Tweets. Several attempts to filter out retweets have been unsuccessful. I still get manual retweets (with the text "RT @") in my stream. I've tried other methods including link and link.

As I am learning, my code is very similar to the following: link

What can I do to ignore retweets?

Here is a snippet of my code:

import tweepy
from textblob import TextBlob

class StreamListener(tweepy.StreamListener):

    def on_status(self, status):
        if (status.retweeted) and ('RT @' not in status.text):
            return

        description = status.user.description
        loc = status.user.location
        text = status.text
        coords = status.coordinates
        geo = status.geo
        name = status.user.screen_name
        user_created = status.user.created_at
        followers = status.user.followers_count
        id_str = status.id_str
        created = status.created_at
        retweets = status.retweet_count
        bg_color = status.user.profile_background_color

        # Initialize the TextBlob class on the text of each tweet
        # to get a sentiment score for each tweet
        blob = TextBlob(text)
        sent = blob.sentiment
Kevin
  • what does the `status` object look like? – ninesalt Apr 26 '17 at 21:09
  • your logic seems a little confused - `(status.retweeted) and ('RT @' not in status.text)` would just return "official" retweets. Maybe you should be using `(status.retweeted) or ('RT @' in status.text)` to exclude both "official" and "manual" retweets – asongtoruin Apr 27 '17 at 08:29

1 Answer

What you could do is create another function to call inside of the on_status in your StreamListener. Here is something that worked for me:

import tweepy  # StreamListener is available in tweepy versions before 4.0

def analyze_status(text):
    # Treat any text that starts with "RT @" as a retweet
    if text.startswith('RT @'):
        print("This status was retweeted!")
    else:
        print("This status was not retweeted!")
    print(text)

class MyStreamListener(tweepy.StreamListener):
    def on_status(self, status):
        analyze_status(status.text)

    def on_error(self, status_code):
        print(status_code)

# twitter_api is assumed to be an authenticated tweepy.API instance created elsewhere
myStreamListener = MyStreamListener()
myStream = tweepy.Stream(auth=twitter_api.auth, listener=myStreamListener)
myStream.filter(track=['Trump'])

That yields the following:

This status was not retweeted!
@baseballcrank @seanmdav But they won't, cause Trump's name is on it. I can already hear their stupidity, "I hate D… 
This status was retweeted!
RT @OvenThelllegals: I'm about to end the Trump administration with a single tweet
This status was retweeted!
RT @kylegriffin1: FLASHBACK: April 2016

SAVANNAH GUTHRIE: "Do you believe in raising taxes on the wealthy?"

TRUMP: "I do. I do. Inc… 

This is not the most elegant solution, but I do believe it addresses the issue that you were facing.

Brian
  • would this not assume "official" retweets (which I believe don't necessarily begin with "RT @") were not retweeted? – asongtoruin Apr 27 '17 at 08:30
  • I have let a `StreamListener()` run for a long time with a popular topic only printing when `status.retweeted` and it never prints. Maybe the .retweeted attribute is not working as intended? – Brian Apr 27 '17 at 11:40
  • hm, this seems unusual. You could try `hasattr(status, 'retweeted_status')`, which seems to be used fairly consistently for both traditional "retweets" and "quotes", assuming it's been done through the twitter interface. – asongtoruin Apr 27 '17 at 15:32
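
The two suggestions from the comments above can be combined into one check. This is a minimal sketch, not a confirmed fix: it assumes official retweets expose a `retweeted_status` attribute (as the last comment suggests) and that manual retweets start with "RT @". The name `is_retweet` is illustrative and not part of tweepy, and the `SimpleNamespace` objects are hypothetical stand-ins for status objects so the snippet runs without a live stream.

```python
from types import SimpleNamespace


def is_retweet(status):
    """Return True for official retweets and for manual "RT @" retweets."""
    # Assumption: official retweets carry the original tweet in `retweeted_status`.
    official = hasattr(status, "retweeted_status")
    # Assumption: manual retweets are ordinary tweets whose text starts with "RT @".
    manual = status.text.startswith("RT @")
    return official or manual


# Hypothetical stand-ins for status objects, used only for illustration.
original = SimpleNamespace(text="Just a regular tweet")
manual_rt = SimpleNamespace(text="RT @someone: copied by hand")
official_rt = SimpleNamespace(
    text="RT @someone: via the button",
    retweeted_status=SimpleNamespace(text="via the button"),
)

for s in (original, manual_rt, official_rt):
    print(is_retweet(s), s.text)
```

Inside `on_status`, the check would go first, for example `if is_retweet(status): return`, so that the rest of the method only runs for tweets that are not retweets.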