
I'm trying to pull tweets from a user's timeline in real time and then run some analysis on them. From reading the docs, it looks like I need tweepy.Stream for this use case. I've done the following:

    stream.filter(follow=['25073877'])  # follow expects a list of user-id strings

But Twitter's documentation for the filter endpoint says that, for each followed user, the stream will contain:

  • Tweets created by the user.
  • Tweets which are retweeted by the user.
  • Replies to any Tweet created by the user.
  • Retweets of any Tweet created by the user.
  • Manual replies, created without pressing a reply button (e.g. “@twitterapi I agree”).

It seems that this will return a huge volume of tweets that aren't relevant to my use case. Do I have to use this approach and then filter by screen name to keep only the tweets written by the user I follow? That doesn't seem right at all.
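The five cases listed above can usually be told apart from fields of the v1.1 tweet JSON: the author's user.id_str, the retweeted_status object (present only on retweets) and in_reply_to_user_id_str. Here is a minimal sketch, assuming each tweet arrives as a decoded JSON dict; the function name and its labels are hypothetical, not part of tweepy or the Twitter API:

```python
from typing import Any, Dict

Tweet = Dict[str, Any]  # a decoded v1.1 tweet JSON object


def classify(tweet: Tweet, followed_id: str) -> str:
    """Label a tweet from a follow= stream relative to the followed user (hypothetical helper)."""
    author_id = tweet["user"]["id_str"]
    original = tweet.get("retweeted_status")  # only present when the tweet is a retweet
    if author_id == followed_id:
        # Written by the followed user: either an original post or a retweet of someone else
        return "retweet_by_user" if original else "created_by_user"
    if original and original["user"]["id_str"] == followed_id:
        # Someone else retweeting one of the followed user's tweets
        return "retweet_of_user"
    if tweet.get("in_reply_to_user_id_str") == followed_id:
        # Someone else replying to the followed user
        return "reply_to_user"
    return "other"  # e.g. a manual @-mention reply
```

Only "created_by_user" is what you'd want here; replies from other accounts don't start with "RT @", so a text check for that prefix can't remove them.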

The alternative seems to be api.user_timeline, but that isn't a streaming API. Should I therefore call it every second? I can't find good examples of how best to handle this use case.

hebrodoth

1 Answer


Yes, you'll need to filter on your side, either by screen_name or by checking whether each tweet is a retweet.
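A minimal sketch of that client-side filter, assuming each tweet is available as a decoded v1.1 JSON dict. It compares the numeric user id (user.id_str) rather than the screen name, since screen names can change while ids don't. The function name and the include_replies switch are hypothetical:

```python
from typing import Any, Dict


def is_own_post(tweet: Dict[str, Any], user_id: str, include_replies: bool = True) -> bool:
    """True if the tweet was written by user_id and is not a retweet (hypothetical helper)."""
    if tweet["user"]["id_str"] != user_id:
        return False  # other accounts: replies to the user, retweets of the user, mentions
    if "retweeted_status" in tweet:
        return False  # the user retweeting someone else
    if not include_replies and tweet.get("in_reply_to_status_id") is not None:
        return False  # the user replying to another tweet
    return True
```

In a tweepy 3.x StreamListener, on_status receives a Status object whose raw dict is exposed as status._json, so a check like this could run there and discard everything else before any heavier analysis.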

I wouldn't recommend the second approach (polling api.user_timeline). You'd receive even more data, since each request returns tweets you already got from previous requests and you'd have to filter those out, and you may hit the API rate limits if you don't time the requests properly.
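If you do poll, the two problems above are usually handled by remembering the highest tweet id seen (the since_id pattern) and by spacing requests to fit the rate-limit window. A minimal sketch, assuming tweets are decoded JSON dicts with an integer id; the helper names are hypothetical, and the requests-per-window figure is an assumption you should look up in Twitter's rate-limit docs (at 900 requests per 15-minute window, for instance, the interval works out to one second):

```python
from typing import Dict, Iterable, List, Optional, Tuple

WINDOW_SECONDS = 15 * 60  # v1.1 rate limits are counted per 15-minute window


def min_poll_interval(requests_per_window: int, window_seconds: int = WINDOW_SECONDS) -> float:
    """Shortest delay between requests that stays inside the rate limit."""
    return window_seconds / requests_per_window


def keep_unseen(batch: Iterable[Dict], since_id: Optional[int]) -> Tuple[List[Dict], Optional[int]]:
    """Drop tweets with id <= since_id; return the rest oldest-first plus the new since_id."""
    fresh = sorted(
        (t for t in batch if since_id is None or t["id"] > since_id),
        key=lambda t: t["id"],
    )
    if fresh:
        since_id = fresh[-1]["id"]  # highest id seen so far
    return fresh, since_id
```

tweepy's api.user_timeline accepts a since_id argument, so passing the stored value lets Twitter skip old tweets server-side; keep_unseen then only acts as a safety net against overlap.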

This is the signature of tweepy's Stream.filter (tweepy 3.x):

    def filter(self, follow=None, track=None, is_async=False, locations=None,
               stall_warnings=False, languages=None, encoding='utf8', filter_level=None)

It maps to Twitter's POST statuses/filter streaming request.

The endpoint's documentation explains each parameter.
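One practical detail of this signature: as far as I can tell from the tweepy 3.x source, filter builds the follow (and track) body parameter by comma-joining the argument and encoding it, so it expects a list of id strings. A bare string such as follow='25073877' gets its characters joined instead. The hypothetical helper below only mimics that joining step to show the difference; it is not tweepy's own code:

```python
from typing import Iterable


def build_follow_param(follow: Iterable[str], encoding: str = "utf8") -> bytes:
    """Mimic the comma-join-then-encode step applied to the follow argument (hypothetical)."""
    return ",".join(follow).encode(encoding)


# A list gives one user id; a plain string is iterated character by character.
print(build_follow_param(["25073877"]))  # b'25073877'
print(build_follow_param("25073877"))    # b'2,5,0,7,3,8,7,7'
```

So the call should be stream.filter(follow=['25073877']); with a plain string you may end up following several unrelated single-digit user ids.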

  • Thanks for confirming. It's a shame you can't apply these filters within the API, as it seems a very common use case to me. If I follow @realDonaldTrump, it seems I'd have to filter potentially tens of thousands of tweets just to get one of his, given the ratio of retweets to each of his tweets. I did a test filtering out 'RT @', but that only removed about 50% of the tweets this approach returned. – hebrodoth Oct 23 '19 at 08:42
  • My goal is to monitor the individual posts of influencers in real time, but I don't think that's feasible given the amount of noise the API returns in this case. Any thoughts on scaling this out effectively? – hebrodoth Oct 23 '19 at 08:44
  • Are you also getting retweets of the user's tweets? I would have expected retweets by him, replies, etc. In any case, I think you should be able to process them even if the data volume is large. – epsilonmajorquezero Oct 29 '19 at 08:58