
I have a service that successfully monitors Twitter statuses via the Streaming API (the filter endpoint with the track parameter). Everything works well, and I receive a lot of tweets containing the predefined keywords. The only problem is that I do not get my own tweets containing these keywords. Is this normal? Should I use a separate account for the application if it must collect ALL relevant data on Twitter, including my own messages?

Thanks in advance.

UPDATE

I've found a partial answer here, and I'm posting part of the Twitter staff explanation below, for reference:

With [node:10389] in particular, you are filtering from the firehose, with a maximum resulting volume of 1% of the total Tweets at that moment... In other words, if the keywords you are tracking account for less than 1% of the firehose, you will receive all the matching Tweets, otherwise you will be capped. To give you an idea, there are more than 500 million Tweets posted every single day on Twitter, so 1% still represents a very large number.

So the tweets we receive via the Streaming API are just an arbitrary subset of all tweets matching the given predicates. That said, I doubt that my keywords account for 1% of the whole Twitter data flow, but I can't check this.
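One rough way to sanity-check whether a stream is anywhere near the 1% figure is to measure your own receive rate and compare it with the overall volume quoted above. This cannot tell you how many *matching* tweets were dropped; it only estimates your stream's share of total traffic. The sketch below is a hypothetical helper (the class and method names are made up for illustration) and assumes the "more than 500 million Tweets per day" figure as a flat average, which real traffic is not.

```python
from __future__ import annotations

from collections import deque

# Assumption: the ~500 million Tweets/day figure from the staff quote above,
# treated as a constant average rate.
ASSUMED_FIREHOSE_PER_DAY = 500_000_000
SECONDS_PER_DAY = 86_400


class StreamShareEstimator:
    """Hypothetical helper: estimate what share of the whole firehose a stream carries.

    Keeps arrival timestamps (in seconds) of tweets seen in the last `window`
    seconds and compares that rate with an assumed global rate.
    """

    def __init__(self, window: float = 60.0,
                 firehose_per_day: int = ASSUMED_FIREHOSE_PER_DAY) -> None:
        self.window = window
        self.global_rate = firehose_per_day / SECONDS_PER_DAY  # tweets/second
        self._arrivals = deque()  # arrival times, oldest first

    def record(self, timestamp: float) -> None:
        """Register one received tweet and drop arrivals older than the window."""
        self._arrivals.append(timestamp)
        while self._arrivals and self._arrivals[0] <= timestamp - self.window:
            self._arrivals.popleft()

    def rate(self) -> float:
        """Tweets per second over the window (underestimates right after connecting)."""
        return len(self._arrivals) / self.window

    def share_percent(self) -> float:
        """Received rate as a percentage of the assumed firehose rate."""
        return 100.0 * self.rate() / self.global_rate
```

For example, 579 tweets received within a 10-second window is 57.9 tweets/s, which is about 1.0% of an assumed 500 million tweets/day (roughly 5,787 tweets/s overall). A stream far below that is unlikely to be hitting the cap.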

OK, nothing can be done about that, but then the next question is: how can I determine what percentage of the firehose I'm getting at any given moment? If I knew this, I could change the predicates to narrow my query and try to get much more than the default 1%, with improved relevance and data flow coverage.

Stan

1 Answer


Twitter’s Streaming API pushes data as tweets happen, in near real time, unlike Twitter’s Search API, where you poll for tweets that have already happened. With the Streaming API, users register a set of criteria (keywords, usernames, locations, named places, etc.), and as tweets match the criteria, they are pushed directly to the user. Think of this as an agreement between the end user and Twitter: whenever Twitter receives tweets that match keywords relating to “hockey”, it will deliver them directly to you as they happen.
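To make "tweets that match the criteria" concrete for the track parameter mentioned in the question, here is a minimal local approximation, assuming the documented v1.1 track semantics: commas separate phrases (logical OR), spaces separate terms within a phrase (logical AND), and matching ignores case and term order. The function names are hypothetical, and the tokenizer is a deliberate simplification that ignores Twitter's real handling of punctuation, URLs, and hashtags; this is not Twitter's implementation.

```python
from __future__ import annotations

import re


def parse_track(track: str) -> list[set[str]]:
    """Split a track string into phrases; each phrase is a set of lowercase terms."""
    phrases = []
    for phrase in track.split(","):            # commas: alternatives (OR)
        terms = {t.lower() for t in phrase.split()}  # spaces: all required (AND)
        if terms:
            phrases.append(terms)
    return phrases


def tokenize(text: str) -> set[str]:
    """Lowercase word tokens (simplified; '#hockey' becomes 'hockey')."""
    return set(re.findall(r"\w+", text.lower()))


def matches(text: str, phrases: list[set[str]]) -> bool:
    """True if all terms of at least one phrase occur in the text, in any order."""
    tokens = tokenize(text)
    return any(terms <= tokens for terms in phrases)
```

With `track="hockey, ice skating"`, "Watching HOCKEY tonight" and "Skating on the ice" both match, while "skating lessons" does not.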

The major drawback of the Streaming API is that it provides only a sample of the tweets that are occurring. The actual percentage of total tweets users receive varies heavily based on the criteria they request and the current traffic. Studies have estimated that users of the Streaming API can expect to receive anywhere from 1% to over 40% of the tweets in near real time. The reason you do not receive all of the tweets from the Streaming API is simply that Twitter doesn’t have the infrastructure to support it, and doesn’t want to; hence the separate Twitter Firehose.

how can I determine which part of the firehose I'm getting at every moment in percents?

You simply can't.

If I'd know this I could change predicates to narrow my query and try to get much more than 1% of default, with improved relevance and data flow coverage.

On the other hand, search for more! Track all the keywords related to your query, and then, once you receive the tweets, simply classify or discard them.
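The "track broadly, then classify or discard" approach can be sketched as a small post-filter on the client side. This is a hypothetical example: it assumes each received tweet is a dict with a `"text"` key (as in the Streaming API's JSON objects), and it uses a naive relevance rule (keep a tweet if it contains at least `min_hits` distinct relevance terms) as a stand-in for whatever classifier you actually use.

```python
from __future__ import annotations

import re
from typing import Iterable


def score(text: str, relevant_terms: set[str]) -> int:
    """Count distinct relevance terms that occur as whole words in the text."""
    words = set(re.findall(r"\w+", text.lower()))
    terms = {t.lower() for t in relevant_terms}
    return len(words & terms)


def triage(tweets: Iterable[dict], relevant_terms: set[str],
           min_hits: int = 2) -> tuple[list[dict], list[dict]]:
    """Split broadly tracked tweets into (kept, discarded) by a score threshold."""
    kept, discarded = [], []
    for tweet in tweets:
        if score(tweet.get("text", ""), relevant_terms) >= min_hits:
            kept.append(tweet)
        else:
            discarded.append(tweet)
    return kept, discarded
```

The design choice here is to keep the tracked keyword list wide (so matching tweets are less likely to be missed by an overly narrow predicate) and move precision into code you control, where the threshold can be tuned without reconnecting the stream.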

Sam