I have a set of the keywords (over 600) and I want to use streaming api to track tweets with them. Twitter api limits the number of keywords, which you are allowed to track, to 200. So I decided to have several threads that will do it, using several OAuth tokens for this. This is how I do it:
String[] dbKeywords = KeywordImpl.listKeywords();
List<String[]> keywords = ditributeKeywords(dbKeywords);
for (String[] subList : keywords) {
StreamCrawler streamCrawler = new StreamCrawler();
streamCrawler.setKeywords(subList);
Thread crawlerThread = new Thread(streamCrawler);
crawlerThread.start();
}
This is how words are distributed among threads. Each thread receives no more than 200 words. This is the implementation of the StreamCrawler:
public class StreamCrawler extends Crawler implements Runnable {
...
private String[] keywords;
public void setKeywords(String[] keywords) {
this.keywords = keywords;
}
@Override
public void run() {
TwitterStream twitterStream = getTwitterInstance();
StatusListener listener = new StatusListener() {
ArrayDeque<Tweet> tweetbuffer = new ArrayDeque<Tweet>();
ArrayDeque<TwitterUser> userbuffer = new ArrayDeque<TwitterUser>();
@Override
public void onException(Exception arg0) {
System.out.println(arg0);
}
@Override
public void onDeletionNotice(StatusDeletionNotice arg0) {
System.out.println(arg0);
}
@Override
public void onScrubGeo(long arg0, long arg1) {
System.out.println(arg1);
}
@Override
public void onStatus(Status status) {
...Doing something with message
}
@Override
public void onTrackLimitationNotice(int arg0) {
System.out.println(arg0);
try {
Thread.sleep(5 * 60 * 1000);
System.out.println("Will sleep for 5 minutes!");
} catch (InterruptedException e) {
e.printStackTrace();
}
}
@Override
public void onStallWarning(StallWarning arg0) {
System.out.println(arg0);
}
};
FilterQuery fq = new FilterQuery();
String keywords[] = getKeywords();
System.out.println(keywords.length);
System.out.println("Listening for " + Arrays.toString(keywords));
fq.track(keywords);
twitterStream.addListener(listener);
twitterStream.filter(fq);
}
private long getCurrentThreadId() {
return Thread.currentThread().getId();
}
private TwitterStream getTwitterInstance() {
TwitterConfiguration configuration = null;
TwitterStream twitterStream = null;
while (configuration == null) {
configuration = TokenFactory.getAvailableToken();
if (configuration != null) {
System.out
.println("Token was obtained " + getCurrentThreadId());
System.out.println(configuration.getTwitterAccount());
setToken(configuration);
ConfigurationBuilder cb = new ConfigurationBuilder();
cb.setDebugEnabled(true);
cb.setOAuthConsumerKey(configuration.getConsumerKey());
cb.setOAuthConsumerSecret(configuration.getConsumerSecret());
cb.setOAuthAccessToken(configuration.getAccessToken());
cb.setOAuthAccessTokenSecret(configuration.getAccessSecret());
twitterStream = new TwitterStreamFactory(cb.build())
.getInstance();
} else {
// If there is no available configuration, wait for 2 minutes
// and try again
try {
System.out
.println("There were no available tokens, sleeping for 2 minutes.");
Thread.sleep(2 * 60 * 1000);
} catch (InterruptedException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
}
return twitterStream;
}
}
So my problem is that when I start for example 2 threads I get notification that both of them are opening stream and getting it. But actually only first one is really getting stream and respectively calling OnStatus method. The array, which is used in the second thread, is not empty; Twitterconfiguration is also valid and unique. So I don't understand what might be the reason for such behavior. Why does the only first thread return tweets?