Our group is working on a sentiment analysis research project. We are trying to use the Twitter API to collect tweets. Out aimed dataset involves a lot of query terms and filters. However, since each of us has a developer account, we were wondering if we can pool API access tokens to accelerate the data collection. For example, we will make an app that allows us to define a configuration file that contains a list of our access tokens that the app will try to use to search for a tweet. This app will be run on our local computer. Since the app uses our individual access tokens, we believe that we are not actually not bypassing or changing any Twitter limit as the record is kept for each access token. Are there any problems legal/technical that may arise from this methodology? Thank you! =D
Here is a pseudocode for what we are trying to do:
1. define a list of search terms such as 'apple', 'banana'
and 'oranges' (we have 100 of these search terms, we are okay
with the 100 limit per tweet)
2. define a list of frequent emotional adjectives such as 'happy', 'sad', 'crazy', etc. (we have have 100 of these) using TF-IDF
3. get the product of the search terms and emotional adjectives,
in total we have 10,000 query terms and we have computed
through the rate limit rules that we would need at least
55 runs of 15-minute sessions with 180 tweets per 15-minute.
55 * 15 = 825 minutes or ~14 hours to collect this amount of tweets.
4. we were thinking of improving the data collection by
pooling access tokens so that we can trim down the time
of collection from 14 hours to ~4 hours, e.g. by dividing the query items into subsets and letting a specific access token work on a subset
We were pushing for this since we just think it's efficient if it's possible and permitted since why not and it might help future researches as well?
The question is, are we actually breaking any Twitter rules or policies by doing this? By sharing one access token per each of us three and creating an app that we name as clones of the research project, we believe that in turn we are also losing something which is the headroom for one more app that we fully control.
I can't find specific rule in Twitter so far about this. Our concern is that we will publish a paper and will publish the app we will program and use for documentation and the app we plan to build. Disclaimer: Only the app's source code will be published and not the dataset because of Twitter's explicit rules about datasets.