I have a list of tweet ids for which I would like to download their text content. Is there any easy solution to do this, preferably through a Python script? I had a look at other libraries like Tweepy and things don't appear to work so simple, and downloading them manually is out of the question since my list is very long.
-
What do you mean by simple ? Sorry but there is no such tool which will take voice input and download tweets for you , You have to code, BTW tweepy is one of the most easy and well documented twitter API library out there. – ZdaR Feb 07 '15 at 16:44
4 Answers
You can access specific tweets by their id with the statuses/show/:id
API route. Most Python Twitter libraries follow the exact same patterns, or offer 'friendly' names for the methods.
For example, Twython offers several show_*
methods, including Twython.show_status()
that lets you load specific tweets:
CONSUMER_KEY = "<consumer key>"
CONSUMER_SECRET = "<consumer secret>"
OAUTH_TOKEN = "<application key>"
OAUTH_TOKEN_SECRET = "<application secret"
twitter = Twython(
CONSUMER_KEY, CONSUMER_SECRET,
OAUTH_TOKEN, OAUTH_TOKEN_SECRET)
tweet = twitter.show_status(id=id_of_tweet)
print(tweet['text'])
and the returned dictionary follows the Tweet object definition given by the API.
The tweepy
library uses tweepy.get_status()
:
auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(OAUTH_TOKEN, OAUTH_TOKEN_SECRET)
api = tweepy.API(auth)
tweet = api.get_status(id_of_tweet)
print(tweet.text)
where it returns a slightly richer object, but the attributes on it again reflect the published API.

- 1,048,767
- 296
- 4,058
- 3,343
Sharing my work that was vastly accelerated by the previous answers (thank you). This Python 2.7 script fetches the text for tweet IDs stored in a file. Adjust get_tweet_id() for your input data format; original configured for data at https://github.com/mdredze/twitter_sandy
Update April 2018: responding late to @someone bug report (thank you). This script no longer discards every 100th tweet ID (that was my bug). Please note that if a tweet is unavailable for whatever reason, the bulk fetch silently skips it. The script now warns if the response size is different from the request size.
'''
Gets text content for tweet IDs
'''
# standard
from __future__ import print_function
import getopt
import logging
import os
import sys
# import traceback
# third-party: `pip install tweepy`
import tweepy
# global logger level is configured in main()
Logger = None
# Generate your own at https://apps.twitter.com/app
CONSUMER_KEY = 'Consumer Key (API key)'
CONSUMER_SECRET = 'Consumer Secret (API Secret)'
OAUTH_TOKEN = 'Access Token'
OAUTH_TOKEN_SECRET = 'Access Token Secret'
# batch size depends on Twitter limit, 100 at this time
batch_size=100
def get_tweet_id(line):
'''
Extracts and returns tweet ID from a line in the input.
'''
(tagid,_timestamp,_sandyflag) = line.split('\t')
(_tag, _search, tweet_id) = tagid.split(':')
return tweet_id
def get_tweets_single(twapi, idfilepath):
'''
Fetches content for tweet IDs in a file one at a time,
which means a ton of HTTPS requests, so NOT recommended.
`twapi`: Initialized, authorized API object from Tweepy
`idfilepath`: Path to file containing IDs
'''
# process IDs from the file
with open(idfilepath, 'rb') as idfile:
for line in idfile:
tweet_id = get_tweet_id(line)
Logger.debug('get_tweets_single: fetching tweet for ID %s', tweet_id)
try:
tweet = twapi.get_status(tweet_id)
print('%s,%s' % (tweet_id, tweet.text.encode('UTF-8')))
except tweepy.TweepError as te:
Logger.warn('get_tweets_single: failed to get tweet ID %s: %s', tweet_id, te.message)
# traceback.print_exc(file=sys.stderr)
# for
# with
def get_tweet_list(twapi, idlist):
'''
Invokes bulk lookup method.
Raises an exception if rate limit is exceeded.
'''
# fetch as little metadata as possible
tweets = twapi.statuses_lookup(id_=idlist, include_entities=False, trim_user=True)
if len(idlist) != len(tweets):
Logger.warn('get_tweet_list: unexpected response size %d, expected %d', len(tweets), len(idlist))
for tweet in tweets:
print('%s,%s' % (tweet.id, tweet.text.encode('UTF-8')))
def get_tweets_bulk(twapi, idfilepath):
'''
Fetches content for tweet IDs in a file using bulk request method,
which vastly reduces number of HTTPS requests compared to above;
however, it does not warn about IDs that yield no tweet.
`twapi`: Initialized, authorized API object from Tweepy
`idfilepath`: Path to file containing IDs
'''
# process IDs from the file
tweet_ids = list()
with open(idfilepath, 'rb') as idfile:
for line in idfile:
tweet_id = get_tweet_id(line)
Logger.debug('Enqueing tweet ID %s', tweet_id)
tweet_ids.append(tweet_id)
# API limits batch size
if len(tweet_ids) == batch_size:
Logger.debug('get_tweets_bulk: fetching batch of size %d', batch_size)
get_tweet_list(twapi, tweet_ids)
tweet_ids = list()
# process remainder
if len(tweet_ids) > 0:
Logger.debug('get_tweets_bulk: fetching last batch of size %d', len(tweet_ids))
get_tweet_list(twapi, tweet_ids)
def usage():
print('Usage: get_tweets_by_id.py [options] file')
print(' -s (single) makes one HTTPS request per tweet ID')
print(' -v (verbose) enables detailed logging')
sys.exit()
def main(args):
logging.basicConfig(level=logging.WARN)
global Logger
Logger = logging.getLogger('get_tweets_by_id')
bulk = True
try:
opts, args = getopt.getopt(args, 'sv')
except getopt.GetoptError:
usage()
for opt, _optarg in opts:
if opt in ('-s'):
bulk = False
elif opt in ('-v'):
Logger.setLevel(logging.DEBUG)
Logger.debug("main: verbose mode on")
else:
usage()
if len(args) != 1:
usage()
idfile = args[0]
if not os.path.isfile(idfile):
print('Not found or not a file: %s' % idfile, file=sys.stderr)
usage()
# connect to twitter
auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(OAUTH_TOKEN, OAUTH_TOKEN_SECRET)
api = tweepy.API(auth)
# hydrate tweet IDs
if bulk:
get_tweets_bulk(api, idfile)
else:
get_tweets_single(api, idfile)
if __name__ == '__main__':
main(sys.argv[1:])

- 3,571
- 3
- 34
- 43
-
I have tried your code, the function "statuses_lookup" is notreturning any value, am not even getting any exception. Can you suggest me on what could be wrong. – charvi Mar 28 '16 at 06:42
-
2It still works for me, tested tonight. Did you generate the consumer key, consumer secret, oath token and oath token secret strings at apps.twitter.com and put them in the script? Did you install tweepy? Did you use a valid tweet id (e.g., 260244087901413376)? Where did you post the code? – chrisinmtown Mar 30 '16 at 00:05
-
@chrisinmtown Thanks for your answer. I also have a CSV file including thousands of tweet ids and I'd like to get the tweet content using twitter API). I read your answer, but I have a few questions: 1) you defined several functions (e.g., get_tweets_single and get_tweets_bulk). Are they different solutions to get the tweets? 2) regarding the `get_tweets_single`, you mentioned it requires tons of HTTP requests and is NOT recommended. Since I have permission to only 50 requests/month, what do you recommend? Thanks! :) – mOna Jun 05 '20 at 02:00
You can access tweets in bulk (up to 100 at a time) with the status/lookup endpoint: https://dev.twitter.com/rest/reference/get/statuses/lookup

- 329
- 4
- 11
-
Well, it does work but the text is truncated to 140 characters. I do not see an option to set extended mode, do you know how to get full tweet text? Thanks – DataNoob Jul 20 '20 at 23:14
I don't have enough reputation to add an actual comment so sadly this is the way to go:
I found a bug and a strange thing in chrisinmtown answer:
Every 100th tweet will be skipped due to the bug. Here is a simple solution:
if len(tweet_ids) < 100:
tweet_ids.append(tweet_id)
else:
tweet_ids.append(tweet_id)
get_tweet_list(twapi, tweet_ids)
tweet_ids = list()
Using is better since it works even past the rate limit.
api = tweepy.API(auth_handler=auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)
-
Thanks someone for bug report - seems that old code discards every 100th ID dangit. Oh next time pls use @ in front of my name so I get a notification! – chrisinmtown Apr 02 '18 at 14:24